Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images

Markus Müller; Georg Poier; Horst Possegger; Horst Bischof

doi:10.1109/ICIP42928.2021.9506760

Institute of Computer Graphics and Vision (7100)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply based on learning to predict a different view. Our results show that using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand if the amount of 3D annotations is limited. We further show the effect of the objective for either case, using the objective for pre-learning as well as to simultaneously learn to predict novel views and to estimate the 3D pose of the hand.
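The abstract does not give the loss functions, so the following is only a minimal sketch of how a semi-supervised objective of this kind could be combined, not the authors' implementation. It assumes a mean squared error for both the view-prediction term and the pose term, flattened lists of pixel values and joint coordinates, and a hypothetical `pose_weight` parameter; all function names are illustrative.

```python
from typing import Optional, Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared difference between two equally long sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def semi_supervised_loss(
    pred_view: Sequence[float],
    target_view: Sequence[float],
    pred_pose: Sequence[float],
    target_pose: Optional[Sequence[float]],
    pose_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Combine a view-prediction term with an optional pose term.

    The view term needs no annotation: the target is simply another
    camera's image of the same hand. The pose term is added only for
    samples that come with 3D joint annotations.
    """
    view_loss = mean_squared_error(pred_view, target_view)
    if target_pose is None:
        # Unlabeled sample: only the view-prediction objective applies.
        return view_loss
    pose_loss = mean_squared_error(pred_pose, target_pose)
    return view_loss + pose_weight * pose_loss
```

Under these assumptions, a sample without annotations contributes only the view term, while an annotated sample contributes both, which mirrors the joint-training setting described in the abstract. The pre-learning setting would instead optimize the view term first and the pose term afterwards.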

Original language	English
Title of host publication	IEEE International Conference on Image Processing (ICIP)
Pages	1104-1108
DOIs	https://doi.org/10.1109/ICIP42928.2021.9506760
Publication status	Published - 2021
Event	2021 IEEE International Conference on Image Processing: IEEE ICIP 2021 - Virtual, United States. Duration: 19 Sept 2021 → 22 Sept 2021. https://2021.ieeeicip.org/

Conference

Conference	2021 IEEE International Conference on Image Processing
Abbreviated title	ICIP
Country/Territory	United States
City	Virtual
Period	19/09/21 → 22/09/21
Internet address	https://2021.ieeeicip.org/

Access to Document

10.1109/ICIP42928.2021.9506760

Cite this

@inproceedings{ff5514141b51483cb2919b88b2d6dea8,

title = "Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images",

abstract = "Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply based on learning to predict a different view. Our results show that using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand if the amount of 3D annotations is limited. We further show the effect of the objective for either case, using the objective for pre-learning as well as to simultaneously learn to predict novel views and to estimate the 3D pose of the hand.",

author = "Markus M{\"u}ller and Georg Poier and Horst Possegger and Horst Bischof",

year = "2021",

doi = "10.1109/ICIP42928.2021.9506760",

language = "English",

pages = "1104--1108",

booktitle = "IEEE International Conference on Image Processing (ICIP)",

note = "2021 IEEE International Conference on Image Processing : IEEE ICIP 2021, ICIP ; Conference date: 19-09-2021 Through 22-09-2021",

url = "https://2021.ieeeicip.org/",

}

TY - GEN

T1 - Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images

AU - Müller, Markus

AU - Poier, Georg

AU - Possegger, Horst

AU - Bischof, Horst

PY - 2021

Y1 - 2021

N2 - Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply based on learning to predict a different view. Our results show that using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand if the amount of 3D annotations is limited. We further show the effect of the objective for either case, using the objective for pre-learning as well as to simultaneously learn to predict novel views and to estimate the 3D pose of the hand.

AB - Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply based on learning to predict a different view. Our results show that using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand if the amount of 3D annotations is limited. We further show the effect of the objective for either case, using the objective for pre-learning as well as to simultaneously learn to predict novel views and to estimate the 3D pose of the hand.

U2 - 10.1109/ICIP42928.2021.9506760

DO - 10.1109/ICIP42928.2021.9506760

M3 - Conference paper

SP - 1104

EP - 1108

BT - IEEE International Conference on Image Processing (ICIP)

T2 - 2021 IEEE International Conference on Image Processing

Y2 - 19 September 2021 through 22 September 2021

ER -
