Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images

Markus Müller, Georg Poier, Horst Possegger, Horst Bischof

Research output: Chapter in Book/Report/Conference proceeding, Conference paper, peer-reviewed


Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data is one way to reduce the amount of annotated data required. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply to predict a different view of the hand. Our results show that, when the amount of 3D annotations is limited, this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand. We further show the effect of the objective in two settings: as a pre-training step, and when jointly learning to predict novel views and to estimate the 3D pose of the hand.
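The joint setting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the L1 image error, the squared joint error, the `pose_weight` factor, and all function and parameter names are assumptions chosen for illustration. The only idea taken from the abstract is that the view-prediction term needs no annotations and so applies to every sample, while the pose term applies only where 3D annotations exist.

```python
import numpy as np


def semi_supervised_loss(pred_view, target_view, pred_pose, gt_pose,
                         has_label, pose_weight=1.0):
    """View-prediction error on all samples plus pose error on labeled ones.

    pred_view, target_view: (N, H, W) predicted and actual second-view images.
    pred_pose, gt_pose:     (N, J, 3) predicted and annotated 3D joint positions.
    has_label:              (N,) boolean mask, True where a 3D annotation exists.
    pose_weight:            hypothetical weighting factor, not from the paper.
    """
    # View-prediction term: requires no annotations, so every sample counts.
    view_loss = float(np.mean(np.abs(pred_view - target_view)))

    # Pose term: only samples with a 3D annotation contribute.
    if np.any(has_label):
        diff = pred_pose[has_label] - gt_pose[has_label]
        pose_loss = float(np.mean(np.sum(diff ** 2, axis=-1)))
    else:
        pose_loss = 0.0

    total = view_loss + pose_weight * pose_loss
    return total, view_loss, pose_loss
```

Under these assumptions, the pre-training setting would correspond to minimizing only `view_loss` first and fitting the pose estimator afterwards, whereas the joint setting minimizes `total` directly.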
Original language: English
Title of host publication: IEEE International Conference on Image Processing (ICIP)
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Image Processing: IEEE ICIP 2021 - Virtual, United States
Duration: 19 Sept 2021 to 22 Sept 2021


Conference: 2021 IEEE International Conference on Image Processing
Abbreviated title: ICIP
Country/Territory: United States

