Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images

Markus Müller; Georg Poier; Horst Possegger; Horst Bischof

doi:10.1109/ICIP42928.2021.9506760

Institute of Computer Graphics and Vision (7100)

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply to predict a different view. Our results show that, when the amount of 3D annotations is limited, using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand. We further show the effect of the objective in both settings: using it for pre-training, and using it to simultaneously learn to predict novel views and estimate the 3D pose of the hand.
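The abstract describes combining a view-prediction objective, which needs only unlabeled multi-view images, with 3D pose supervision on a limited annotated subset. It does not specify the loss functions, so the following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a pixel-wise mean squared error for the predicted view, a mean per-joint Euclidean error for the 3D pose, and a hypothetical weighting factor `lam`. All function and key names are illustrative. Setting `lam=0` leaves only the view-prediction term, which corresponds to using the objective on its own, as in a pre-training stage.

```python
import numpy as np


def view_prediction_loss(pred_view, target_view):
    # Assumed pixel-wise mean squared error between the predicted and the actual target view.
    return float(np.mean((pred_view - target_view) ** 2))


def pose_loss(pred_joints, gt_joints):
    # Assumed mean Euclidean distance over joints; arrays have shape (num_joints, 3).
    return float(np.mean(np.linalg.norm(pred_joints - gt_joints, axis=-1)))


def semi_supervised_loss(batch, lam=1.0):
    """Hypothetical combined objective.

    Every sample contributes a view-prediction term (no annotation needed);
    only samples that carry 3D annotations ("gt_joints") contribute a pose term.
    """
    view_terms = [view_prediction_loss(s["pred_view"], s["target_view"]) for s in batch]
    pose_terms = [
        pose_loss(s["pred_joints"], s["gt_joints"])
        for s in batch
        if s.get("gt_joints") is not None
    ]
    total = float(np.mean(view_terms))
    if pose_terms:
        total += lam * float(np.mean(pose_terms))
    return total


# Usage: one annotated and one unannotated multi-view sample.
labeled = {
    "pred_view": np.zeros((2, 2)),
    "target_view": np.ones((2, 2)),  # view loss 1.0
    "pred_joints": np.zeros((2, 3)),
    "gt_joints": np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]),  # joint errors 5 and 0
}
unlabeled = {
    "pred_view": np.zeros((2, 2)),
    "target_view": np.zeros((2, 2)),  # view loss 0.0
    "pred_joints": np.zeros((2, 3)),
}
print(semi_supervised_loss([labeled, unlabeled]))  # 0.5 + 1.0 * 2.5 -> 3.0
print(semi_supervised_loss([labeled, unlabeled], lam=0.0))  # view term only -> 0.5
```

The unannotated sample still contributes to the view-prediction term, which is the point of the semi-supervised setup: unlabeled multi-view data shapes the representation while the few 3D annotations supervise the pose output.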

Original language	English
Title of host publication	IEEE International Conference on Image Processing (ICIP)
Pages	1104-1108
DOIs	https://doi.org/10.1109/ICIP42928.2021.9506760
Publication status	Published - 2021
Event	2021 IEEE International Conference on Image Processing: IEEE ICIP 2021 - Virtual, United States. Duration: 19 Sept. 2021 → 22 Sept. 2021 https://2021.ieeeicip.org/

Conference

Conference	2021 IEEE International Conference on Image Processing
Short title	ICIP
Country/Territory	United States
City	Virtual
Period	19/09/21 → 22/09/21
Internet address	https://2021.ieeeicip.org/

Access to document

10.1109/ICIP42928.2021.9506760

Cite this

@inproceedings{ff5514141b51483cb2919b88b2d6dea8,
title = "Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images",
abstract = "Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply to predict a different view. Our results show that, when the amount of 3D annotations is limited, using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand. We further show the effect of the objective in both settings: using it for pre-training, and using it to simultaneously learn to predict novel views and estimate the 3D pose of the hand.",
author = "Markus M{\"u}ller and Georg Poier and Horst Possegger and Horst Bischof",
year = "2021",
doi = "10.1109/ICIP42928.2021.9506760",
language = "English",
pages = "1104--1108",
booktitle = "IEEE International Conference on Image Processing (ICIP)",
note = "2021 IEEE International Conference on Image Processing: IEEE ICIP 2021, ICIP; Conference date: 19-09-2021 through 22-09-2021",
url = "https://2021.ieeeicip.org/",
}

TY  - GEN
T1  - Semi-Supervised Learning of Monocular 3D Hand Pose Estimation from Multi-View Images
AU  - Müller, Markus
AU  - Poier, Georg
AU  - Possegger, Horst
AU  - Bischof, Horst
PY  - 2021
Y1  - 2021
N2  - Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply to predict a different view. Our results show that, when the amount of 3D annotations is limited, using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand. We further show the effect of the objective in both settings: using it for pre-training, and using it to simultaneously learn to predict novel views and estimate the 3D pose of the hand.
AB  - Most modern hand pose estimation methods rely on Convolutional Neural Networks (CNNs), which typically require a large training dataset to perform well. Exploiting unlabeled data provides a way to reduce the required amount of annotated data. We propose to take advantage of a geometry-aware representation of the human hand, which we learn from multi-view images without annotations. The objective for learning this representation is simply to predict a different view. Our results show that, when the amount of 3D annotations is limited, using this objective yields clearly superior pose estimation results compared to directly mapping an input image to the 3D joint locations of the hand. We further show the effect of the objective in both settings: using it for pre-training, and using it to simultaneously learn to predict novel views and estimate the 3D pose of the hand.
U2  - 10.1109/ICIP42928.2021.9506760
DO  - 10.1109/ICIP42928.2021.9506760
M3  - Conference paper
SP  - 1104
EP  - 1108
BT  - IEEE International Conference on Image Processing (ICIP)
T2  - 2021 IEEE International Conference on Image Processing
Y2  - 19 September 2021 through 22 September 2021
ER  - 
