TY - GEN
T1 - Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction
AU - Armagan, Anil
AU - Garcia-Hernando, Guillermo
AU - Baek, Seungryul
AU - Hampali Shivakumar, Shreyas
AU - Rad, Mahdi
AU - Zhang, Zhaohui
AU - Xie, Shipeng
AU - Chen, MingXiu
AU - Zhang, Boshen
AU - Xiong, Fu
AU - Xiao, Yang
AU - Cao, Zhiguo
AU - Yuan, Junsong
AU - Ren, Pengfei
AU - Huang, Weiting
AU - Sun, Haifeng
AU - Hrúz, Marek
AU - Kanis, Jakub
AU - Krňoul, Zdeněk
AU - Wan, Qingfu
AU - Li, Shile
AU - Yang, Linlin
AU - Lee, Dongheui
AU - Yao, Angela
AU - Zhou, Weiguo
AU - Mei, Sijia
AU - Liu, Yunhui
AU - Spurr, Adrian
AU - Iqbal, Umar
AU - Molchanov, Pavlo
AU - Weinzaepfel, Philippe
AU - Brégier, Romain
AU - Rogez, Gregory
AU - Lepetit, Vincent
AU - Kim, Tae-Kyun
PY - 2020/8/23
Y1 - 2020/8/23
N2 - We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS’19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, HANDS’19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27 mm to 13 mm mean joint error. Our analyses highlight the impacts of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
AB - We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS’19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, HANDS’19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27 mm to 13 mm mean joint error. Our analyses highlight the impacts of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
UR - http://www.scopus.com/inward/record.url?scp=85097407772&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-58592-1_6
DO - 10.1007/978-3-030-58592-1_6
M3 - Conference paper
SN - 9783030585914
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 85
EP - 101
BT - Computer Vision – ECCV 2020 – 16th European Conference, Glasgow, 2020, Proceedings
A2 - Vedaldi, Andrea
A2 - Bischof, Horst
A2 - Brox, Thomas
A2 - Frahm, Jan-Michael
T2 - 16th European Conference on Computer Vision
Y2 - 23 August 2020 through 28 August 2020
ER -