TY - GEN
T1 - DGGAN: Depth-Image Guided Generative Adversarial Network for Disentangling RGB and Depth Images in 3D Hand Pose Estimation
T2 - 2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020
AU - Chen, Liangjian
AU - Lin, Shih Yao
AU - Xie, Yusheng
AU - Lin, Yen-Yu
AU - Fan, Wei
AU - Xie, Xiaohui
PY - 2020/3
Y1 - 2020/3
N2 - Estimating 3D hand poses from RGB images is essential to a wide range of potential applications, but is challenging owing to substantial ambiguity in the inference of depth information from RGB images. State-of-the-art estimators address this problem by regularizing 3D hand pose estimation models during training to enforce the consistency between the predicted 3D poses and the ground-truth depth maps. However, these estimators rely on both RGB images and the paired depth maps during training. In this study, we propose a conditional generative adversarial network (GAN) model, called Depth-image Guided GAN (DGGAN), to generate realistic depth maps conditioned on the input RGB image, and use the synthesized depth maps to regularize the 3D hand pose estimation model, thereby eliminating the need for ground-truth depth maps. Experimental results on multiple benchmark datasets show that the synthesized depth maps produced by DGGAN are quite effective in regularizing the pose estimation model, yielding new state-of-the-art results in estimation accuracy, notably reducing the mean 3D end-point errors (EPE) by 4.7%, 16.5%, and 6.8% on the RHD, STB, and MHP datasets, respectively.
AB - Estimating 3D hand poses from RGB images is essential to a wide range of potential applications, but is challenging owing to substantial ambiguity in the inference of depth information from RGB images. State-of-the-art estimators address this problem by regularizing 3D hand pose estimation models during training to enforce the consistency between the predicted 3D poses and the ground-truth depth maps. However, these estimators rely on both RGB images and the paired depth maps during training. In this study, we propose a conditional generative adversarial network (GAN) model, called Depth-image Guided GAN (DGGAN), to generate realistic depth maps conditioned on the input RGB image, and use the synthesized depth maps to regularize the 3D hand pose estimation model, thereby eliminating the need for ground-truth depth maps. Experimental results on multiple benchmark datasets show that the synthesized depth maps produced by DGGAN are quite effective in regularizing the pose estimation model, yielding new state-of-the-art results in estimation accuracy, notably reducing the mean 3D end-point errors (EPE) by 4.7%, 16.5%, and 6.8% on the RHD, STB, and MHP datasets, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85085471039&partnerID=8YFLogxK
U2 - 10.1109/WACV45572.2020.9093380
DO - 10.1109/WACV45572.2020.9093380
M3 - Conference contribution
AN - SCOPUS:85085471039
T3 - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
SP - 400
EP - 408
BT - Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 March 2020 through 5 March 2020
ER -