Temporal-aware self-supervised learning for 3D hand pose and mesh estimation in videos

Liangjian Chen, Shih Yao Lin, Yusheng Xie, Yen-Yu Lin, Xiaohui Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

Estimating 3D hand pose directly from RGB images is challenging, but steady progress has been made recently by training deep models with annotated 3D poses. However, annotating 3D poses is difficult, and as a result only a few 3D hand pose datasets are available, all with limited sample sizes. In this study, we propose a new framework for training 3D pose estimation models from RGB images without explicit 3D annotations, i.e., with only 2D information. Our framework is motivated by two observations: 1) videos provide richer information for estimating 3D poses than static images; 2) estimated 3D poses ought to be consistent whether a video is viewed in forward or reverse order. We leverage these two observations to develop a self-supervised learning model called the temporal-aware self-supervised network (TASSN). By enforcing temporal consistency constraints, TASSN learns 3D hand poses and meshes from videos with only 2D keypoint position annotations. Experiments show that our model achieves surprisingly good results, with 3D estimation accuracy on par with state-of-the-art models trained with 3D annotations, highlighting the benefit of temporal consistency in constraining 3D prediction models.
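The second observation in the abstract (forward and reverse viewing should yield the same 3D poses) can be illustrated with a small sketch. This is not the TASSN loss from the paper, whose exact form the abstract does not give; it is a generic forward/reverse consistency penalty written under stated assumptions. The function name `temporal_consistency_loss`, the nested-list pose layout (frames, then joints, then x/y/z), and the choice of a mean squared distance are all hypothetical choices made for illustration.

```python
from typing import Sequence

# Assumed layout: one pose is a sequence of joints, each joint an (x, y, z) triple.
Pose = Sequence[Sequence[float]]


def temporal_consistency_loss(forward: Sequence[Pose], reverse: Sequence[Pose]) -> float:
    """Mean squared distance between forward-order and reverse-order pose estimates.

    forward[t] is the pose estimated for frame t when the clip is played forward.
    reverse[k] is the pose estimated for the k-th frame of the reversed clip,
    which corresponds to original frame T - 1 - k.
    Hypothetical illustration only, not the loss used by the paper.
    """
    if not forward or len(forward) != len(reverse):
        raise ValueError("both sequences must be non-empty and equally long")

    # Flip the reverse-order estimates back so index t refers to the same frame.
    realigned = list(reversed(reverse))

    total = 0.0
    count = 0
    for pose_f, pose_r in zip(forward, realigned):
        if len(pose_f) != len(pose_r):
            raise ValueError("poses must have the same number of joints")
        for joint_f, joint_r in zip(pose_f, pose_r):
            # Squared Euclidean distance between the two estimates of one joint.
            total += sum((a - b) ** 2 for a, b in zip(joint_f, joint_r))
            count += 1

    if count == 0:
        raise ValueError("poses must contain at least one joint")
    return total / count
```

A penalty of this kind is zero when the two viewing orders agree on every joint and grows as they diverge, which is one way a training signal can be obtained without 3D annotations.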

Original language: American English
Title of host publication: Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1049-1058
Number of pages: 10
ISBN (Electronic): 9780738142661
State: Published - Jan 2021
Event: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 - Virtual, Online, United States
Duration: 5 Jan 2021 to 9 Jan 2021

Publication series

Name: Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021

Conference

Conference: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Country/Territory: United States
City: Virtual, Online
Period: 5/01/21 to 9/01/21
