Transformer-based spatial-temporal feature lifting for 3D hand mesh reconstruction

Meng Xue Lin, Wen Jiin Tsai

Research output: Conference contribution, peer-reviewed

Abstract

This paper presents a novel model for reconstructing hand meshes in video sequences. The model extends the MobRecon [1] pipeline and incorporates a variant of the Transformer architecture which effectively models both spatial and temporal relationships using distinct positional encodings. The Transformer encoder enhances the feature representation by modeling joint relationships and learning hidden depth information. Leveraging temporal information from consecutive frames, the Transformer decoder further enhances the feature representation for the mesh decoder's final prediction. Additionally, we incorporate techniques such as Twice-LN, confidence-based attention, scaling in place of Softmax, and learnable encodings to improve the feature representation. Experimental results demonstrate the superiority of the proposed method over existing approaches.

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350359855
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, Korea, Republic of
Duration: 4 Dec 2023 to 7 Dec 2023

Publication series

Name: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

Conference

Conference: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Country/Territory: Korea, Republic of
City: Jeju
Period: 4/12/23 to 7/12/23
