TY - GEN
T1 - Talking Head Generation Based on 3D Morphable Facial Model
AU - Shen, Hsin Yu
AU - Tsai, Wen Jiin
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This paper presents a framework for one-shot talking-head video generation that takes a single image of a person and audio clips as input and synthesizes photo-realistic videos with natural head poses and lip motion synced to the driving audio. The main idea behind this framework is to use 3D Morphable Model (3DMM) parameters as an intermediate representation in generating the videos. We design an Expression Predictor and a Head Pose Predictor to predict facial expression and head-pose parameters from audio, respectively, and adopt a 3DMM to extract identity and texture parameters from the reference image. With these parameters, facial images are rendered as auxiliary guidance for video generation. Compared to widely used facial landmarks, 3DMM parameters are more powerful in representing facial details. Experimental results show that our method can generate realistic talking-head videos and outperforms many state-of-the-art methods.
KW - 3DMM
KW - deep learning
KW - image-to-image translation
KW - self-attention
KW - talking-head generation
UR - http://www.scopus.com/inward/record.url?scp=85197676631&partnerID=8YFLogxK
U2 - 10.1109/PCS60826.2024.10566437
DO - 10.1109/PCS60826.2024.10566437
M3 - Conference contribution
AN - SCOPUS:85197676631
T3 - 2024 Picture Coding Symposium, PCS 2024 - Proceedings
BT - 2024 Picture Coding Symposium, PCS 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Picture Coding Symposium, PCS 2024
Y2 - 12 June 2024 through 14 June 2024
ER -