TY - JOUR
T1 - 6DFLRNet
T2 - 6D rotation representation for head pose estimation based on facial landmarks and regression
AU - Zhao, Na
AU - Ma, Yaofei
AU - Li, Xiaopeng
AU - Lee, Shin Jye
AU - Wang, Jian
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024/8
Y1 - 2024/8
N2 - Head pose estimation methods are generally classified into two categories: model-based and appearance-based. Model-based methods rely on facial landmarks for three-dimensional reconstruction and aim for high-precision results, but they depend heavily on the accuracy of those landmarks. Appearance-based methods take images as input and use feature extraction and computation to produce the pose estimate. They are more robust than model-based methods, but less accurate. This paper proposes a hybrid method that combines the strengths of both. Unlike conventional model-based methods, the proposed method treats the facial landmarks in 2D images as a sequence of neural network inputs and obtains the head pose estimate by neural network regression. It addresses the ambiguous rotation labeling problem by using a rotation matrix representation, with a 6D rotation representation as an intermediate form of the rotation matrix so that the pose can be regressed directly and effectively. Face processing improves the robustness of the model in cross-dataset scenarios. The method achieves strong results despite imprecise face recognition and a simple model. It has three stages: first, face processing is applied to the input image; second, facial landmarks are detected; and third, the landmarks are converted into sequences and the 6D rotation representation of the head pose is obtained by regression. Extensive experiments on the publicly available BIWI, PRIMA, and DrivFace datasets show that the method is effective and outperforms other state-of-the-art methods, with an average performance improvement of at least 10% across the datasets.
KW - 6D rotation representation
KW - Convolutional neural network
KW - Head pose estimation
KW - Multilayer perceptron
KW - Rotation matrix
UR - http://www.scopus.com/inward/record.url?scp=85183116114&partnerID=8YFLogxK
U2 - 10.1007/s11042-023-17731-6
DO - 10.1007/s11042-023-17731-6
M3 - Article
AN - SCOPUS:85183116114
SN - 1380-7501
VL - 83
SP - 68605
EP - 68624
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 26
ER -