TY - JOUR
T1 - Look at Me! Correcting eye gaze in live video communication
AU - Hsu, Chih Fan
AU - Wang, Yu-Shuen
AU - Lei, Chin Laung
AU - Chen, Kuan Ta
PY - 2019/5
Y1 - 2019/5
N2 - Although live video communication is widely used, it is generally less engaging than face-to-face communication because of limitations on social, emotional, and haptic feedback. Missing eye contact is one such problem caused by the physical deviation between the screen and camera on a device. Manipulating video frames to correct eye gaze is a solution to this problem. In this article, we introduce a system to rotate the eyeball of a local participant before the video frame is sent to the remote side. It adopts a warping-based convolutional neural network to relocate pixels in eye regions. To improve visual quality, we minimize the L2 distance between the ground truths and warped eyes. We also present several newly designed loss functions to help network training. These new loss functions are designed to preserve the shape of eye structures and minimize color changes around the periphery of eye regions. To evaluate the presented network and loss functions, we objectively and subjectively compared results generated by our system and the state-of-the-art, DeepWarp, in relation to two datasets. The experimental results demonstrated the effectiveness of our system. In addition, we showed that our system can perform eye-gaze correction in real time on a consumer-level laptop. Because of the quality and efficiency of the system, gaze correction by postprocessing through this system is a feasible solution to the problem of missing eye contact in video communication.
AB - Although live video communication is widely used, it is generally less engaging than face-to-face communication because of limitations on social, emotional, and haptic feedback. Missing eye contact is one such problem caused by the physical deviation between the screen and camera on a device. Manipulating video frames to correct eye gaze is a solution to this problem. In this article, we introduce a system to rotate the eyeball of a local participant before the video frame is sent to the remote side. It adopts a warping-based convolutional neural network to relocate pixels in eye regions. To improve visual quality, we minimize the L2 distance between the ground truths and warped eyes. We also present several newly designed loss functions to help network training. These new loss functions are designed to preserve the shape of eye structures and minimize color changes around the periphery of eye regions. To evaluate the presented network and loss functions, we objectively and subjectively compared results generated by our system and the state-of-the-art, DeepWarp, in relation to two datasets. The experimental results demonstrated the effectiveness of our system. In addition, we showed that our system can perform eye-gaze correction in real time on a consumer-level laptop. Because of the quality and efficiency of the system, gaze correction by postprocessing through this system is a feasible solution to the problem of missing eye contact in video communication.
KW - Convolutional neural network
KW - Eye contact
KW - Gaze correction
KW - Image processing
KW - Live video communication
UR - http://www.scopus.com/inward/record.url?scp=85067239938&partnerID=8YFLogxK
U2 - 10.1145/3311784
DO - 10.1145/3311784
M3 - Article
AN - SCOPUS:85067239938
SN - 1551-6857
VL - 15
SP - 1
EP - 21
JO - ACM Transactions on Multimedia Computing, Communications and Applications
JF - ACM Transactions on Multimedia Computing, Communications and Applications
IS - 2
M1 - a38
ER -