Audio-visual speech enhancement using deep neural networks

Jen Cheng Hou, Syu Siang Wang, Ying Hui Lai, Jen Chun Lin, Yu Tsao, Hsiu Wen Chang, Hsin Min Wang

Research output: Conference contribution · Peer-reviewed

20 citations (Scopus)

Abstract

This paper proposes a novel framework that integrates audio and visual information for speech enhancement. Most speech enhancement approaches use audio features alone to design filters or transfer functions that convert noisy speech signals into clean ones. Visual data provide useful complementary information to audio data and have been integrated with audio data in many speech-related approaches to achieve more effective speech processing. This paper presents our investigation into using visual features derived from lip motion as additional input to improve the speech enhancement performance of a deep neural network (DNN). The experimental results show that the DNN with audio-visual inputs outperforms the DNN with audio-only inputs on four standardized objective evaluation metrics, confirming the effectiveness of incorporating visual information into an audio-only speech enhancement framework.
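As a rough illustration of the audio-visual fusion idea described in the abstract, the sketch below shows a feedforward DNN that concatenates per-frame noisy audio features with lip-motion visual features and regresses an estimate of the clean audio features. This is a minimal sketch only: the class name, feature dimensions, layer sizes, and the choice of PyTorch are all illustrative assumptions, not the configuration used in the paper.

    # Minimal sketch of early audio-visual fusion for speech enhancement.
    # All dimensions and layer sizes are illustrative assumptions, not the
    # paper's actual settings.
    import torch
    import torch.nn as nn

    class AudioVisualDNN(nn.Module):
        """Maps noisy audio features plus lip-motion visual features to an
        estimate of the corresponding clean audio features."""

        def __init__(self, audio_dim=257, visual_dim=40, hidden_dim=1024):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(audio_dim + visual_dim, hidden_dim),  # fused input
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, audio_dim),  # clean-feature estimate
            )

        def forward(self, noisy_audio, lip_features):
            # Early fusion: concatenate audio and visual features per frame.
            fused = torch.cat([noisy_audio, lip_features], dim=-1)
            return self.net(fused)

    if __name__ == "__main__":
        model = AudioVisualDNN()
        noisy = torch.randn(8, 257)  # batch of noisy spectral frames
        lips = torch.randn(8, 40)    # batch of lip-motion feature vectors
        enhanced = model(noisy, lips)
        print(enhanced.shape)        # torch.Size([8, 257])

An audio-only baseline of the kind compared against in the paper would simply drop the visual branch, feeding only the noisy audio features to the network.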

Original language: English
Host publication title: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9789881476821
DOIs
Publication status: Published - 17 Jan 2017
Event: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, South Korea
Duration: 13 Dec 2016 - 16 Dec 2016

Publication series

Name: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Conference

Conference: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Country/Territory: South Korea
City: Jeju
Period: 13/12/16 - 16/12/16
