An Audio-Visual Speech Enhancement System Based on 3D Image Features: An Application in Hearing Aids

Yu Ching Chung, Ji Yan Han, Bo Sin Wang, Wei Zhong Zheng, Kung Yao Shen, Ying Hui Lai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Previous research has shown that auditory and visual inputs are not asynchronous in the human brain, and that visual cues can enhance attention during hearing. Therefore, this study proposes an audio-visual speech enhancement (SE) system with 3D image features (AV-3D-SE) that imitates the human auditory process to improve listening quality. More specifically, AV-3D-SE uses the FlowNet3D model to predict temporal facial motion from the recorded 3D images and combines it with features for SE applications. The evaluation results showed that, at a signal-to-noise ratio of 3 dB, the average perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores increased to 3.229 and 0.914, respectively. The average hearing aid speech quality index (HASQI) significantly outperformed the baseline SE systems (audio-only and audio-visual-2D) across seven typical types of hearing loss, with a high hearing aid speech perception index (HASPI). In conclusion, the proposed AV-3D-SE enhances the effectiveness of the SE system and can increase the listening satisfaction of hearing aid users.
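The 3 dB signal-to-noise ratio condition mentioned in the abstract can be illustrated with a minimal sketch of how a noisy test signal is commonly constructed. This is not the authors' pipeline: the helper names `mix_at_snr` and `measured_snr_db` are hypothetical, a synthetic sine tone stands in for speech, and Gaussian noise stands in for the unspecified noise types used in the study.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve snr_db = 10 * log10(p_speech / (gain**2 * p_noise)) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR in dB from the clean signal and the noisy mixture."""
    p_speech = sum(s * s for s in speech)
    p_residual = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_residual)


# Assumption: a 220 Hz tone sampled at 16 kHz stands in for one second of speech.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, 3.0)
measured = measured_snr_db(speech, noisy)
```

Objective scores such as PESQ and STOI would then be computed by comparing the enhanced output against the clean reference; those metrics are not implemented here.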

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1131-1137
Number of pages: 7
ISBN (Electronic): 9798350300673
State: Published - 2023
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan
Duration: 31 Oct 2023 to 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan
City: Taipei
Period: 31/10/23 to 3/11/23

Keywords

  • deep learning
  • hearing aid and speech enhancement
  • point cloud
  • scene flow
