A Deep Learning Based Approach to Synthesize Intelligible Speech with Limited Temporal Envelope Information

Ching Ju Hsiao, Fei Chen, Ji Yan Han, Wei Zhong Zheng, Ying Hui Lai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Temporal envelope waveforms can be extracted from multiple frequency bands of a speech signal, and they carry information that is important for the intelligibility of human speech. This study investigated whether a deep learning-based model that uses temporal envelope information as its input features can synthesize intelligible speech, and examined how reducing the number of temporal envelopes (from 8 to 2 in this work) affects the intelligibility of the synthesized speech. In the objective evaluation, the speech synthesized by the proposed approach achieved, on average, high short-time objective intelligibility (STOI) scores (i.e., 0.8) in each test condition; in the human listening test, the average word correct rate of eight listeners was higher than 97.5%. These findings indicate that the proposed deep learning-based system is a potential approach for synthesizing highly intelligible speech from limited envelope information.
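The abstract does not say how the envelopes are extracted, so the following is only a minimal sketch of one common way to obtain per-band temporal envelopes: band-pass filtering, full-wave rectification, and low-pass smoothing. It is not the authors' feature pipeline. The function names, the second-order band-pass design, the band centers, the Q value, and the 50 Hz smoothing cutoff are all assumptions made for illustration.

```python
import math


def bandpass_biquad(signal, fs, center_hz, q):
    """Second-order band-pass filter with unity gain at center_hz (assumed design)."""
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # the x[n-1] coefficient is zero here
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def temporal_envelope(band, fs, cutoff_hz):
    """Full-wave rectify the band signal, then smooth it with a one-pole low-pass."""
    smooth = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, state = [], 0.0
    for x in band:
        state += smooth * (abs(x) - state)
        env.append(state)
    return env


def extract_envelopes(signal, fs, centers_hz, q=4.0, cutoff_hz=50.0):
    """Return one temporal envelope per band center (hypothetical helper)."""
    return [
        temporal_envelope(bandpass_biquad(signal, fs, fc, q), fs, cutoff_hz)
        for fc in centers_hz
    ]


# Synthetic stand-in for speech: a 1 kHz tone with a slow 4 Hz amplitude modulation.
fs = 16000
tone = [
    (1.0 + 0.5 * math.sin(2.0 * math.pi * 4.0 * n / fs))
    * math.sin(2.0 * math.pi * 1000.0 * n / fs)
    for n in range(fs // 2)
]
envelopes = extract_envelopes(tone, fs, centers_hz=[1000.0, 4000.0])
```

In this sketch, reducing the number of envelopes (8 down to 2 in the study) would correspond to passing fewer entries in `centers_hz`. How the bands are actually placed and how the envelopes are fed to the model are not stated in the abstract.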

Original language: English
Title of host publication: 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1972-1976
Number of pages: 5
ISBN (Electronic): 9781728127828
State: Published - 2022
Event: 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2022 - Glasgow, United Kingdom
Duration: 11 Jul 2022 to 15 Jul 2022

Publication series

Name: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
Volume: 2022-July
ISSN (Print): 1557-170X

Conference

Conference: 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2022
Country/Territory: United Kingdom
City: Glasgow
Period: 11/07/22 to 15/07/22
