Perceptual Characteristics Based Multi-objective Model for Speech Enhancement

Chiang Jen Peng, Yih Liang Shen, Yun Ju Chan, Cheng Yu, Yu Tsao, Tai Shih Chi

Research output: Conference article › Peer-review

Abstract

Deep learning has been widely adopted for speech applications. Many studies have shown that multi-objective learning frameworks and learned deep features are effective for improving system performance. In this paper, we propose a perceptual characteristics based multi-objective speech enhancement (SE) algorithm that combines the conventional SE loss with objective losses on pitch- and timbre-related features. The timbre-related features include frequency modulation (encoded by the pitch contour), amplitude modulation (encoded by the energy contour), and speaker identity. For the speaker identity loss, we use deep features derived from a speaker identification system. The proposed algorithm consists of two parts: an LSTM-based SE model and CNN-based multi-objective models. The objective losses are computed between the speech enhanced by the SE model and the clean speech, and are combined with the SE loss to update the SE model. The proposed algorithm is evaluated on the Taiwan Mandarin Hearing in Noise Test (TMHINT) corpus. Experimental results show that the proposed algorithm clearly outperforms the original SE model on all objective scores, including speech quality, speech intelligibility, and signal distortion.
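The abstract describes the training objective only in words. The following is a minimal sketch of how such a weighted multi-objective loss might be assembled, not the authors' implementation. It assumes mean-squared error for every term and uses hypothetical weights. In the paper the pitch, energy, and speaker-identity terms come from CNN-based models. Here, by contrast, the pitch contours and speaker embeddings are simply passed in as arrays (imagined as outputs of such models), and the energy contour is computed directly as frame-level log energy, a swapped-in stand-in. The function names `energy_contour` and `multi_objective_loss`, the frame sizes, and the weights are all illustrative assumptions.

```python
import numpy as np


def energy_contour(x, frame_len=512, hop=256, eps=1e-8):
    """Frame-level log energy: a simple stand-in for the amplitude-modulation cue.

    Assumption: the paper derives this feature with a CNN-based model;
    here it is computed directly from the waveform for illustration.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        # Zero-pad very short signals so at least one full frame exists.
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.mean(frames ** 2, axis=1) + eps)


def mse(a, b):
    """Mean-squared error between two equally shaped arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def multi_objective_loss(enh_spec, clean_spec, enh_wave, clean_wave,
                         enh_pitch, clean_pitch, enh_spk, clean_spk,
                         weights=(1.0, 0.1, 0.1, 0.1)):
    """Weighted sum of the SE loss and perceptual objective losses.

    The weights are hypothetical; the abstract does not specify them.
    """
    w_se, w_f0, w_en, w_spk = weights
    l_se = mse(enh_spec, clean_spec)    # conventional SE loss on spectra
    l_f0 = mse(enh_pitch, clean_pitch)  # frequency modulation (pitch contour)
    l_en = mse(energy_contour(enh_wave),
               energy_contour(clean_wave))  # amplitude modulation (energy contour)
    l_spk = mse(enh_spk, clean_spk)     # speaker-identity deep features
    return w_se * l_se + w_f0 * l_f0 + w_en * l_en + w_spk * l_spk
```

When the enhanced and clean inputs are identical, every term and the total are zero. Any mismatch in spectra, contours, or embeddings raises the loss. In an actual training loop, the same sum would be written in an autodiff framework so that gradients from all terms flow back into the SE model. NumPy is used here only to show how the terms combine.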

Original language: English
Pages (from-to): 211-215
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
Publication status: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sep 2022 to 22 Sep 2022
