Perceptual Characteristics Based Multi-objective Model for Speech Enhancement

Chiang Jen Peng, Yih Liang Shen, Yun Ju Chan, Cheng Yu, Yu Tsao, Tai Shih Chi

Research output: Contribution to journal › Conference article › peer-review



Deep learning has been widely adopted for speech applications. Many studies have shown that using a multi-objective framework and learned deep features is effective for improving system performance. In this paper, we propose a perceptual-characteristics-based multi-objective speech enhancement (SE) algorithm that combines the conventional SE loss with objective losses on pitch- and timbre-related features. Timbre-related features include frequency modulation (encoded by the pitch contour), amplitude modulation (encoded by the energy contour), and speaker identity. For the speaker-identity loss, we consider the deep features derived from a speaker identification system. The proposed algorithm consists of two parts: an LSTM-based SE model and CNN-based multi-objective models. The objective losses are computed between the speech enhanced by the SE model and the clean speech, and are combined with the SE loss to update the SE model. The proposed algorithm is evaluated on the corpus of the Taiwan Mandarin hearing in noise test (TMHINT). Experimental results show that the proposed algorithm evidently outperforms the original SE model in all objective scores, including speech quality, speech intelligibility, and signal distortion.
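The combined training objective described above can be sketched as a weighted sum of a spectral SE loss and perceptual losses on the pitch contour, the energy contour, and speaker-identity embeddings. The sketch below is illustrative only: the weights, feature extractors, and loss forms (MSE, L1, cosine distance) are assumptions for exposition, not the paper's published choices.

```python
import numpy as np

def energy_contour(spec):
    # Frame-wise log energy, a simple proxy for the amplitude-modulation
    # (energy-contour) feature described in the abstract.
    return np.log(np.sum(spec ** 2, axis=0) + 1e-8)

def multi_objective_loss(enh_spec, clean_spec,
                         enh_pitch, clean_pitch,
                         enh_emb, clean_emb,
                         w_pitch=0.1, w_energy=0.1, w_spk=0.1):
    """Weighted sum of an SE loss and perceptual objective losses.

    All weights and loss forms here are illustrative assumptions;
    the abstract does not specify them.
    """
    se_loss = np.mean((enh_spec - clean_spec) ** 2)            # spectral MSE
    pitch_loss = np.mean(np.abs(enh_pitch - clean_pitch))      # pitch-contour L1
    energy_loss = np.mean(np.abs(energy_contour(enh_spec)
                                 - energy_contour(clean_spec)))
    # Speaker-identity loss: 1 - cosine similarity of deep embeddings
    # taken from a speaker identification system.
    cos = np.dot(enh_emb, clean_emb) / (
        np.linalg.norm(enh_emb) * np.linalg.norm(clean_emb) + 1e-8)
    spk_loss = 1.0 - cos
    return se_loss + w_pitch * pitch_loss + w_energy * energy_loss + w_spk * spk_loss
```

In training, this scalar would backpropagate through the enhanced spectrogram to update the LSTM-based SE model, while the CNN-based multi-objective models supply the pitch, energy, and speaker-embedding predictions.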

Original language: English
Pages (from-to): 211-215
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sep 2022 - 22 Sep 2022


Keywords:
  • deep feature
  • multi-objective model
  • perceptual characteristics
  • speech enhancement
  • timbre


