Spectro-temporal modulations for robust speech emotion recognition

Lan Ying Yeh*, Tai-Shih Chi

*Corresponding author for this work

Research output: Paper › peer-review

12 Citations (Scopus)

Abstract

Speech emotion recognition has mostly been studied on clean speech. In this paper, joint spectro-temporal (RS) features are extracted from an auditory model and applied to detect the emotion status of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database with white and babble noise added at various SNR levels. A clean-train/noisy-test scenario is investigated to simulate conditions with unknown noise sources. The sequential forward floating selection (SFFS) method is adopted to demonstrate the redundancy of RS features, and further dimensionality reduction is conducted. Compared with conventional MFCCs plus prosodic features, RS features yield higher recognition rates, especially in low SNR conditions.
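As a rough illustration of the clean-train/noisy-test setup described above, the sketch below mixes a noise recording into clean speech at a chosen SNR. This is a minimal NumPy example, not the paper's pipeline: the function name `add_noise_at_snr` and the synthetic signals are hypothetical, and the auditory-model RS feature extraction is not reproduced here.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix a noise signal into clean speech at a target SNR (in dB).

    The noise is tiled/truncated to the length of the clean signal and
    scaled so that 10*log10(P_clean / P_noise) equals snr_db.
    """
    # Repeat the noise if it is shorter than the clean utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)

    # Scale factor so the mixed noise power hits the target SNR.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Example: white noise at 0 dB SNR (hypothetical synthetic signals).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for a clean utterance
white = rng.standard_normal(16000)   # stand-in for a white-noise recording
noisy = add_noise_at_snr(clean, white, snr_db=0.0)
```

For the SFFS step, a floating forward selection such as mlxtend's `SequentialFeatureSelector` with `floating=True` could serve as a comparable off-the-shelf implementation, though the paper does not specify a particular toolkit.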

Original language: English
Pages: 789-792
Number of pages: 4
Publication status: Published - September 2010
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: 26 September 2010 → 30 September 2010

Conference

Conference: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
Country/Territory: Japan
City: Makuhari, Chiba
Period: 26/09/10 → 30/09/10
