Spectro-temporal modulations for robust speech emotion recognition

Lan Ying Yeh*, Tai-Shih Chi

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

12 Scopus citations

Abstract

Speech emotion recognition is mostly studied on clean speech. In this paper, joint spectro-temporal features (RS features) are extracted from an auditory model and applied to detect the emotional state of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database by adding white and babble noise at various SNR levels. A clean-train/noisy-test scenario is investigated to simulate conditions with unknown noise sources. The sequential forward floating selection (SFFS) method is adopted to demonstrate the redundancy of the RS features, and further dimensionality reduction is conducted. Compared with conventional MFCCs plus prosodic features, RS features yield higher recognition rates, especially under low-SNR conditions.
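The noisy test material is built by mixing a noise signal into clean speech at a target SNR. A minimal sketch of that mixing step, assuming NumPy arrays of samples (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix `noise` into `speech` at a target SNR in dB.

    Hypothetical helper illustrating the clean-train/noisy-test setup:
    the noise is scaled so that 10*log10(P_speech / P_noise) == snr_db.
    """
    # Tile or truncate the noise to match the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale factor that gives the mixed noise the desired power.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Applying this per utterance with white or babble noise at each SNR level yields a noisy test set while the classifier is trained only on the clean recordings.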
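SFFS alternates a forward step (add the feature that most improves a criterion) with conditional backward steps (drop a feature whenever the reduced subset beats the best subset previously found at that size). A generic sketch of the procedure, assuming a black-box scoring function such as cross-validated accuracy (the interface is illustrative, not the paper's implementation):

```python
def sffs(features, score, k):
    """Sequential forward floating selection (SFFS), generic sketch.

    features : iterable of candidate feature indices
    score    : criterion to maximize over a feature subset (e.g. CV accuracy)
    k        : target subset size
    Returns the best subset of size k found by the floating search.
    """
    features = list(features)
    selected = []
    # best[n] = (best score seen for a subset of size n, that subset)
    best = {}
    while len(selected) < k:
        # Forward step: add the feature that most improves the criterion.
        remaining = [f for f in features if f not in selected]
        f_add = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(f_add)
        s = score(selected)
        if s > best.get(len(selected), (float("-inf"), None))[0]:
            best[len(selected)] = (s, list(selected))
        # Floating (conditional backward) steps: drop a feature only if the
        # reduced subset beats the best previously found at that size.
        while len(selected) > 2:
            f_drop = max(selected,
                         key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_drop]
            s_red = score(reduced)
            if s_red > best.get(len(reduced), (float("-inf"), None))[0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best[k][1]
```

Because a backward step is taken only on strict improvement, the search makes progress and terminates; redundant features that never enter a winning subset reveal which dimensions can be pruned.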

Original language: English
Pages: 789-792
Number of pages: 4
State: Published - Sep 2010
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan
Duration: 26 Sep 2010 - 30 Sep 2010

Conference

Conference: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
Country/Territory: Japan
City: Makuhari, Chiba
Period: 26/09/10 - 30/09/10

Keywords

  • Emotion recognition
  • Robust
  • Spectro-temporal modulations
