Multi-Domain Emotion Recognition Enhancement: A Novel Domain Adaptation Technique for Speech-Emotion Recognition

Ammar Amjad, Sucharita Khuntia, Hsien Tsung Chang, Li Chia Tai

Research output: Contribution to journal › Article › peer-review

Abstract

As artificial intelligence advances, speech-emotion recognition (SER) has become a critical research area. Traditional SER methods typically rely on homogeneous domain data for training and testing. Such methods require adaptation when confronted with the heterogeneous linguistic, methodological, and speaker-related attributes of real-world data. These variances can degrade the accuracy and generalization of SER models. To address this gap, we introduce a novel domain adaptation technique, multi-domain emotion recognition enhancement (MDERE), which uses a non-negative matrix to reduce the inflexibility of the conventional binary label matrix for source domain data. This process yields a label matrix that better adapts to the nuances of the source labels while preserving their original structure. The framework refines SER methods by fine-tuning a transformation matrix for enhanced emotion discrimination. Elastic net regularization, which combines L1 and L2 penalties, is applied to the transformation matrix, selectively emphasizing relevant features to enhance the robustness of emotion detection. The framework also constructs customized similarity and dissimilarity graphs to reconcile the differences between source and target domains, enabling nuanced cross-domain data analysis. Extensive testing on multiple cross-domain SER tasks has shown that MDERE substantially improves recognition accuracy, surpassing several state-of-the-art algorithms. These results demonstrate that MDERE's ability to effectively align domain variations enhances the generalizability of SER systems.
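
The two ingredients named in the abstract, a binary label matrix relaxed by a non-negative matrix and an elastic net penalty on the transformation matrix, can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the abstract does not give the equations, so the relaxation form Y + B * M (with B equal to +1 at the true class and -1 elsewhere), the function names, and the penalty weights below are all assumptions made for illustration.

```python
import numpy as np


def relaxed_labels(Y, M):
    """Relax a binary (one-hot) label matrix Y using a non-negative matrix M.

    Assumed form, not taken from the paper: Y + B * M, where B is +1 at the
    true class and -1 elsewhere, so true-class targets can only grow and
    other targets can only shrink.
    """
    if np.any(M < 0):
        raise ValueError("M must be non-negative")
    B = np.where(Y == 1, 1.0, -1.0)  # hypothetical direction matrix
    return Y + B * M


def elastic_net_penalty(W, l1_weight, l2_weight):
    """Elastic net penalty: weighted sum of the L1 and squared L2 norms of W."""
    return l1_weight * np.abs(W).sum() + l2_weight * np.square(W).sum()


# Toy data: two samples, three emotion classes.
Y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
M = np.array([[0.5, 0.2, 0.0],
              [0.1, 0.3, 0.4]])
R = relaxed_labels(Y, M)

# Toy transformation matrix and illustrative penalty weights.
W = np.array([[1.0, -2.0],
              [0.0, 3.0]])
penalty = elastic_net_penalty(W, l1_weight=0.5, l2_weight=0.25)
```

In this toy case the relaxed matrix keeps the same largest entry in each row as the original labels, which is one way to read "preserving their original structure"; the L1 term drives entries of W toward zero (feature selection) while the squared L2 term keeps the remaining weights small.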

Keywords

  • Domain adaptation techniques
  • Feature transformation matrix
  • Label matrix refinement
  • Non-negative matrix factorization
  • Speech emotion recognition
