TY - JOUR
T1 - Multi-Domain Emotion Recognition Enhancement
T2 - A Novel Domain Adaptation Technique for Speech-Emotion Recognition
AU - Amjad, Ammar
AU - Khuntia, Sucharita
AU - Chang, Hsien Tsung
AU - Tai, Li Chia
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2024
Y1 - 2024
N2 - As artificial intelligence advances, speech-emotion recognition (SER) has become a critical research area. Traditional SER methods typically rely on homogeneous domain data for training and testing. This practice requires adaptation when confronted with real-world data's heterogeneous linguistic, methodological, and speaker-related attributes. These variances can degrade the accuracy and generalization of SER models. To address this gap, we introduce a novel domain adaptation technique, multi-domain emotion recognition enhancement (MDERE), which utilizes a non-negative matrix to reduce the inflexibility of the conventional binary label matrix for source domain data. This process yields a label matrix that better adapts to the nuances of the source labels while preserving their original structure. This framework refines SER methods by fine-tuning a transformation matrix for enhanced emotion discrimination. Elastic net regularization, which combines L1 and L2 penalties, enriches the transformation matrix, selectively emphasizing relevant features to enhance the robustness of emotion detection. The framework constructs customized similarity and dissimilarity graphs to reconcile the differences between source and target domains, enabling nuanced cross-domain data analysis. Extensive testing on multiple cross-domain SER tasks has shown that MDERE substantially improves recognition accuracy, surpassing several state-of-the-art algorithms. These results demonstrate that MDERE's ability to effectively align domain variations enhances the generalizability of SER systems.
AB - As artificial intelligence advances, speech-emotion recognition (SER) has become a critical research area. Traditional SER methods typically rely on homogeneous domain data for training and testing. This practice requires adaptation when confronted with real-world data's heterogeneous linguistic, methodological, and speaker-related attributes. These variances can degrade the accuracy and generalization of SER models. To address this gap, we introduce a novel domain adaptation technique, multi-domain emotion recognition enhancement (MDERE), which utilizes a non-negative matrix to reduce the inflexibility of the conventional binary label matrix for source domain data. This process yields a label matrix that better adapts to the nuances of the source labels while preserving their original structure. This framework refines SER methods by fine-tuning a transformation matrix for enhanced emotion discrimination. Elastic net regularization, which combines L1 and L2 penalties, enriches the transformation matrix, selectively emphasizing relevant features to enhance the robustness of emotion detection. The framework constructs customized similarity and dissimilarity graphs to reconcile the differences between source and target domains, enabling nuanced cross-domain data analysis. Extensive testing on multiple cross-domain SER tasks has shown that MDERE substantially improves recognition accuracy, surpassing several state-of-the-art algorithms. These results demonstrate that MDERE's ability to effectively align domain variations enhances the generalizability of SER systems.
KW - Domain adaptation techniques
KW - Feature transformation matrix
KW - Label matrix refinement
KW - Non-negative matrix factorization
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85209759387&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2024.3498694
DO - 10.1109/TASLP.2024.3498694
M3 - Article
AN - SCOPUS:85209759387
SN - 2329-9290
JO - IEEE/ACM Transactions on Audio Speech and Language Processing
JF - IEEE/ACM Transactions on Audio Speech and Language Processing
ER -