Abstract
In this paper, we propose intra- and inter-feature orthogonal fusion (IIOF) of local and global features obtained from MS-SincResNet or MS-SSincResNet (a variant of MS-SincResNet) for music emotion recognition (MER). Given the raw waveform of a music signal, MS-SincResNet/MS-SSincResNet is first used to learn several 2D representations with different receptive fields and to obtain embeddings with time-frequency information from different layers. Local and global features are then extracted from these embeddings. IIOF, which consists of intra-feature OF and inter-feature OF, is then employed to integrate the local and global features into a discriminative descriptor for MER. The intra-feature OF enhances the diversity of the global feature, and the inter-feature OF reduces redundancy and produces complementary information between the local and global features. Experimental results demonstrate that IIOF enhances representation discriminability by taking feature orthogonality into account. Furthermore, extensive experiments show that the proposed method outperforms other state-of-the-art methods on both regression and classification tasks on well-known MER datasets, including the DEAM and PMEmo datasets. The code is available at https://github.com/PeiChunChang/MS-SSincResNet_with_IIOF.
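The abstract does not spell out the operations behind inter-feature OF, so the following is only a minimal sketch of one common way to make local features complementary to a global feature: subtract from each local feature its projection onto the global feature, then pool and concatenate. The function names (`orthogonal_component`, `fuse`), the mean pooling, and the concatenation step are assumptions made for illustration and are not taken from the paper or its repository.

```python
import numpy as np

def orthogonal_component(local_feats, global_feat, eps=1e-12):
    """Remove from each local feature its projection onto the global feature.

    local_feats: array of shape (n, d), one local descriptor per row.
    global_feat: array of shape (d,).
    Returns an (n, d) array whose rows are (numerically) orthogonal to global_feat.
    """
    g = np.asarray(global_feat, dtype=float)
    L = np.asarray(local_feats, dtype=float)
    # Projection coefficient of every local feature on g: <l, g> / <g, g>.
    coeffs = (L @ g) / (g @ g + eps)
    # Subtract the component parallel to g, keeping only the orthogonal part.
    return L - np.outer(coeffs, g)

def fuse(local_feats, global_feat):
    """Hypothetical fusion: global feature + mean-pooled orthogonal local part."""
    orth = orthogonal_component(local_feats, global_feat)
    pooled = orth.mean(axis=0)  # assumed pooling choice
    return np.concatenate([np.asarray(global_feat, dtype=float), pooled])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local = rng.normal(size=(5, 8))   # 5 local descriptors of dimension 8
    glob = rng.normal(size=8)         # one global descriptor
    print(fuse(local, glob).shape)
```

Because the parallel component is removed, the pooled local part carries only information that the global feature does not already encode along its own direction, which is the sense of "reducing redundancy" this sketch illustrates.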
Original language | English |
---|---|
Article number | 110200 |
Journal | Pattern Recognition |
Volume | 148 |
DOIs | |
Publication status | Published - Apr 2024 |