Abstract
An auditory-perception inspired singing voice separation algorithm for monaural music recordings is proposed in this paper. Under the framework of computational auditory scene analysis (CASA), the music recordings are first transformed into auditory spectrograms. After extracting the spectro-temporal modulation content of the time-frequency (T-F) units through a two-stage auditory model, we define modulation features pertaining to three categories in music audio signals: vocal, harmonic, and percussive. The T-F units are then clustered into these three categories, and the singing voice is synthesized from the T-F units in the vocal category via time-frequency masking. The algorithm was tested on the MIR-1K dataset and achieved results comparable to those of other unsupervised masking approaches. Meanwhile, the set of novel features offers a possible explanation of how the auditory cortex analyzes and identifies the singing voice in music audio mixtures.
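The final step described above, resynthesizing a source from selected T-F units via time-frequency masking, can be sketched as follows. This is a minimal illustration using `scipy.signal`, not the paper's method: the placeholder mask here keeps magnitude peaks, whereas the paper clusters T-F units by spectro-temporal modulation features from a two-stage auditory model. All function and variable names are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_with_mask(x, mask_fn, fs=16000, nperseg=1024):
    """Apply a binary T-F mask to the mixture's spectrogram and resynthesize.

    mask_fn maps the magnitude spectrogram to a boolean mask of the same shape;
    masked T-F units are kept, the rest are zeroed before inverse STFT.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mask = mask_fn(np.abs(Z))
    _, y = istft(Z * mask, fs=fs, nperseg=nperseg)
    return y[: len(x)]

# Toy mixture: a 440 Hz tone standing in for the voice, plus broadband noise.
fs = 16000
n = fs  # 1 second
rng = np.random.default_rng(0)
tone = 0.8 * np.sin(2 * np.pi * 440 * np.arange(n) / fs)
mix = tone + 0.05 * rng.standard_normal(n)

# Placeholder mask (an assumption, not the paper's clustering): keep T-F
# units whose magnitude exceeds the per-frame median across frequency.
voice = separate_with_mask(
    mix, lambda M: M > np.median(M, axis=0, keepdims=True), fs=fs
)
```

In the paper's pipeline, the boolean mask would instead mark the T-F units assigned to the vocal cluster; the masking-and-resynthesis step itself is the same.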
Original language | English
---|---
Pages | 617-622
Number of pages | 6
Publication status | Published - 1 Jan 2014
Event | 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan. Duration: 27 Oct 2014 → 31 Oct 2014
Conference

Conference | 15th International Society for Music Information Retrieval Conference, ISMIR 2014
---|---
Country/Territory | Taiwan
City | Taipei
Period | 27/10/14 → 31/10/14