Singing voice separation using spectro-temporal modulation features

Frederick Yen, Yin Jyun Luo, Tai-Shih Chi

Research output: Paper, peer-reviewed

8 citations (Scopus)

Abstract

An auditory-perception inspired singing voice separation algorithm for monaural music recordings is proposed in this paper. Under the framework of computational auditory scene analysis (CASA), the music recordings are first transformed into auditory spectrograms. After extracting the spectro-temporal modulation content of the time-frequency (T-F) units through a two-stage auditory model, we define modulation features pertaining to three categories of music audio signals: vocal, harmonic, and percussive. The T-F units are then clustered into these three categories, and the singing voice is synthesized from the T-F units in the vocal category via time-frequency masking. The algorithm was tested on the MIR-1K dataset and demonstrated results comparable to other unsupervised masking approaches. Moreover, the set of novel features offers a possible explanation of how the auditory cortex analyzes and identifies the singing voice in music audio mixtures.
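To make the pipeline concrete, the following is a minimal, hypothetical Python sketch of the same vocal/harmonic/percussive idea. It is not the authors' implementation: it substitutes an ordinary STFT (via librosa) for the auditory spectrogram, smoothed first-order differences for the two-stage cortical modulation model, and k-means (scikit-learn) for the clustering of T-F units. The function name separate_vocals and the heuristic that selects the "vocal" cluster are illustrative assumptions.

import numpy as np
import librosa
from scipy.ndimage import uniform_filter
from sklearn.cluster import KMeans

def separate_vocals(path, n_fft=2048, hop=512):
    # Load a monaural mixture and compute a spectrogram
    # (a stand-in for the auditory spectrogram in the paper).
    y, sr = librosa.load(path, sr=None, mono=True)
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag = np.log1p(np.abs(S))

    # Crude proxies for spectro-temporal modulation features:
    # fast change along time suggests percussive content,
    # fast change along frequency suggests harmonic peaks.
    d_time = np.abs(np.diff(mag, axis=1, append=mag[:, -1:]))
    d_freq = np.abs(np.diff(mag, axis=0, append=mag[-1:, :]))

    # Smooth locally so each T-F unit summarizes its neighborhood.
    feats = np.stack([
        uniform_filter(d_time, size=5).ravel(),
        uniform_filter(d_freq, size=5).ravel(),
        mag.ravel(),
    ], axis=1)

    # Cluster T-F units into three categories
    # (vocal / harmonic / percussive).
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)
    labels = labels.reshape(mag.shape)

    # Assumed heuristic: the cluster with intermediate temporal
    # modulation is treated as "vocal"; the paper instead uses
    # purpose-built modulation features for this decision.
    order = np.argsort([d_time[labels == k].mean() for k in range(3)])
    vocal_mask = (labels == order[1])

    # Binary T-F masking and resynthesis of the singing voice.
    return librosa.istft(S * vocal_mask, hop_length=hop, length=len(y))

The binary mask mirrors the masking-based synthesis described in the abstract: each T-F unit is assigned wholly to one category, and only the vocal units are inverted back to a waveform.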

Original language: English
Pages: 617-622
Number of pages: 6
Publication status: Published - 1 Jan 2014
Event: 15th International Society for Music Information Retrieval Conference, ISMIR 2014 - Taipei, Taiwan
Duration: 27 Oct 2014 - 31 Oct 2014

Conference

Conference: 15th International Society for Music Information Retrieval Conference, ISMIR 2014
Country/Territory: Taiwan
City: Taipei
Period: 27/10/14 - 31/10/14
