A two-stage singing voice separation algorithm using spectro-temporal modulation features

Frederick Z. Yen, Mao Chang Huang, Tai-Shih Chi

Research output: Conference article › Peer-reviewed

4 Citations (Scopus)

Abstract

A two-stage singing voice separation algorithm using spectro-temporal modulation features is proposed in this paper. First, music clips are transformed into auditory spectrograms, and the spectro-temporal modulation content of every time-frequency (T-F) unit of the auditory spectrograms is extracted using an auditory model. Then, the T-F units are sequentially clustered into percussive, harmonic, and vocal units by the expectation-maximization (EM) algorithm through the proposed two-stage procedure. Lastly, the singing voice is synthesized from the clustered vocal T-F units via time-frequency masking. The algorithm was evaluated on the MIR-1K dataset and achieved better separation results than our previously proposed one-stage algorithm.
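The core of the pipeline described above is EM clustering of per-unit modulation features followed by binary T-F masking. A minimal sketch of that idea, not the paper's implementation: a toy 1-D "modulation energy" feature replaces the actual spectro-temporal modulation vectors, two Gaussian components with fixed unit variance stand in for the full model, and the cluster names are hypothetical labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D feature for 300 T-F units drawn from two clusters
# (hypothetical stand-ins for, e.g., accompaniment vs. vocal units).
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(5.0, 1.0, 150)])

# EM for a 2-component Gaussian mixture (fixed unit variance keeps
# the sketch short; the paper's features are multi-dimensional).
mu = np.array([x.min(), x.max()])   # component means
pi = np.array([0.5, 0.5])           # component weights
for _ in range(50):
    # E-step: posterior responsibility of each component for each unit
    like = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * pi
    resp = like / like.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights and means from responsibilities
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

labels = resp.argmax(axis=1)

# Binary T-F mask: keep only units assigned to the "vocal" cluster;
# masked synthesis would multiply the mixture spectrogram by this mask
# before resynthesis.
mask = (labels == mu.argmax()).astype(float)
```

With well-separated clusters the means converge near 0 and 5, and the mask retains roughly the 150 units from the high-mean cluster; the paper applies the same E/M alternation to spectro-temporal modulation features in two stages rather than one.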

Original language: English
Pages (from-to): 3321-3324
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
DOIs
Publication status: Published - September 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 September 2015 - 10 September 2015
