CNN Based Two-stage Multi-resolution End-to-end Model for Singing Melody Extraction

Ming Tso Chen, Bo Jun Li, Tai-Shih Chi

Research output: Conference contribution, peer-reviewed

21 citations (Scopus)

Abstract

Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction. The convolutional neural network (CNN) is the core of the proposed model and generates the multi-resolution representations. Multi-resolution analysis is carried out in two successive stages: 1-D CNN kernels of different lengths are applied to the waveform, and 2-D CNN kernels of different sizes are then applied to the resulting spectrogram-like graphs. The 1-D CNNs with kernels of different lengths produce multi-resolution spectrogram-like graphs without suffering from the trade-off between spectral and temporal resolution. The 2-D CNNs with kernels of different sizes extract features from spectro-temporal envelopes at different scales. Experimental results show that the proposed model outperforms the three systems it is compared against on three out of five public databases.
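The two-stage idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' architecture: the model in the paper learns its kernels, whereas this sketch assumes fixed Hann-windowed cosines as stand-ins for the 1-D kernels and plain box averaging as a stand-in for the 2-D kernels. All function names, the sample rate, the kernel lengths, and the patch sizes are hypothetical choices made for illustration only.

```python
import math

def make_kernel(length, freq_hz, sample_rate):
    """Hann-windowed cosine; a fixed stand-in for a learned 1-D CNN kernel."""
    return [
        0.5 * (1 - math.cos(2 * math.pi * n / (length - 1)))
        * math.cos(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(length)
    ]

def conv1d_same(signal, kernel, stride):
    """Strided 1-D cross-correlation with zero padding, so that every
    kernel length yields the same number of frames."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[start + i] for i, k in enumerate(kernel))
        for start in range(0, len(signal), stride)
    ]

def multi_resolution_frontend(signal, sample_rate, kernel_lengths, freqs_hz, stride):
    """Stage 1: one spectrogram-like graph [frequency][frame] per kernel length."""
    return {
        length: [
            [abs(v) for v in conv1d_same(signal, make_kernel(length, f, sample_rate), stride)]
            for f in freqs_hz
        ]
        for length in kernel_lengths
    }

def smooth2d(graph, height, width):
    """Stage 2 stand-in: box average over a height x width patch (clipped at the edges)."""
    rows, cols = len(graph), len(graph[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            patch = [
                graph[i][j]
                for i in range(max(0, r - height // 2), min(rows, r + height // 2 + 1))
                for j in range(max(0, c - width // 2), min(cols, c + width // 2 + 1))
            ]
            out_row.append(sum(patch) / len(patch))
        out.append(out_row)
    return out

# Hypothetical input: 0.1 s of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]

# Short kernels favor temporal resolution, long kernels favor spectral resolution.
graphs = multi_resolution_frontend(
    signal, sample_rate, kernel_lengths=(33, 129),
    freqs_hz=(220.0, 440.0, 880.0), stride=80,
)

# Patches of different sizes summarize spectro-temporal envelopes at different scales.
envelopes = {
    (length, size): smooth2d(graph, *size)
    for length, graph in graphs.items()
    for size in ((1, 3), (3, 5))
}
```

Because every kernel length is evaluated on the same frame grid, the resulting graphs can be stacked as channels, which is one way a later stage could draw on fine temporal and fine spectral detail at once. In a trainable version the fixed kernels would be replaced by learned convolution layers.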

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1005-1009
Number of pages: 5
ISBN (Electronic): 9781479981311
DOIs
Publication status: Published - 1 May 2019
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 12 May 2019 to 17 May 2019

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 to 17/05/19
