A Multi-Dilation and Multi-Resolution Fully Convolutional Network for Singing Melody Extraction

Ping Gao, Cheng You You, Tai-Shih Chi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

Each human cognitive function involves bottom-up and top-down processes. Several methods have been proposed for singing melody extraction that emphasize either the bottom-up or the top-down processes. For hearing, the bottom-up processes include spectral and spectro-temporal decomposition of the sound by the cochlea and the auditory cortex. In this paper, we propose a neural network for singing melody extraction that combines a spectro-temporal multi-resolution decomposition of the log-spectrogram of the sound with a semantic segmentation model, addressing the bottom-up and top-down processing of hearing, respectively. Simulation results show that the proposed model outperforms all previously proposed methods, which emphasize either bottom-up or top-down processing, in almost all objective evaluation metrics.

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 551-555
Number of pages: 5
ISBN (Electronic): 9781509066315
State: Published - May 2020
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 → 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 4/05/20 → 8/05/20

Keywords

  • fully convolutional network
  • melody extraction
  • multi-resolution
