Multi-Resolution Singing Voice Separation

Yih Liang Shen, Ya Ching Lai, Tai Shih Chi

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

It has been shown that time-domain neural networks achieve higher performance than networks operating in the short-time Fourier transform (STFT) domain. The fully-convolutional time-domain audio separation network (Conv-TasNet) is an end-to-end separation model with outstanding performance. However, the fixed convolution kernel length in Conv-TasNet means that it analyzes signals at a single frequency resolution, which is constrained by that kernel length. This paper proposes a multi-frequency-resolution (MF) architecture, which analyzes signals using more frequency resolutions, and compares the MF model with Conv-TasNet on singing voice separation. The results show that the MF architecture improves the performance of Conv-TasNet. In addition, we show that the MF architecture does not provide consistent benefits to the STFT-domain separation model.
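The link between kernel length and frequency resolution can be illustrated with a minimal sketch. This is not the authors' MF architecture: it only frames one signal with several kernel lengths in parallel, using the rule of thumb that a window of L samples at sampling rate fs resolves frequencies roughly fs / L apart. The function names, the shared stride, the zero-padding, and the 8 kHz sampling rate are all assumptions made for the illustration.

```python
# Illustrative sketch only; not the MF architecture from the paper.
# All names and parameter values here are hypothetical.
import math


def frequency_resolution_hz(kernel_length, sample_rate):
    """Approximate frequency spacing resolvable by a window of
    kernel_length samples: roughly sample_rate / kernel_length."""
    return sample_rate / kernel_length


def frame_signal(signal, kernel_length, stride):
    """Slice signal into frames of kernel_length samples, one frame every
    stride samples, zero-padding frames that run past the end."""
    frames = []
    for start in range(0, len(signal), stride):
        frame = list(signal[start:start + kernel_length])
        frame += [0.0] * (kernel_length - len(frame))
        frames.append(frame)
    return frames


def multi_resolution_frames(signal, kernel_lengths, stride):
    """Frame the same signal once per kernel length. A shared stride keeps
    the branches aligned in time (an assumption of this sketch)."""
    return {length: frame_signal(signal, length, stride)
            for length in kernel_lengths}


if __name__ == "__main__":
    sample_rate = 8000  # hypothetical sampling rate in Hz
    signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(64)]
    branches = multi_resolution_frames(signal, kernel_lengths=[16, 32, 64], stride=8)
    for length, frames in branches.items():
        print(length, len(frames), frequency_resolution_hz(length, sample_rate))
```

In a learned model, each branch would feed its own convolutional encoder rather than raw frames. The sketch only shows that longer kernels give finer frequency spacing while the shared stride keeps the number of frames equal across branches.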

Original language: English
Title of host publication: 2024 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Proceedings
Editors: Ming-Hsiang Su, Jui-Feng Yeh, Yuan-Fu Liao, Chi-Chun Lee, Yu Taso
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331506032
State: Published - 2024
Event: 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024 - Hsinchu, Taiwan
Duration: 17 Oct 2024 to 19 Oct 2024

Conference

Conference: 27th Conference on the Oriental COCOSDA International Committee for the Co-Ordination and Standardisation of Speech Databases and Assessment Techniques, O-COCOSDA 2024
Country/Territory: Taiwan
City: Hsinchu
Period: 17/10/24 to 19/10/24

Keywords

  • Convolution neural network
  • end-to-end learning
  • multi-resolution
  • singing voice separation
