MoEVC: A Mixture of Experts Voice Conversion System with Sparse Gating Mechanism for Online Computation Acceleration

Yu Tao Chang, Yuan Hong Yang, Yu Huai Peng, Syu Siang Wang, Tai-Shih Chi, Yu Tsao, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Owing to recent advances in deep learning, the quality and similarity achieved by voice conversion (VC) have improved significantly. However, deep-learning-based VC systems generally require complex computation, which can cause notable latency and limit their deployment in real-world applications. Increasing the efficiency of online computation has therefore become an important task. In this study, we propose a novel mixture-of-experts (MoE) based VC system, termed MoEVC. MoEVC uses a gating mechanism to assign weights to feature maps, which improves VC performance. In addition, applying sparse constraints to the gating mechanism eliminates redundant feature maps, so the corresponding convolution operations can be skipped and online computation is accelerated. Experimental results show that, with proper sparse constraints, the floating-point operation (FLOP) count can be reduced by 70% while VC performance improves in both objective evaluations and human subjective listening tests.
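
The idea in the abstract, gates that weight feature maps and sparse gates that let convolution work be skipped, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the network layout or the form of the sparse constraint. The sketch assumes a pointwise (1x1) convolution over input channels and uses soft-thresholding as a stand-in way to drive some gates exactly to zero. The names `soft_threshold` and `gated_pointwise_conv` and all numeric values are hypothetical.

```python
def soft_threshold(gate, lam):
    """Shrink a non-negative gate by lam and clamp at zero.

    Assumption: used here only as a simple way to obtain exact zeros;
    the paper's actual sparse constraint may differ.
    """
    return max(gate - lam, 0.0)


def gated_pointwise_conv(x, weights, gates):
    """Apply a gated 1x1 convolution and count per-frame multiply-adds.

    x:       list of C_in feature maps, each a list of T frames
    weights: C_out x C_in matrix of convolution weights
    gates:   C_in gate values; a zero gate means the map is skipped
    """
    active = [c for c, g in enumerate(gates) if g != 0.0]
    n_frames = len(x[0])
    out, macs = [], 0
    for row in weights:
        y = [0.0] * n_frames
        for c in active:  # feature maps with a zero gate cost nothing
            scale = row[c] * gates[c]
            for t in range(n_frames):
                y[t] += scale * x[c][t]
            macs += n_frames
        out.append(y)
    return out, macs


# Toy input: 4 feature maps with 3 frames each (arbitrary values).
x = [[1.0, 2.0, 3.0],
     [0.5, 0.5, 0.5],
     [2.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
weights = [[0.2, -0.1, 0.4, 0.3],
           [0.1, 0.5, -0.2, 0.0]]
raw_gates = [0.9, 0.05, 0.6, 0.1]

# Two of the four gates fall below the threshold and become exactly zero.
sparse_gates = [soft_threshold(g, 0.2) for g in raw_gates]

out_dense, macs_dense = gated_pointwise_conv(x, weights, raw_gates)
out_sparse, macs_sparse = gated_pointwise_conv(x, weights, sparse_gates)
print(macs_dense, macs_sparse)  # prints "24 12"
```

In this toy case, zeroing two of four gates halves the multiply-add count. The figure is specific to these made-up values and says nothing about the 70% reduction reported in the abstract.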

Original language: English
Title of host publication: 2021 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781728169941
State: Published - 24 Jan 2021
Event: 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021 - Hong Kong, Hong Kong
Duration: 24 Jan 2021 to 27 Jan 2021

Publication series

Name: 2021 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021

Conference

Conference: 12th International Symposium on Chinese Spoken Language Processing, ISCSLP 2021
Country/Territory: Hong Kong
City: Hong Kong
Period: 24/01/21 to 27/01/21

Keywords

  • Voice conversion
  • fully convolutional network
  • mixture of experts
  • non-parallel VC
  • variational autoencoder
