ASGAN-VC: One-Shot Voice Conversion with Additional Style Embedding and Generative Adversarial Networks

Wei Cheng Li*, Tzer Jen Wei

*Corresponding author for this work

Research output: Conference contribution, peer-reviewed

3 citations (Scopus)

Abstract

In this paper, we present a voice conversion (VC) system that significantly improves both the quality of the generated speech and its similarity to the target speaker's voice style. Like many VC systems, ours uses feature-disentanglement-based learning to separate a speaker's voice characteristics from the linguistic content, so that speech can be converted into another speaker's style. To prevent speaker-style information from contaminating the content embedding, some previous works quantize the embedding or reduce its dimension. However, imperfect disentanglement degrades both the quality of the converted speech and its similarity to the target speaker. To further improve quality and similarity, we propose a novel style transfer method within an autoencoder-based VC system that involves generative adversarial training. The converted speech was evaluated objectively with a fair third-party speaker verification system, and the results show that ASGAN-VC outperforms VQVC+ and AGAINVC in terms of speaker similarity. A subjective evaluation likewise showed that our proposal outperforms VQVC+ and AGAINVC in terms of naturalness and speaker similarity.
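The quantization strategy that the abstract attributes to previous works can be illustrated with a generic nearest-neighbour vector-quantization step. This is a minimal sketch under stated assumptions, not the ASGAN-VC method and not the exact VQVC+ implementation: the 2-D vectors, the three-entry codebook, and the names `quantize`, `dequantize`, and `squared_distance` are all hypothetical, and a real system would typically learn the codebook during training rather than fix it by hand.

```python
from typing import List, Sequence

# Hypothetical type: one content-embedding frame is a vector of floats.
Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each content frame to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Replace each index with its codebook vector (the quantized content embedding)."""
    return [list(codebook[i]) for i in indices]


if __name__ == "__main__":
    # Hypothetical fixed codebook (illustration only; normally learned).
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
    # The same content spoken by two hypothetical speakers: slightly different vectors.
    speaker_a = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]
    speaker_b = [[-0.1, 0.1], [1.1, 0.9], [-1.2, 1.1]]
    print(quantize(speaker_a, codebook))  # [0, 1, 2]
    print(quantize(speaker_b, codebook))  # [0, 1, 2]
```

Both hypothetical speakers' frames collapse onto the same code sequence, which is the sense in which such a bottleneck suppresses small speaker-dependent variation. The same rounding can also discard detail that matters for the output, which is one way to read the abstract's point that imperfect disentanglement hurts quality and similarity.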

Original language: English
Host publication title: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1932-1937
Number of pages: 6
ISBN (Electronic): 9786165904773
Publication status: Published - 2022
Event: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 7 Nov 2022 to 10 Nov 2022

Publication series

Name: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/Territory: Thailand
City: Chiang Mai
Period: 7/11/22 to 10/11/22
