Face-based Voice Conversion: Learning the Voice behind a Face

Hsiao Han Lu, Shao En Weng, Ya Fan Yen, Hong-Han Shuai, Wen-Huang Cheng

Research output: Conference contribution, peer-reviewed

6 citations (Scopus)

Abstract

Zero-shot voice conversion (VC) trained on non-parallel data has gained a lot of attention in recent years. Previous methods usually extract speaker embeddings from audio and use them to convert voices into different voice styles. Since there is a strong relationship between human faces and voices, a promising approach is to synthesize various voice characteristics from face representations. Therefore, we introduce the novel idea of generating different voice styles from different human face photos, which can facilitate new applications, e.g., personalized voice assistants. However, the audio-visual relationship is implicit. Moreover, existing VC models are trained on laboratory-collected datasets without speaker photos, while the datasets with both photos and audio are in-the-wild datasets. Directly replacing the target audio with the target photo and training on an in-the-wild dataset leads to noisy results. To address these issues, we propose a novel many-to-many voice conversion network, namely Face-based Voice Conversion (FaceVC), with a 3-stage training strategy. Quantitative and qualitative experiments on the LRS3-Ted dataset show that the proposed FaceVC successfully performs voice conversion according to the target face photos. Audio samples can be found on the demo website at https://facevc.github.io/.

Original language: English
Title of host publication: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 496-505
Number of pages: 10
ISBN (Electronic): 9781450386517
DOIs
Publication status: Published - 17 Oct 2021
Event: 29th ACM International Conference on Multimedia, MM 2021 - Virtual, Online, China
Duration: 20 Oct 2021 to 24 Oct 2021

Publication series

Name: MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia

Conference

Conference: 29th ACM International Conference on Multimedia, MM 2021
Country/Territory: China
City: Virtual, Online
Period: 20/10/21 to 24/10/21
