TY - JOUR
T1 - Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features
AU - Chen, Hsin Hao
AU - Chien, Yung Lun
AU - Yen, Ming Chi
AU - Tsai, Shu Wei
AU - Tsao, Yu
AU - Chi, Tai Shih
AU - Wang, Hsin Min
N1 - Publisher Copyright: © 2023 International Speech Communication Association. All rights reserved.
PY - 2023
Y1 - 2023
AB - Patients who have had their entire larynx removed, including the vocal folds, owing to throat cancer may experience difficulties in speaking. In such cases, electrolarynx devices are often prescribed to produce speech, which is commonly referred to as electrolaryngeal speech (EL speech). However, the quality and intelligibility of EL speech are poor. EL voice conversion (ELVC) aims to address this problem by improving the intelligibility and quality of EL speech. In this paper, we propose a novel ELVC system that incorporates cross-domain features, specifically spectral features and self-supervised learning (SSL) embeddings. The experimental results show that applying cross-domain features can notably improve the conversion performance for the ELVC task compared with utilizing only traditional spectral features.
KW - electrolaryngeal speech
KW - self-supervised learning
KW - voice conversion
UR - http://www.scopus.com/inward/record.url?scp=85171540726&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2023-786
DO - 10.21437/Interspeech.2023-786
M3 - Conference article
AN - SCOPUS:85171540726
SN - 2308-457X
VL - 2023-August
SP - 5018
EP - 5022
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 24th International Speech Communication Association, Interspeech 2023
Y2 - 20 August 2023 through 24 August 2023
ER -