TY - JOUR
T1 - The synthesis unit generation algorithm for Mandarin TTS
AU - Hwang, Shaw-Hwa
AU - Yei, Cheng Yu
PY - 2002/7/11
Y1 - 2002/7/11
N2 - This paper studies and implements "synthesis unit generation" and "co-articulation effect elimination" in a Mandarin text-to-speech system. In the "synthesis unit generation" approach, the segmental DTW algorithm is employed to estimate a representative spectrum template from a large speech database. Moreover, a listening test is used to select a good waveform template. Finally, the spectrum and waveform templates are put into the segmental DTW architecture to generate the synthesis unit. The synthesis unit consists of waveform data and a prosody unit. The waveform data make the synthesized speech sound clear, and the embedded prosody unit helps to generate more natural speech. In the "co-articulation effect elimination" approach, large synthesis units are selected to alleviate the co-articulation effect. Moreover, a large speech database is analyzed to infer co-articulation rules. A cross-fading function controlled by these co-articulation rules is then employed to smooth the energy and spectrum between each pair of synthesis units.
AB - This paper studies and implements "synthesis unit generation" and "co-articulation effect elimination" in a Mandarin text-to-speech system. In the "synthesis unit generation" approach, the segmental DTW algorithm is employed to estimate a representative spectrum template from a large speech database. Moreover, a listening test is used to select a good waveform template. Finally, the spectrum and waveform templates are put into the segmental DTW architecture to generate the synthesis unit. The synthesis unit consists of waveform data and a prosody unit. The waveform data make the synthesized speech sound clear, and the embedded prosody unit helps to generate more natural speech. In the "co-articulation effect elimination" approach, large synthesis units are selected to alleviate the co-articulation effect. Moreover, a large speech database is analyzed to infer co-articulation rules. A cross-fading function controlled by these co-articulation rules is then employed to smooth the energy and spectrum between each pair of synthesis units.
UR - http://www.scopus.com/inward/record.url?scp=17344390587&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2002.5743753
DO - 10.1109/ICASSP.2002.5743753
M3 - Conference article
AN - SCOPUS:17344390587
SN - 1520-6149
VL - 1
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 13 May 2002 through 17 May 2002
ER -