Abstract
This paper presents a sample-based phone boundary detection algorithm that improves the accuracy of phone boundary labeling in speech signals. Conventional phone labeling methods take a frame-based approach: acoustic features such as MFCCs are extracted per frame, and statistical methods then locate the phone boundaries from these frame-based features. HMM-based forced alignment is the most frequently used of these methods. The main drawback of the frame-based approach is its inability to model rapid changes in the speech signal; moreover, its time resolution is too coarse for some applications. To overcome this problem, this study proposes a sample-wise phone boundary detection framework. First, sample-wise acoustic features are proposed that can properly model the variation of the speech signal. A sample-based spectral KL distance is then used to pre-select boundary candidates, which reduces the complexity of the sample-based method. Next, a supervised neural network is trained to detect phone boundaries. Finally, the effectiveness of the proposed framework is validated on automatic labeling of the TCC-300 speech corpus.
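The candidate pre-selection step can be illustrated with a minimal sketch. The abstract does not specify the exact features or parameters, so everything here is an assumption made for illustration: the symmetric form of the KL distance, the Hann window, the window length `win`, the step `hop`, the `threshold`, and the function names `spectral_kl_curve` and `select_candidates`. It is not the authors' implementation.

```python
import numpy as np

def spectral_kl_curve(signal, win=256, hop=1, eps=1e-10):
    """Symmetric KL distance between the normalized power spectra of the
    windows immediately left and right of each evaluated sample position.
    (Assumed formulation; the paper's exact definition may differ.)"""
    window = np.hanning(win)
    positions = np.arange(win, len(signal) - win, hop)
    dist = np.empty(len(positions))
    for i, n in enumerate(positions):
        left = np.abs(np.fft.rfft(signal[n - win:n] * window)) ** 2 + eps
        right = np.abs(np.fft.rfft(signal[n:n + win] * window)) ** 2 + eps
        p = left / left.sum()    # treat each spectrum as a distribution
        q = right / right.sum()
        # KL(p||q) + KL(q||p) collapses to this single sum
        dist[i] = np.sum((p - q) * np.log(p / q))
    return positions, dist

def select_candidates(positions, dist, threshold):
    """Keep local maxima of the distance curve that exceed the threshold."""
    mid = dist[1:-1]
    is_peak = (mid > dist[:-2]) & (mid >= dist[2:]) & (mid > threshold)
    return positions[1:-1][is_peak]
```

With `hop=1` the curve is evaluated at every sample; only the surviving candidates would then be passed to a classifier such as the supervised neural network mentioned in the abstract, which is what keeps the sample-wise search tractable.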
Original language | English |
---|---|
Pages | 137-149 |
Number of pages | 13 |
Publication status | Published - 2009 |
Event | 21st Conference on Computational Linguistics and Speech Processing, ROCLING 2009 - Taichung, Taiwan. Duration: 1 Sep 2009 → 2 Sep 2009 |
Conference
Conference | 21st Conference on Computational Linguistics and Speech Processing, ROCLING 2009 |
---|---|
Country/Territory | Taiwan |
City | Taichung |
Period | 1/09/09 → 2/09/09 |