Abstract
This paper presents a sample-based phone boundary detection algorithm that improves the accuracy of phone boundary labeling in speech signals. Conventional phone labeling methods adopt a frame-based approach: acoustic features such as MFCCs are extracted per frame, and statistical methods then locate the phone boundaries from these frame-based features. HMM-based forced alignment is the most frequently used of these methods. The main drawback of the frame-based approach is its inability to model rapid changes in the speech signal; moreover, its time resolution is too coarse for some applications. To overcome these problems, this study proposes a sample-wise phone boundary detection framework. First, sample-wise acoustic features that can properly model the variation of the speech signal are proposed. The sample-based spectral KL distance is then used to pre-select boundary candidates, which reduces the complexity of the sample-based method. Next, a supervised neural network is trained for phone boundary detection. Finally, the effectiveness of the proposed framework is validated on automatic labeling of the TCC-300 speech corpus.
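The candidate pre-selection step can be illustrated with a minimal sketch. The abstract does not give the exact definition of the sample-based spectral KL distance, so the code below assumes one plausible reading: a symmetric KL divergence between the normalized power spectra of the windows immediately to the left and right of a sample. The function names, window length, FFT size, step, and threshold are all hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def normalized_power_spectrum(segment, nfft=512, eps=1e-10):
    # Hann-window the segment and normalize its power spectrum to sum to 1,
    # so it can be treated as a discrete probability distribution.
    windowed = segment * np.hanning(len(segment))
    power = np.abs(np.fft.rfft(windowed, n=nfft)) ** 2 + eps
    return power / power.sum()

def symmetric_spectral_kl(signal, n, win=256, nfft=512):
    # Assumed distance: KL(p||q) + KL(q||p) between the spectra of the
    # `win` samples before sample n and the `win` samples from n onward.
    p = normalized_power_spectrum(signal[n - win:n], nfft)
    q = normalized_power_spectrum(signal[n:n + win], nfft)
    return float(np.sum((p - q) * np.log(p / q)))

def boundary_candidates(signal, win=256, nfft=512, step=1, threshold=1.0):
    # Evaluate the distance every `step` samples and keep local maxima
    # above `threshold` as boundary candidates (hypothetical selection rule).
    positions = np.arange(win, len(signal) - win + 1, step)
    dist = np.array([symmetric_spectral_kl(signal, n, win, nfft)
                     for n in positions])
    inner = dist[1:-1]
    is_peak = (inner > dist[:-2]) & (inner >= dist[2:]) & (inner > threshold)
    return positions[1:-1][is_peak], positions, dist

# Toy signal: a 300 Hz tone followed by a 2500 Hz tone, switching at sample 2000.
fs = 16000
t = np.arange(4000) / fs
toy = np.concatenate([np.sin(2 * np.pi * 300 * t[:2000]),
                      np.sin(2 * np.pi * 2500 * t[2000:])])
candidates, positions, dist = boundary_candidates(toy, step=50)
```

With `step=1` the distance is evaluated at every sample, which matches the sample-wise resolution described above at a higher computational cost; in the framework described in the abstract, the surviving candidates would then be passed to the supervised neural network for the final decision.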
Original language | English |
---|---|
Pages | 137-149 |
Number of pages | 13 |
State | Published - 2009 |
Event | 21st Conference on Computational Linguistics and Speech Processing, ROCLING 2009 - Taichung, Taiwan. Duration: 1 Sep 2009 → 2 Sep 2009 |
Conference
Conference | 21st Conference on Computational Linguistics and Speech Processing, ROCLING 2009 |
---|---|
Country/Territory | Taiwan |
City | Taichung |
Period | 1/09/09 → 2/09/09 |
Keywords
- Phone boundary segmentation
- Sample-based spectral KL distance
- Sub-band signal envelope
- Supervised neural network