Abstract
This paper proposes a sample-based phone boundary detection algorithm. The method first extracts several sample-based acoustic parameters: six sub-band signal envelopes, a sample-based Kullback-Leibler (KL) distance, and spectral entropy. The sample-based KL distance is then used to preselect boundary candidates, and a supervised neural network makes the final boundary decision. Experiments on the TIMIT speech corpus yielded equal error rates (EERs) of 13.2% and 15.1% on the training and test sets, respectively. At the EER operating point, 43.5% and 88.2% of the detected boundaries fell within 80- and 240-sample error tolerances of the manual labels.
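
To make the two spectral measures named in the abstract concrete, the following is a minimal Python sketch of spectral entropy and of a KL distance evaluated at every sample position between the windows immediately to the left and right of that position. The abstract does not specify the paper's actual formulation, so everything here is an assumption made for illustration: the window length, the naive DFT, the floor `EPS`, the asymmetric form of the KL distance, and the function names `power_spectrum`, `spectral_entropy`, `kl_distance` and `sliding_kl` are all hypothetical and are not taken from the paper.

```python
import cmath
import math

EPS = 1e-12  # hypothetical floor that keeps log() finite for empty bins


def power_spectrum(frame):
    """Naive DFT power spectrum of a real-valued frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def normalize(power):
    """Turn a power spectrum into a probability distribution over bins."""
    total = sum(power) + EPS * len(power)
    return [(p + EPS) / total for p in power]


def spectral_entropy(power):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    return -sum(p * math.log(p) for p in normalize(power))


def kl_distance(power_a, power_b):
    """KL divergence D(a || b) between two normalized power spectra."""
    pa, pb = normalize(power_a), normalize(power_b)
    return sum(a * math.log(a / b) for a, b in zip(pa, pb))


def sliding_kl(signal, frame_len, hop=1):
    """KL distance between the windows left and right of each position.

    With hop=1 a value is produced at every sample position, which is
    one possible reading of "sample-based" (an assumption).
    """
    out = []
    for i in range(frame_len, len(signal) - frame_len + 1, hop):
        left = power_spectrum(signal[i - frame_len:i])
        right = power_spectrum(signal[i:i + frame_len])
        out.append((i, kl_distance(left, right)))
    return out
```

Under these assumptions, a peak in the `sliding_kl` output marks a position where the spectrum changes abruptly, which is the kind of point a candidate preselection stage could pass on to a classifier. The sub-band envelopes and the neural network stage are not sketched here because the abstract gives no details about them.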
Original language | English |
---|---|
Pages | 1397-1400 |
Number of pages | 4 |
State | Published - Sep 2010 |
Event | 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba, Japan; Duration: 26 Sep 2010 → 30 Sep 2010 |
Conference
Conference | 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 |
---|---|
Country/Territory | Japan |
City | Makuhari, Chiba |
Period | 26/09/10 → 30/09/10 |
Keywords
- Speech analysis
- Speech segmentation