Abstract
Consonants are an important element of Mandarin, and producing different categories of consonants yields different facial expressions. Specifically, the facial muscles change during speech, and these changes are closely related to pronunciation; the facial muscles are linked to the hidden articulators, and their effects on the face can be observed as 3D changes. However, most studies analyze facial features during speech using 2D images. 2D images provide information in only two dimensions (x- and y-axes), so subtle motions in depth (z-axis changes) of the facial muscles during speech are difficult to detect accurately. Hence, this study used depth features of the face (point cloud features), recorded by a time-of-flight 3D camera, to investigate their potential for consonant recognition. We propose an algorithm that recognizes seven categories of Mandarin consonants from the depth features of the speaker's face. The proposed system yielded suitable classification accuracy for the seven categories. This result implies that depth features can be used in speech-processing applications.
Original language | English |
---|---|
Pages (from-to) | 2300-2304 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2019-September |
State | Published - 2019 |
Event | 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019, Graz, Austria; Duration: 15 Sep 2019 → 19 Sep 2019 |
Keywords
- Consonant classification
- Deep learning
- Depth image
- Point cloud