Consonant classification in Mandarin based on the depth image feature: A pilot study

Han Chi Hsieh, Wei Zhong Zheng, Ko Chiang Chen, Ying Hui Lai

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Consonants are an important element of Mandarin, and different categories of consonants are produced with different facial movements. Specifically, the facial muscles change during speech, and these changes are closely related to pronunciation: the facial muscles are linked to articulators hidden inside the mouth, so their effects on the face can be observed as 3D changes. However, most studies analyze facial features during speech using 2D images. 2D images provide information along only two dimensions (the x- and y-axes), so subtle motions of the facial muscles in depth (z-axis changes) during speech can be difficult to detect accurately. This study therefore used depth features of the face (here, point cloud features), recorded with a time-of-flight 3D camera, to investigate their potential for consonant recognition. We propose an algorithm that recognizes seven categories of Mandarin consonants from the depth features of the speaker's face. The proposed system achieved suitable classification accuracy across the seven categories, which suggests that depth features can be used in speech-processing applications.
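The abstract does not say how depth frames are turned into point clouds or which features are extracted. The sketch below shows one common approach, offered only as an illustration. It back-projects a time-of-flight depth image into 3D points with a pinhole camera model and measures how much depth changes between two frames. This z-axis signal is the kind of information a 2D image cannot see. The intrinsics `fx`, `fy`, `cx`, `cy` and the helper `mean_depth_change` are hypothetical and are not the authors' feature or their deep-learning model.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
DepthImage = List[List[float]]  # depth in metres; 0 means no ToF return


def depth_to_point_cloud(depth: DepthImage, fx: float, fy: float,
                         cx: float, cy: float) -> List[Point]:
    """Back-project a depth image to 3D points (assumed pinhole model).

    fx, fy, cx, cy are hypothetical camera intrinsics; pixels with
    depth <= 0 (invalid ToF measurements) are skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def mean_depth_change(prev: DepthImage, curr: DepthImage) -> float:
    """Hypothetical feature: mean absolute z-change between two frames.

    Only pixels that are valid in both frames contribute. A pixel can keep
    the same (u, v) position, and so look identical in 2D, while its depth
    changes.
    """
    diffs = [abs(c - p)
             for prev_row, curr_row in zip(prev, curr)
             for p, c in zip(prev_row, curr_row)
             if p > 0 and c > 0]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    prev = [[0.50, 0.50], [0.50, 0.00]]
    curr = [[0.50, 0.52], [0.48, 0.50]]
    print(depth_to_point_cloud(prev, 1.0, 1.0, 0.0, 0.0))
    print(mean_depth_change(prev, curr))
```

In practice, a per-frame point cloud or a sequence of such depth-change features would feed a classifier. The abstract's keywords point to deep learning, but the exact architecture is not given here.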

Keywords

  • Consonant classification
  • Deep learning
  • Depth image
  • Point cloud
