TY - JOUR
T1 - Deep learning application for vocal fold disease prediction through voice recognition
T2 - Preliminary development study
AU - Hu, Hao Chun
AU - Chang, Shyue Yih
AU - Wang, Chuen Heng
AU - Li, Kai Jun
AU - Cho, Hsiao Yun
AU - Chen, Yi Ting
AU - Lu, Chang Jung
AU - Tsai, Tzu Pei
AU - Lee, Oscar Kuang Sheng
N1 - Publisher Copyright: © 2021 Journal of Medical Internet Research. All rights reserved.
PY - 2021/6
Y1 - 2021/6
N2 - Background: Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis. Objective: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence. Methods: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists. Results: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. Compared with the accuracy of human specialists, the overall accuracy rates were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors. Conclusions: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This approach involving artificial intelligence could be clinically useful for screening general vocal fold disease using the voice. The approach includes a quick survey and a general health examination. It can be applied during telemedicine in areas with primary care units lacking laryngoscopic abilities. It could support physicians when prescreening cases by allowing for invasive examinations to be performed only for cases involving problems with automatic recognition or listening and for professional analyses of other clinical examination results that reveal doubts about the presence of pathologies.
KW - Artificial intelligence
KW - Convolutional neural network
KW - Dysphonia
KW - Pathological voice
KW - Vocal fold disease
KW - Voice pathology identification
UR - http://www.scopus.com/inward/record.url?scp=85107843181&partnerID=8YFLogxK
U2 - 10.2196/25247
DO - 10.2196/25247
M3 - Article
C2 - 34100770
AN - SCOPUS:85107843181
SN - 1438-8871
VL - 23
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
IS - 6
M1 - e25247
ER -