TY - JOUR
T1 - Enhancing the Performance of Pathological Voice Quality Assessment System Through the Attention-Mechanism Based Neural Network
AU - Han, Ji Yan
AU - Hsiao, Ching Ju
AU - Zheng, Wei Zhong
AU - Weng, Ko Cheng
AU - Ho, Guan Min
AU - Chang, Chia Yuan
AU - Wang, Chi Te
AU - Fang, Shih Hau
AU - Lai, Ying Hui
N1 - Publisher Copyright: © 2023 The Voice Foundation
PY - 2023
Y1 - 2023
AB - Objective: Clinicians currently rely primarily on auditory-perceptual evaluation, such as the grade, roughness, breathiness, asthenia, and strain (GRBAS) scale, to evaluate voice quality and determine treatment. However, the ratings given by individual physicians often differ because of subjective perception and the diagnosis time interval, particularly when the patient's symptoms are hard to judge. An accurate computerized pathological voice quality assessment system would therefore improve the quality of assessment. Method: This study proposes a self-attention-based deep learning system, named self-attention-based bidirectional long short-term memory (SA BiLSTM). Different pitches [low, normal, high] and vowels [/a/, /i/, /u/] were added to the proposed model so that it learns, in a high-dimensional view, how professional doctors rate the grade, roughness, breathiness, asthenia, and strain scale. Results: The experimental results showed that the proposed system outperformed the baseline systems. Classification accuracy was compared using the macro-averaged F1 score, reported as a decimal. The scores of the proposed system for G, R, and B were 0.768±0.011, 0.820±0.009, and 0.815±0.009, respectively, higher than those of the baseline systems: a deep neural network (0.395±0.010, 0.312±0.019, 0.321±0.014) and a convolutional neural network (0.421±0.052, 0.306±0.043, 0.325±0.032). Conclusions: The proposed system, which combines SA BiLSTM with pitch and vowel information, provides a more accurate way to evaluate the voice. This will be helpful for clinical voice evaluations and will improve patients’ benefits from voice therapy.
KW - Auditory-perceptual evaluation
KW - Deep learning
KW - GRBAS scale
KW - Improved clinical assessment
KW - Self-attention
KW - Voice disorder
UR - http://www.scopus.com/inward/record.url?scp=85147210205&partnerID=8YFLogxK
U2 - 10.1016/j.jvoice.2022.12.026
DO - 10.1016/j.jvoice.2022.12.026
M3 - Article
AN - SCOPUS:85147210205
SN - 0892-1997
JO - Journal of Voice
JF - Journal of Voice
ER -