TY - GEN
T1 - ATTENTION-GUIDED ADAPTATION FOR CODE-SWITCHING SPEECH RECOGNITION
AU - Aditya, Bobbi
AU - Rohmatillah, Mahdin
AU - Tai, Liang Hsuan
AU - Chien, Jen Tzung
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition. However, these models often struggle with the code-switching setting, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations for each language. Other methods have considered a switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects the attention heads in a model that closely express language identities and then guides those heads to attend correctly to their corresponding languages. Experiments on a Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing the state-of-the-art method, while training only 5.6% additional parameters over Whisper.
AB - The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition. However, these models often struggle with the code-switching setting, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations for each language. Other methods have considered a switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects the attention heads in a model that closely express language identities and then guides those heads to attend correctly to their corresponding languages. Experiments on a Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing the state-of-the-art method, while training only 5.6% additional parameters over Whisper.
KW - Attention guidance
KW - bilingual speech recognition
KW - code-switching
KW - parameter efficiency
UR - http://www.scopus.com/inward/record.url?scp=85195423245&partnerID=8YFLogxK
U2 - 10.1109/ICASSP48485.2024.10446258
DO - 10.1109/ICASSP48485.2024.10446258
M3 - Conference contribution
AN - SCOPUS:85195423245
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 10256
EP - 10260
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -