ATTENTION-GUIDED ADAPTATION FOR CODE-SWITCHING SPEECH RECOGNITION

Bobbi Aditya*, Mahdin Rohmatillah*, Liang Hsuan Tai, Jen Tzung Chien*

*Corresponding author for this work

Research output: Conference contribution › peer-reviewed

Abstract

The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition. However, these models often struggle to handle code-switching, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages to ensure distinct latent representations across languages. Other methods have considered a switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to carry out parameter-efficient learning for bilingual ASR. The method selects the attention heads in a model that most closely express language identities and then guides those heads to attend correctly to their corresponding languages. Experiments on the Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing the state-of-the-art method while training only 5.6% additional parameters over Whisper.
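The abstract describes two steps: selecting attention heads that express language identity, and guiding those heads toward their corresponding languages. As a rough illustration only (not the authors' implementation), the hypothetical PyTorch fragment below scores each cross-attention head by how much of its attention mass falls on same-language frames, keeps the top-k heads, and penalizes the selected heads for attending off-language. The function names, the scoring rule, the mask construction, and the guidance loss are all illustrative assumptions.

import torch

def select_language_heads(attn, lang_mask, top_k=4):
    """Rank attention heads by how strongly their attention mass
    aligns with frame-level language labels, and keep the top-k.

    attn:      (num_heads, tgt_len, src_len) cross-attention weights
    lang_mask: (tgt_len, src_len) binary mask, 1 where the source
               frame shares the language of the target token
    The scoring rule here is an assumption, not the paper's
    exact selection criterion.
    """
    # Fraction of each head's attention mass on same-language
    # frames, averaged over target positions -> (num_heads,)
    scores = (attn * lang_mask).sum(dim=-1).mean(dim=-1)
    return torch.topk(scores, k=top_k).indices

def attention_guidance_loss(attn, lang_mask, head_ids, eps=1e-8):
    """Encourage the selected heads to place their attention mass
    on frames of the matching language (a hypothetical guidance
    loss; the paper's objective may differ)."""
    guided = attn[head_ids]                       # (k, tgt, src)
    on_target = (guided * lang_mask).sum(dim=-1)  # (k, tgt)
    return -torch.log(on_target + eps).mean()

# Toy usage: 8 heads, 5 decoder steps, 20 encoder frames.
attn = torch.softmax(torch.randn(8, 5, 20), dim=-1)
lang_mask = torch.zeros(5, 20)
lang_mask[:, :10] = 1.0  # pretend the first 10 frames match the token language
heads = select_language_heads(attn, lang_mask, top_k=2)
loss = attention_guidance_loss(attn, lang_mask, heads)

In a parameter-efficient setup, a loss of this kind would be added to the ASR objective while most Whisper weights stay frozen, which is consistent with the 5.6% trainable-parameter figure quoted in the abstract; the actual adapter design is specified only in the paper.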

Original language: English
Host publication title: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 10256-10260
Number of pages: 5
ISBN (electronic): 9798350344851
DOIs
Publication status: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, South Korea
Duration: 14 Apr 2024 - 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (print): 1520-6149

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: South Korea
City: Seoul
Period: 14/04/24 - 19/04/24
