ASYMMETRIC CLEAN SEGMENTS-GUIDED SELF-SUPERVISED LEARNING FOR ROBUST SPEAKER VERIFICATION

Chong Xin Gan, Man Wai Mak, Weiwei Lin, Jen Tzung Chien

Research output: Conference contribution › peer-reviewed

Abstract

Contrastive self-supervised learning (CSL) for speaker verification (SV) has drawn increasing interest recently due to its ability to exploit unlabeled data. Performing data augmentation on raw waveforms, such as adding noise or reverberation, plays a pivotal role in achieving promising results in SV. Data augmentation, however, demands meticulous calibration to keep speaker-specific information intact, which is difficult to achieve without speaker labels. To address this issue, we introduce a novel framework that incorporates both clean and augmented segments into the contrastive training pipeline. The clean segments are repurposed to pair with the noisy segments, forming additional positive and negative pairs. Moreover, the contrastive loss is weighted to increase the separation between the clean and augmented embeddings of different speakers. Experimental results on VoxCeleb1 suggest that the proposed framework achieves a remarkable 19% improvement over conventional methods, surpassing many existing state-of-the-art techniques.
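The core idea of the abstract can be illustrated with a minimal sketch: treat the clean and augmented embeddings of the same utterance as a positive pair, all cross-speaker combinations as negatives, and up-weight the negative terms of an InfoNCE-style contrastive loss. The function name, the `neg_weight` scheme, and all parameter values below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def weighted_infonce(clean, aug, tau=0.1, neg_weight=2.0):
    """Illustrative weighted contrastive loss over clean/augmented pairs.

    clean, aug: (N, D) arrays; row i of each is assumed to come from the
    same utterance (positive pair), all other rows act as negatives.
    neg_weight > 1 is a hypothetical way to up-weight clean-vs-augmented
    negatives, widening the gap between different speakers' embeddings.
    """
    # Normalize to unit length so dot products are cosine similarities.
    clean = clean / np.linalg.norm(clean, axis=1, keepdims=True)
    aug = aug / np.linalg.norm(aug, axis=1, keepdims=True)
    sim = clean @ aug.T / tau                               # (N, N) scaled similarities
    exp_sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # numerically stable softmax
    pos = np.diag(exp_sim)                                  # clean_i vs aug_i (positives)
    w = np.full_like(sim, neg_weight)
    np.fill_diagonal(w, 0.0)                                # exclude positives from the negative sum
    neg = (w * exp_sim).sum(axis=1)                         # up-weighted negatives
    return float(-np.log(pos / (pos + neg)).mean())
```

As a sanity check, aligning clean and augmented embeddings of the same utterance should yield a lower loss than mismatched pairings, and increasing `neg_weight` should raise the loss, encouraging larger cross-speaker margins.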

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11081-11085
Number of pages: 5
ISBN (electronic): 9798350344851
DOIs
Publication status: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, South Korea
Duration: 14 Apr 2024 - 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (print): 1520-6149

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: South Korea
City: Seoul
Period: 14/04/24 - 19/04/24
