Adversarial Learning and Augmentation for Speaker Recognition

Jen Tzung Chien, Kang Ting Peng

Research output: Paper, peer-reviewed

8 Citations (Scopus)


This paper develops a new generative adversarial network (GAN) that artificially generates i-vectors to address unbalanced or insufficient data in speaker recognition based on probabilistic linear discriminant analysis (PLDA). Data augmentation is performed to improve system robustness to the variations of i-vectors under different numbers of training utterances. Our idea is to incorporate the class label into the GAN, which involves a minimax optimization problem for adversarial learning. We build a generator and a discriminator, where the generator produces class-conditional i-vectors such that the discriminator cannot identify them as fake samples. In particular, multiple learning objectives are optimized to build a specialized deep model for model regularization in speaker recognition. In addition to the minimax optimization of the adversarial loss, the posterior probabilities of class labels given real and fake samples are maximized. The cosine similarity between real and fake i-vectors is also minimized to preserve the quality of the generated i-vectors. The loss functions for data reconstruction and Gaussian regularization in the PLDA model are minimized. The experiments illustrate the merit of multi-objective learning for deep adversarial augmentation in speaker recognition.
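As a rough illustration of two of the objectives named in the abstract (the adversarial loss and the class-label posterior), the following minimal sketch may help. It is not the authors' formulation: the binary cross-entropy form, the non-saturating generator loss, and all function names are assumptions made for illustration, and the cosine, reconstruction, and Gaussian regularization terms are omitted because the abstract does not give their exact form.

```python
import math

EPS = 1e-12  # numerical floor to keep log() finite (assumption)


def bce(p, target):
    """Binary cross-entropy for one probability p against a 0/1 target."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Discriminator side of the minimax game.

    d_real: discriminator output for a real i-vector (probability of "real").
    d_fake: discriminator output for a generated i-vector.
    The discriminator wants d_real near 1 and d_fake near 0.
    """
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_adversarial_loss(d_fake):
    """Generator side, in the common non-saturating form (assumption).

    The generator wants the discriminator to score its samples as real.
    """
    return bce(d_fake, 1.0)


def class_posterior_loss(class_probs, label):
    """Negative log posterior of the true speaker label.

    Minimizing this maximizes the posterior probability of the class label,
    and can be applied to both real and generated samples.
    """
    return -math.log(max(class_probs[label], EPS))


# Hypothetical values: an undecided discriminator and a 2-speaker classifier.
print(discriminator_loss(0.5, 0.5))
print(generator_adversarial_loss(0.9))
print(class_posterior_loss([0.25, 0.75], 1))
```

In a full training loop these terms would be combined with weights chosen by the practitioner and alternated between generator and discriminator updates; the abstract does not specify those details.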

Publication status: Published - 2018
Event: 2018 Speaker and Language Recognition Workshop, ODYSSEY 2018 - Les Sables d'Olonne, France
Duration: 26 Jun 2018 to 29 Jun 2018


Conference: 2018 Speaker and Language Recognition Workshop, ODYSSEY 2018
City: Les Sables d'Olonne

