Adversarial Learning and Augmentation for Speaker Recognition

Jen Tzung Chien, Kang Ting Peng

Research output: Contribution to conference › Paper › peer-review

9 Scopus citations

Abstract

This paper develops a new generative adversarial network (GAN) that artificially generates i-vectors to address unbalanced or insufficient data in speaker recognition based on probabilistic linear discriminant analysis (PLDA). Data augmentation is performed to improve system robustness to variations of i-vectors under different numbers of training utterances. Our idea is to incorporate the class label into the GAN, which involves a minimax optimization problem for adversarial learning. We build a generator and a discriminator in which the generator produces class-conditional i-vectors such that the discriminator cannot identify them as fake samples. In particular, multiple learning objectives are optimized to build a specialized deep model for model regularization in speaker recognition. In addition to the minimax optimization of the adversarial loss, the posterior probabilities of class labels given real and fake samples are maximized. The cosine similarity between real and fake i-vectors is also minimized to preserve the quality of the generated i-vector. The loss functions for data reconstruction and Gaussian regularization in the PLDA model are minimized. The experiments illustrate the merit of multi-objective learning for deep adversarial augmentation in speaker recognition.
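As a rough illustration of how the adversarial and class-posterior objectives named in the abstract might be combined, the following is a minimal NumPy sketch, not the authors' implementation. All function names are hypothetical, the generator term uses the common non-saturating form (an assumption), and the terms are summed with equal weights (also an assumption). The abstract does not state how the cosine term enters the total loss, so the sketch only computes it; the PLDA reconstruction and Gaussian regularization terms are omitted.

```python
import numpy as np

EPS = 1e-7  # numerical floor to keep logarithms finite


def bce(p, target):
    """Mean binary cross-entropy between probabilities p and a 0/1 target."""
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def class_nll(probs, labels):
    """Mean negative log posterior of the true class (speaker) labels.

    probs:  (batch, num_classes) array of class posteriors.
    labels: (batch,) array of integer class indices.
    """
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, EPS, 1.0))))


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two batches of vectors."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)


def discriminator_loss(d_real, d_fake, cls_real, cls_fake, labels):
    """Adversarial loss (real -> 1, fake -> 0) plus class-posterior loss.

    Assumption: equal weighting of the two terms.
    """
    adversarial = bce(d_real, 1.0) + bce(d_fake, 0.0)
    auxiliary = class_nll(cls_real, labels) + class_nll(cls_fake, labels)
    return adversarial + auxiliary


def generator_loss(d_fake, cls_fake, labels):
    """Non-saturating generator loss plus class-posterior loss on fake samples.

    Assumption: the generator is rewarded when the discriminator
    scores its class-conditional samples as real.
    """
    return bce(d_fake, 1.0) + class_nll(cls_fake, labels)
```

Here `d_real` and `d_fake` stand for the discriminator's real-versus-fake probabilities on real and generated i-vectors, and `cls_real` and `cls_fake` for its class posteriors on the same batches. Minimizing `class_nll` is equivalent to maximizing the label posteriors described in the abstract.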

Original language: English
Pages: 342-348
Number of pages: 7
State: Published - 2018
Event: 2018 Speaker and Language Recognition Workshop, ODYSSEY 2018 - Les Sables d'Olonne, France
Duration: 26 Jun 2018 to 29 Jun 2018

Conference

Conference: 2018 Speaker and Language Recognition Workshop, ODYSSEY 2018
Country/Territory: France
City: Les Sables d'Olonne
Period: 26/06/18 to 29/06/18
