TY - JOUR
T1 - Exploiting glottal and prosodic information for robust speaker verification
AU - Liao, Yuan Fu
AU - Zeng, Zhi Ren
AU - Chen, Zi He
AU - Juang, Yau Tarng
N1 - Publisher Copyright:
© 2006 Proceedings of the International Conference on Speech Prosody.
PY - 2006
Y1 - 2006
N2 - In this paper, three levels of speaker cues, namely glottal, prosodic, and spectral information, are integrated to build a robust speaker verification system. The main goal is robustness to channel and handset distortion. In particular, the dynamic behavior of the normalized amplitude quotient (NAQ) and of prosodic feature contours is modeled using Gaussian mixture models (GMMs) and two latent prosody analysis (LPA)-based approaches, respectively. The proposed methods are evaluated on the standard one-speaker detection task of the 2001 NIST Speaker Recognition Evaluation Corpus, where only 2 minutes of training speech and 30 seconds of trial speech (on average) are available. Experimental results show that the proposed approach reduces the equal error rates (EERs) of the maximum a posteriori (MAP)-adapted GMM and GMM+T-norm approaches from 12.4% and 9.5% to 10.3% and 8.3%, respectively, and finally to 7.8%.
AB - In this paper, three levels of speaker cues, namely glottal, prosodic, and spectral information, are integrated to build a robust speaker verification system. The main goal is robustness to channel and handset distortion. In particular, the dynamic behavior of the normalized amplitude quotient (NAQ) and of prosodic feature contours is modeled using Gaussian mixture models (GMMs) and two latent prosody analysis (LPA)-based approaches, respectively. The proposed methods are evaluated on the standard one-speaker detection task of the 2001 NIST Speaker Recognition Evaluation Corpus, where only 2 minutes of training speech and 30 seconds of trial speech (on average) are available. Experimental results show that the proposed approach reduces the equal error rates (EERs) of the maximum a posteriori (MAP)-adapted GMM and GMM+T-norm approaches from 12.4% and 9.5% to 10.3% and 8.3%, respectively, and finally to 7.8%.
UR - http://www.scopus.com/inward/record.url?scp=85089852148&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85089852148
SN - 2333-2042
JO - Proceedings of the International Conference on Speech Prosody
JF - Proceedings of the International Conference on Speech Prosody
T2 - 3rd International Conference on Speech Prosody, SP 2006
Y2 - 2 May 2006 through 5 May 2006
ER -