Spectro-temporal modulation energy based mask for robust speaker identification

Tai-Shih Chi, Ting Han Lin, Chung Chien Hsu

Research output: Article (peer-reviewed)

14 citations (Scopus)

Abstract

Spectro-temporal modulations of speech encode speech structures and speaker characteristics. An algorithm that distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust text-independent closed-set speaker identification simulations using the TIMIT and GRID corpora. Simulation results show that the proposed method produces much higher speaker identification rates in all signal-to-noise ratio (SNR) conditions than the baseline system using mel-frequency cepstral coefficients. The proposed method also outperforms a system that uses auditory-based nonnegative tensor cepstral coefficients [Q. Wu and L. Zhang, "Auditory sparse representation for robust speaker recognition based on tensor structure," EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)] in low-SNR (≤ 10 dB) conditions.

Original language: English
Pages (from-to): EL368-EL374
Journal: Journal of the Acoustical Society of America
Volume: 131
Issue number: 5
Publication status: Published - May 2012
