TY - JOUR
T1 - Within-class feature normalization for robust speech recognition
AU - Liao, Yuan Fu
AU - Hsu, Chi Hui
AU - Yang, Chi Min
AU - Lin, Jeng Shien
AU - Chang, Sen Chia
PY - 2008
Y1 - 2008
N2 - In this paper, a within-class feature normalization (WCFN) framework operating in transformed segment-level (instead of frame-level) super-vector space is proposed for robust speech recognition. In this framework, each segment hypothesis in a lattice is represented by a high dimensional super-vector and projected to a class-dependent lower-dimensional eigen-subspace to remove unwanted variability due to environment noise and speaker (different values of SNR, gender, types of noise and so on). The normalized super-vectors are verified by a bank of class detectors to further rescore the lattice. Experimental results on Aurora 2 multi-condition training task showed that the proposed WCFN approach achieved 7.45% average word error rate (WER). WCFN not only outperformed the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%), the histogram equalization (HEQ, 8.66%) and the non-blind reference model weighting (RMW, 7.29%) approaches.
AB - In this paper, a within-class feature normalization (WCFN) framework operating in transformed segment-level (instead of frame-level) super-vector space is proposed for robust speech recognition. In this framework, each segment hypothesis in a lattice is represented by a high dimensional super-vector and projected to a class-dependent lower-dimensional eigen-subspace to remove unwanted variability due to environment noise and speaker (different values of SNR, gender, types of noise and so on). The normalized super-vectors are verified by a bank of class detectors to further rescore the lattice. Experimental results on Aurora 2 multi-condition training task showed that the proposed WCFN approach achieved 7.45% average word error rate (WER). WCFN not only outperformed the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%), the histogram equalization (HEQ, 8.66%) and the non-blind reference model weighting (RMW, 7.29%) approaches.
KW - Robust speech recognition
KW - WCFN
UR - http://www.scopus.com/inward/record.url?scp=84867217115&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84867217115
SN - 2308-457X
SP - 1020
EP - 1023
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Y2 - 22 September 2008 through 26 September 2008
ER -