TY - JOUR
T1 - Eigen-MLLR environment/speaker compensation for robust speech recognition
AU - Liao, Yuan Fu
AU - Fang, Hung Hsiang
AU - Hsu, Chi Hui
PY - 2008
Y1 - 2008
N2 - In this paper, an eigen-maximum likelihood linear regression (Eigen-MLLR) method is proposed that uses a set of a priori knowledge about noisy environments and speakers to compensate online for the characteristics of an unknown test environment/speaker. The idea is straightforward, but it is motivated by our recent finding that the characteristics of different kinds of noisy environments and speakers can be simultaneously well organized in a PCA-constructed Eigen-MLLR subspace. In particular, the first three dimensions of the constructed Eigen-MLLR subspace are highly related to the SNR value, gender, and type of noise. The proposed Eigen-MLLR was evaluated on the Aurora 2 multi-condition training task. Experimental results showed that an average word error rate (WER) of 6.14% was achieved. Moreover, Eigen-MLLR outperformed not only the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%), histogram equalization (HEQ, 8.66%), and non-blind reference model weighting (RMW, 7.29%) approaches.
AB - In this paper, an eigen-maximum likelihood linear regression (Eigen-MLLR) method is proposed that uses a set of a priori knowledge about noisy environments and speakers to compensate online for the characteristics of an unknown test environment/speaker. The idea is straightforward, but it is motivated by our recent finding that the characteristics of different kinds of noisy environments and speakers can be simultaneously well organized in a PCA-constructed Eigen-MLLR subspace. In particular, the first three dimensions of the constructed Eigen-MLLR subspace are highly related to the SNR value, gender, and type of noise. The proposed Eigen-MLLR was evaluated on the Aurora 2 multi-condition training task. Experimental results showed that an average word error rate (WER) of 6.14% was achieved. Moreover, Eigen-MLLR outperformed not only the multi-condition training baseline (Multi-Con., 13.72%) but also the blind ETSI advanced DSR front-end (ETSI-Adv., 8.65%), histogram equalization (HEQ, 8.66%), and non-blind reference model weighting (RMW, 7.29%) approaches.
KW - Eigen-MLLR
KW - Robust speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84867213516&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84867213516
SN - 2308-457X
SP - 1249
EP - 1252
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Y2 - 22 September 2008 through 26 September 2008
ER -