TY - GEN
T1 - Segmental contribution to predicting speech intelligibility in noisy conditions
AU - Wang, Lei
AU - Chen, Fei
AU - Lai, Ying Hui
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/8/16
Y1 - 2016/8/16
N2 - It is necessary to identify speech segments carrying important information for speech intelligibility, particularly in noise. Earlier work based on a relative root-mean-square (RMS) level based segmentation suggested that middle-level (ranging from the overall RMS level to 10 dB below) segments contained more vowel-consonant boundaries wherein the spectral change was often most prominent, and perhaps most robust, in the presence of noise, and hence yielded improved performance of objective intelligibility modeling. Since the three levels (i.e., high-, middle- and low-levels) were defined empirically when proposed, the present work assessed how the boundaries of RMS-level based segmentation affected the performance of speech intelligibility prediction. When evaluated with speech recognition scores obtained with normal-hearing listeners and with a total of 72 noise-distorted and noise-suppressed conditions, it was shown that choosing 0 and -10 dB to split middle-level led to maximized correlation in predicting the intelligibility of speech in noise.
AB - It is necessary to identify speech segments carrying important information for speech intelligibility, particularly in noise. Earlier work based on a relative root-mean-square (RMS) level based segmentation suggested that middle-level (ranging from the overall RMS level to 10 dB below) segments contained more vowel-consonant boundaries wherein the spectral change was often most prominent, and perhaps most robust, in the presence of noise, and hence yielded improved performance of objective intelligibility modeling. Since the three levels (i.e., high-, middle- and low-levels) were defined empirically when proposed, the present work assessed how the boundaries of RMS-level based segmentation affected the performance of speech intelligibility prediction. When evaluated with speech recognition scores obtained with normal-hearing listeners and with a total of 72 noise-distorted and noise-suppressed conditions, it was shown that choosing 0 and -10 dB to split middle-level led to maximized correlation in predicting the intelligibility of speech in noise.
KW - Intelligibility prediction
KW - Relative RMS-level based segmentation
KW - Speech intelligibility
UR - http://www.scopus.com/inward/record.url?scp=84987657348&partnerID=8YFLogxK
U2 - 10.1109/BigMM.2016.88
DO - 10.1109/BigMM.2016.88
M3 - Conference contribution
AN - SCOPUS:84987657348
T3 - Proceedings - 2016 IEEE 2nd International Conference on Multimedia Big Data, BigMM 2016
SP - 476
EP - 480
BT - Proceedings - 2016 IEEE 2nd International Conference on Multimedia Big Data, BigMM 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE International Conference on Multimedia Big Data, BigMM 2016
Y2 - 20 April 2016 through 22 April 2016
ER -