TY - JOUR
T1 - Modeling speech intelligibility with recovered envelope from temporal fine structure stimulus
AU - Chen, Fei
AU - Tsao, Yu
AU - Lai, Ying Hui
N1 - Publisher Copyright: © 2016 Elsevier Ltd
PY - 2016/7
Y1 - 2016/7
N2 - Temporal envelope and fine structure are two prominent acoustic cues for speech perception. Most existing speech-transmission-index-based metrics make use of the temporal envelope information and discard the temporal fine structure (TFS) cue to predict speech intelligibility. Recent studies have shown that the TFS stimulus synthesized with multiband TFS waveforms contains rich intelligibility information, which is reflected as the recovered envelope from the TFS stimulus. The present study first assessed the performance of using the recovered envelope from the synthesized TFS stimulus to predict the intelligibility of noise-distorted and noise-suppressed speech. The TFS stimulus was synthesized and fed as an input into the conventional normalized covariance measure (NCM) module. The results showed that the recovered envelope from the TFS stimulus predicted the intelligibility as well as the original envelope extracted from the wideband speech signal did. In addition, an additive intelligibility model was designed to combine the envelope from wideband speech and the recovered envelope from the TFS stimulus to predict speech intelligibility. The prediction power was significantly improved when these two envelope waveforms were integrated. The present study suggests that the recovered envelope from the TFS stimulus may be alternative acoustic information for modeling speech intelligibility and improving the prediction power of the conventional NCM-based intelligibility index.
KW - Normalized covariance measure
KW - Recovered envelope
KW - Speech intelligibility
KW - Temporal fine structure
UR - http://www.scopus.com/inward/record.url?scp=84959454759&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2016.01.006
DO - 10.1016/j.specom.2016.01.006
M3 - Article
AN - SCOPUS:84959454759
VL - 81
SP - 120
EP - 128
JO - Speech Communication
JF - Speech Communication
SN - 0167-6393
ER -