TY - JOUR
T1 - Robust voice activity detection algorithm based on feature of frequency modulation of harmonics and its DSP implementation
AU - Hsu, Chung Chien
AU - Cheong, Kah Meng
AU - Chi, Tai-Shih
AU - Tsao, Yu
N1 - Publisher Copyright:
Copyright © 2015 The Institute of Electronics, Information and Communication Engineers.
PY - 2015/10/1
Y1 - 2015/10/1
N2 - This paper proposes a voice activity detection (VAD) algorithm based on an energy-related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, originally developed to extract texture features of an audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice-active segments of the speech signal by comparing the energy-related feature of the frequency modulation of harmonics with a threshold. The proposed VAD is then implemented on one of the Texas Instruments (TI) digital signal processor (DSP) platforms for real-time operation. Simulations conducted on the DSP platform demonstrate that the proposed VAD performs significantly better than three standard VADs (ITU-T G.729B, ETSI AMR1, and AMR2) in non-stationary noise, in terms of both the receiver operating characteristic (ROC) curves and the recognition rates of a practical distributed speech recognition (DSR) system.
AB - This paper proposes a voice activity detection (VAD) algorithm based on an energy-related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, originally developed to extract texture features of an audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice-active segments of the speech signal by comparing the energy-related feature of the frequency modulation of harmonics with a threshold. The proposed VAD is then implemented on one of the Texas Instruments (TI) digital signal processor (DSP) platforms for real-time operation. Simulations conducted on the DSP platform demonstrate that the proposed VAD performs significantly better than three standard VADs (ITU-T G.729B, ETSI AMR1, and AMR2) in non-stationary noise, in terms of both the receiver operating characteristic (ROC) curves and the recognition rates of a practical distributed speech recognition (DSR) system.
KW - Digital signal processor
KW - Frequency modulation
KW - Spectrotemporal analysis
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=84942935499&partnerID=8YFLogxK
U2 - 10.1587/transinf.2015EDP7138
DO - 10.1587/transinf.2015EDP7138
M3 - Article
AN - SCOPUS:84942935499
SN - 0916-8532
VL - E98D
SP - 1808
EP - 1817
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 10
ER -