TY - JOUR
T1 - Intelligent voice smoother for silence-suppressed voice over internet
AU - Tien, Po-Lung
AU - Yuang, Maria C.
PY - 1999/1
Y1 - 1999/1
AB - When transporting voice data with silence suppression over the Internet, the problem of jitter introduced from the network often renders the speech unintelligible. It is thus indispensable to offer intramedia synchronization to remove jitter while retaining minimal playout delay (PD). In this paper, we propose a neural network (NN)-based intravoice synchronization mechanism, called the intelligent voice smoother (IVoS). IVoS is composed of three components: 1) the smoother buffer; 2) the NN traffic predictor; and 3) the constant bit rate (CBR) enforcer. Newly arriving frames, assumed to follow a generic Markov modulated Bernoulli process (MMBP), are queued in the smoother buffer. The NN traffic predictor employs an online-trained back propagation NN (BPNN) to predict three traffic characteristics of every newly encountered talkspurt period. Based on the predicted characteristics, the CBR enforcer derives an adaptive buffering delay (ABD) by means of a near-optimal simple closed-form formula. It then imposes the delay on the playout of the first frame in the talkspurt period. The CBR enforcer in turn regulates CBR-based departures for the remaining frames of the talkspurt, aiming at assuring minimal mean and variance of distortion of talkspurts (DOT) and mean PD. Simulation results reveal that, compared to three other playout approaches, IVoS achieves superior playout, yielding negligible DOT and PD, irrespective of traffic variation.
KW - Back propagation neural network (BPNN)
KW - Best effort service
KW - Constant bit rate (CBR)
KW - Internet
KW - Intramedia synchronization
KW - Jitter
KW - Markov modulated Bernoulli process (MMBP)
KW - Multimedia communications
KW - Silence suppression
UR - http://www.scopus.com/inward/record.url?scp=0008102341&partnerID=8YFLogxK
U2 - 10.1109/49.743694
DO - 10.1109/49.743694
M3 - Article
AN - SCOPUS:0008102341
SN - 0733-8716
VL - 17
SP - 29
EP - 41
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
IS - 1
ER -