TY - GEN
T1 - Text-to-Speech with Model Compression on Edge Devices
AU - Koc, Wai Wan
AU - Chang, Yung Ting
AU - Yu, Jian Yu
AU - Ik, Tsi Ui
N1 - Publisher Copyright: © 2021 IEICE.
PY - 2021/9/8
Y1 - 2021/9/8
N2 - Voice services have become increasingly common in daily life, in applications such as traffic navigation, voice assistants, and audiobooks. However, given the cost and variability involved, it is difficult to rely fully on real voice recordings across different scenarios. In practice, speech synthesis technology is usually used to mimic human voices. Meanwhile, as computer hardware has developed, the computing power of edge devices has gradually improved, enabling inference with lightweight deep-learning networks. Many deep learning technologies have already been ported to edge devices for applications such as face recognition, speech recognition, and photo retouching. Therefore, if a speech synthesis network is ported to edge devices, then with the advent of fifth-generation mobile communication (5G) it could provide a basis for more innovative voice services. In this research, the speech synthesis network Tacotron2 [1] + CBHG [2] is ported to an edge device, with the aim of optimizing model inference time and the number of parameters. The model optimization is based on deep learning network compression techniques, including quantization, structured pruning, and low-rank matrix approximation, so that the speech synthesis network can run effectively on edge devices. We also overcome the differences in library support between TensorFlow 1.5 and TensorFlow Lite. After compression of the model, the inference speed of the Tacotron2 speech synthesis network on the edge device is increased by 1.91 times, while the model size is reduced by 86%.
AB - Voice services have become increasingly common in daily life, in applications such as traffic navigation, voice assistants, and audiobooks. However, given the cost and variability involved, it is difficult to rely fully on real voice recordings across different scenarios. In practice, speech synthesis technology is usually used to mimic human voices. Meanwhile, as computer hardware has developed, the computing power of edge devices has gradually improved, enabling inference with lightweight deep-learning networks. Many deep learning technologies have already been ported to edge devices for applications such as face recognition, speech recognition, and photo retouching. Therefore, if a speech synthesis network is ported to edge devices, then with the advent of fifth-generation mobile communication (5G) it could provide a basis for more innovative voice services. In this research, the speech synthesis network Tacotron2 [1] + CBHG [2] is ported to an edge device, with the aim of optimizing model inference time and the number of parameters. The model optimization is based on deep learning network compression techniques, including quantization, structured pruning, and low-rank matrix approximation, so that the speech synthesis network can run effectively on edge devices. We also overcome the differences in library support between TensorFlow 1.5 and TensorFlow Lite. After compression of the model, the inference speed of the Tacotron2 speech synthesis network on the edge device is increased by 1.91 times, while the model size is reduced by 86%.
KW - CBHG
KW - Edge Devices
KW - Model Compression
KW - Quantization
KW - Structured Pruning
KW - Tacotron2
KW - Text-to-Speech
UR - http://www.scopus.com/inward/record.url?scp=85118153788&partnerID=8YFLogxK
U2 - 10.23919/APNOMS52696.2021.9562651
DO - 10.23919/APNOMS52696.2021.9562651
M3 - Conference contribution
AN - SCOPUS:85118153788
T3 - 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
SP - 114
EP - 119
BT - 2021 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd Asia-Pacific Network Operations and Management Symposium, APNOMS 2021
Y2 - 8 September 2021 through 10 September 2021
ER -