An Attention-based Neural Network on Multiple Speaker Diarization

Shao Wen Cheng, Kai Jyun Hung, Hsie Chia Chang, Yen Chin Liao

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

Speaker diarization is the task of labeling each point in time of an audio or video recording with the identity of the active speaker. It is useful in multi-speaker conversations such as meetings and interviews, and it can also improve the performance of automatic speech recognition (ASR). This paper presents an end-to-end diarization model based on an attention mechanism, combined with data augmentation and several data pre-processing and post-processing steps. On the CALLHOME data set, the model reaches a diarization error rate of 9.12% for the two-speaker case. We combine the speaker diarization model with an ASR model and implement a transcript conversion system on an edge device: the proposed diarization model first segments the recording by speaker, and the ASR model then transcribes each segmented utterance. Experiments show that the model also performs well on edge devices in the two-speaker scenario, in terms of both accuracy and inference time.
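The segmentation step described in the abstract, where diarization output is turned into per-speaker segments before ASR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `frames_to_segments`, the per-frame activity matrix, the 0.5 threshold, and the frame duration are all assumptions made for illustration, and the paper's actual pre-processing and post-processing steps are not described in this record.

```python
from typing import List, Tuple

# (speaker index, start time in seconds, end time in seconds)
Segment = Tuple[int, float, float]


def frames_to_segments(
    activity: List[List[float]],
    frame_sec: float = 0.1,   # assumed frame duration, not taken from the paper
    threshold: float = 0.5,   # assumed decision threshold, not taken from the paper
) -> List[Segment]:
    """Convert frame-level speaker activity into contiguous segments.

    activity[t][s] is assumed to be the model's probability that
    speaker s is talking during frame t.
    """
    segments: List[Segment] = []
    if not activity:
        return segments

    n_speakers = len(activity[0])
    for spk in range(n_speakers):
        start = None  # index of the first frame of the currently open segment
        for t, frame in enumerate(activity):
            active = frame[spk] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                segments.append((spk, start * frame_sec, t * frame_sec))
                start = None
        # Close a segment that runs to the end of the recording.
        if start is not None:
            segments.append((spk, start * frame_sec, len(activity) * frame_sec))

    # Order segments by start time so they can be transcribed in sequence.
    segments.sort(key=lambda seg: seg[1])
    return segments
```

In a pipeline like the one the abstract outlines, each returned segment would be cut from the recording and passed to the ASR model, with the speaker index attached to the resulting transcript line.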

Original language: English
Title of host publication: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 431-434
Number of pages: 4
ISBN (electronic): 9781665409964
Publication status: Published - 2022
Event: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022 - Incheon, South Korea
Duration: 13 Jun 2022 to 15 Jun 2022

Publication series

Name: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022

Conference

Conference: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Country/Territory: South Korea
City: Incheon
Period: 13/06/22 to 15/06/22
