An Attention-based Neural Network on Multiple Speaker Diarization

Shao Wen Cheng, Kai Jyun Hung, Hsie Chia Chang, Yen Chin Liao

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Speaker diarization is the task of labeling an audio or video recording with classes that correspond to speaker identity at each point in time. It is useful in multi-speaker conversation settings, such as meetings or interviews, and can also improve the performance of automatic speech recognition (ASR). This paper presents an end-to-end diarization model based on an attention mechanism, combined with data augmentation and several data pre-processing and post-processing steps. On the CALLHOME data set, the model reaches a 9.12% diarization error rate in the two-speaker case. We combine the speaker diarization model with an ASR model and implement a transcript conversion system on an edge device. The proposed diarization model serves as a preprocessing step that segments the recording by speaker, and the ASR model then produces a transcript for each segmented utterance. Experiments show that our model also performs well in the two-speaker scenario on edge devices, in terms of both accuracy and inference time.
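The segmentation step described in the abstract, where diarization output is cut into per-speaker utterances before ASR, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the diarization model emits a per-frame activity probability for each speaker; the function name `frames_to_segments`, the 0.5 threshold, and the 0.1 s frame shift are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (speaker index, start time in seconds, end time in seconds)
Segment = Tuple[int, float, float]


def frames_to_segments(
    activity: List[List[float]],
    frame_shift: float = 0.1,   # assumed frame hop in seconds (hypothetical)
    threshold: float = 0.5,     # assumed activity threshold (hypothetical)
) -> List[Segment]:
    """Turn per-frame speaker activity into contiguous speaker segments.

    activity[t][s] is the assumed probability that speaker s talks at frame t.
    Overlapping speech simply yields overlapping segments.
    """
    segments: List[Segment] = []
    if not activity:
        return segments

    num_speakers = len(activity[0])
    for spk in range(num_speakers):
        start = None  # frame index where the current run of speech began
        for t, frame in enumerate(activity):
            active = frame[spk] > threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                segments.append((spk, start * frame_shift, t * frame_shift))
                start = None
        # Close a run that lasts until the end of the recording.
        if start is not None:
            segments.append((spk, start * frame_shift, len(activity) * frame_shift))

    # Order by start time so segments can be transcribed in sequence.
    segments.sort(key=lambda seg: seg[1])
    return segments
```

Each returned segment could then be used to slice the waveform and pass the slice to an ASR model, which is how a speaker-attributed transcript would be assembled under these assumptions.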

Original language: English
Title of host publication: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 431-434
Number of pages: 4
ISBN (Electronic): 9781665409964
DOIs
State: Published - 2022
Event: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022 - Incheon, Korea, Republic of
Duration: 13 Jun 2022 to 15 Jun 2022

Publication series

Name: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022

Conference

Conference: 4th IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
Country/Territory: Korea, Republic of
City: Incheon
Period: 13/06/22 to 15/06/22

Keywords

  • Attention Mechanism
  • End-to-end Diarization Model
  • Speaker Diarization
  • Transcript Conversion
