TY - GEN
T1 - Developing an Interview Recording System with Speaker Recognition and Emotion Classification
AU - Hsieh, Wei Yi
AU - Tsai, Hsun Ching
AU - Chen, Lu An
AU - İk, Tsì Uí
N1 - Publisher Copyright: Copyright 2023 KICS.
PY - 2023
Y1 - 2023
AB - Dialogue records play a crucial role in meetings and counseling, and information identifying each speaker is also necessary. Furthermore, if the speaker’s emotions can be captured, the record can more faithfully reflect the speaker’s reactions at the time. This study develops an interview recording system that integrates speaker recognition and speech emotion classification, providing speech transcription, speaker information, and sentence-level emotion recognition. The prototype system consists of three subsystems: the Meeting Recording System serves as the primary user interface for meeting recording and data statistics; the Data Labeling System is used to correct meeting records and to collect data; and the Voiceprint Management System provides functions for speaker registration and voiceprint management. To train the multi-label emotion classification model, we re-labeled 11 hours of audio from the NNIME corpus. In performance evaluation, multi-label emotion classification reaches an F1-score of 0.5115 and speaker recognition reaches an accuracy of 96.39%; the text records are generated using the Microsoft Speech-To-Text API.
KW - Meeting recording system
KW - Natural language processing
KW - Speaker recognition
KW - Speech emotion classification
UR - http://www.scopus.com/inward/record.url?scp=85174847289&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85174847289
T3 - APNOMS 2023 - 24th Asia-Pacific Network Operations and Management Symposium: Intelligent Management for Enabling the Digital Transformation
SP - 267
EP - 270
BT - APNOMS 2023 - 24th Asia-Pacific Network Operations and Management Symposium
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th Asia-Pacific Network Operations and Management Symposium, APNOMS 2023
Y2 - 6 September 2023 through 8 September 2023
ER -