TY - GEN
T1 - DNN Audio Classification Based on Extracted Spectral Attributes
AU - Lo, Pei Chen
AU - Liu, Chuan Yi
AU - Chou, Tsung Hsien
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Recent advances in multimedia systems provide remarkable audio-visual experiences in fields such as entertainment, education, communication, and industrial design. Audio quality enhancement is therefore important to the audio-visual experience. However, methods for improving audio quality depend heavily on audio attributes, for example whether the content is human voice, music of a particular genre, or audio from a particular type of program. This study develops an effective method for real-time audio classification based on a deep learning scheme. The three classes of interest are classical music, non-classical music, and news. Subband-power distribution (SPD) is a one-dimensional feature based on audio power in the frequency domain; it effectively reflects the spectral attributes of various audio content and allows a deep neural network (DNN) audio classifier to be implemented in real time. Different DNN models are developed for different input designs: the original SPD at different frequency resolutions and SPD pre-processed by principal component analysis (PCA). Performance is compared using the overall accuracy (Acc) and the per-class prediction accuracy derived from the confusion matrix (CFM). According to the results, the DNN audio classifier with PCA-pre-processed SPD input not only achieves better performance but also markedly reduces memory requirements and computation time.
KW - Audio classification
KW - Deep learning
KW - Deep neural network (DNN)
KW - Principal component analysis (PCA)
KW - Real-time process
KW - Subband-power distribution (SPD)
UR - http://www.scopus.com/inward/record.url?scp=85171788101&partnerID=8YFLogxK
U2 - 10.1109/ICSPS58776.2022.00050
DO - 10.1109/ICSPS58776.2022.00050
M3 - Conference contribution
AN - SCOPUS:85171788101
T3 - Proceedings - 2022 14th International Conference on Signal Processing Systems, ICSPS 2022
SP - 259
EP - 262
BT - Proceedings - 2022 14th International Conference on Signal Processing Systems, ICSPS 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Conference on Signal Processing Systems, ICSPS 2022
Y2 - 18 November 2022 through 20 November 2022
ER -