TY - GEN
T1 - Sparse coding based music genre classification using spectro-temporal modulations
AU - Hsu, Kai Chun
AU - Lin, Chih Shan
AU - Chi, Tai-Shih
N1 - Publisher Copyright: © Kai-Chun Hsu, Chih-Shan Lin, Tai-Shih Chi.
PY - 2016/8
Y1 - 2016/8
N2 - Spectro-temporal modulations (STMs) of sound convey timbre and rhythm information, so they are intuitively useful for automatic music genre classification. STMs are usually extracted from a time-frequency representation of the acoustic signal. In this paper, we investigate the efficacy of two kinds of STM features, the Gabor features and the rate-scale (RS) features, selectively extracted from various time-frequency representations, including the short-time Fourier transform (STFT) spectrogram, the constant-Q transform (CQT) spectrogram and the auditory (AUD) spectrogram, in recognizing the music genre. In our system, dictionary learning and sparse coding techniques are adopted for training the support vector machine (SVM) classifier. Both spectral-type features and modulation-type features are used to test the system. Experimental results show that the RS features extracted from the log-magnitude CQT spectrogram produce the highest recognition rate in classifying the music genre.
AB - Spectro-temporal modulations (STMs) of sound convey timbre and rhythm information, so they are intuitively useful for automatic music genre classification. STMs are usually extracted from a time-frequency representation of the acoustic signal. In this paper, we investigate the efficacy of two kinds of STM features, the Gabor features and the rate-scale (RS) features, selectively extracted from various time-frequency representations, including the short-time Fourier transform (STFT) spectrogram, the constant-Q transform (CQT) spectrogram and the auditory (AUD) spectrogram, in recognizing the music genre. In our system, dictionary learning and sparse coding techniques are adopted for training the support vector machine (SVM) classifier. Both spectral-type features and modulation-type features are used to test the system. Experimental results show that the RS features extracted from the log-magnitude CQT spectrogram produce the highest recognition rate in classifying the music genre.
UR - http://www.scopus.com/inward/record.url?scp=85048391041&partnerID=8YFLogxK
U2 - 10.5281/zenodo.1418099
DO - 10.5281/zenodo.1418099
M3 - Conference contribution
AN - SCOPUS:85048391041
T3 - Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016
SP - 744
EP - 750
BT - Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016
A2 - Devaney, Johanna
A2 - Turnbull, Douglas
A2 - Mandel, Michael I.
A2 - Tzanetakis, George
PB - International Society for Music Information Retrieval
T2 - 17th International Society for Music Information Retrieval Conference, ISMIR 2016
Y2 - 7 August 2016 through 11 August 2016
ER -