Extending music based on emotion and tonality via generative adversarial network

Bo-Wei Tseng, Yih-Liang Shen, Tai-Shih Chi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

We propose a generative model for music extension in this paper. The model comprises two classifiers, one for music emotion and one for music tonality, and a generative adversarial network (GAN). It can therefore generate symbolic music conditioned not only on low-level spectral and temporal characteristics, but also on high-level emotion and tonality attributes of previously observed music pieces. The generative model operates in a universal latent space, constructed by a variational autoencoder (VAE), for representing music pieces. We conduct subjective listening tests and derive objective measures for performance evaluation. Experimental results show that the proposed model produces much smoother and more authentic music pieces than the baseline model in terms of all subjective and objective measures.
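The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of how the described pieces could fit together: a VAE maps symbolic-music segments into a shared latent space, a GAN generator predicts the latent code of the continuation, and emotion and tonality classifiers steer that prediction. All layer sizes, the piano-roll input shape, class counts, and the unweighted loss sum are illustrative assumptions, not values from the paper.

```python
# Sketch of the latent-space music-extension setup described in the abstract.
# Dimensions, class counts, and network depths are assumptions for illustration.
import torch
import torch.nn as nn

LATENT = 128        # assumed latent dimensionality
INPUT = 4 * 128     # assumed flattened piano-roll segment (4 beats x 128 pitches)

class VAE(nn.Module):
    """Encodes a music segment into the shared latent space and decodes it back."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(INPUT, 512), nn.ReLU())
        self.mu = nn.Linear(512, LATENT)
        self.logvar = nn.Linear(512, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT, 512), nn.ReLU(),
                                 nn.Linear(512, INPUT), nn.Sigmoid())

    def encode(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

class Generator(nn.Module):
    """Maps the latent code of observed music to a latent code for its extension."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT))

    def forward(self, z_prev):
        return self.net(z_prev)

def classifier(n_classes):
    """Stand-in for the emotion / tonality classifiers operating on latent codes."""
    return nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, n_classes))

discriminator = nn.Sequential(nn.Linear(LATENT, 128), nn.LeakyReLU(0.2),
                              nn.Linear(128, 1))
emotion_clf, tonality_clf = classifier(4), classifier(24)  # e.g. 4 quadrants, 24 keys

vae, gen = VAE(), Generator()
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

# One illustrative generator update: the extension should fool the discriminator
# while keeping the emotion and tonality labels of the observed music.
x_prev = torch.rand(8, INPUT)                       # batch of observed segments
emo, key = torch.randint(0, 4, (8,)), torch.randint(0, 24, (8,))
z_prev, _, _ = vae.encode(x_prev)
z_next = gen(z_prev.detach())
loss_g = (bce(discriminator(z_next), torch.ones(8, 1))  # adversarial term
          + ce(emotion_clf(z_next), emo)                # preserve emotion
          + ce(tonality_clf(z_next), key))              # preserve tonality
loss_g.backward()
x_next = vae.dec(z_next.detach())   # decode the extension back to a piano roll
```

This sketch omits the VAE pretraining and the discriminator update; it is meant only to show how the classifier losses can condition the GAN's latent-space generation on emotion and tonality.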

Original language: English
Title of host publication: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 86-90
Number of pages: 5
ISBN (Electronic): 9781728176055
DOIs
State: Published - Jun 2021
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 2021 – 11 Jun 2021

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Country/Territory: Canada
City: Virtual, Toronto
Period: 6/06/21 – 11/06/21

Keywords

  • Generative adversarial network
  • Music emotion
  • Music generation
  • Tonality
  • Variational autoencoder
