The acousticvisual emotion Gaussians model for automatic generation of music video

Ju-Chiang Wang*, Yi-Hsuan Yang, I-Hong Jhuo, Yen-Yu Lin, Hsin-Min Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

26 Scopus citations

Abstract

This paper presents a novel content-based system that utilizes the perceived emotion of multimedia content as a bridge to connect music and video. Specifically, we propose a novel machine learning framework, called Acousticvisual Emotion Gaussians (AVEG), to jointly learn the tripartite relationship among music, video, and emotion from an emotion-annotated corpus of music videos. For a music piece (or a video sequence), the AVEG model is applied to predict its emotion distribution in a stochastic emotion space from the corresponding low-level acoustic (resp. visual) features. Finally, music and video are matched by measuring the similarity between the two corresponding emotion distributions, based on a distance measure such as KL divergence.
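The matching step described in the abstract can be illustrated with a small sketch. The Python snippet below is not the authors' implementation: it assumes, purely for illustration, that each predicted emotion distribution is a single 2-D Gaussian over a valence-arousal space (the AVEG model learns richer, data-driven distributions), and all names and numerical values are hypothetical. It ranks candidate video clips against a music piece by a symmetrized KL divergence between the corresponding Gaussians.

# Illustrative sketch only (not the AVEG code): match a music piece to a
# video clip by comparing predicted emotion distributions, here assumed to
# be single 2-D Gaussians in a valence-arousal emotion space.
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(cov1_inv @ cov0)
        + diff @ cov1_inv @ diff
        - k
        + np.log(np.linalg.det(cov1) / np.linalg.det(cov0))
    )


# Hypothetical (valence, arousal) predictions for one music piece and two
# candidate video clips.
music_mu, music_cov = np.array([0.6, 0.4]), np.diag([0.05, 0.08])
clips = {
    "clip_a": (np.array([0.5, 0.5]), np.diag([0.06, 0.07])),
    "clip_b": (np.array([-0.4, 0.1]), np.diag([0.10, 0.10])),
}

# Rank clips by symmetrized KL divergence to the music's emotion
# distribution; the smallest divergence is the best emotional match.
ranked = sorted(
    clips.items(),
    key=lambda kv: gaussian_kl(music_mu, music_cov, *kv[1])
    + gaussian_kl(*kv[1], music_mu, music_cov),
)
print("Best match:", ranked[0][0])

Running this prints "clip_a", whose emotion Gaussian lies closer to the music's; any divergence or distance between distributions could be substituted for the symmetrized KL used here.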

Original language: English
Title of host publication: MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia
Pages: 1379-1380
Number of pages: 2
DOIs
State: Published - 26 Dec 2012
Event: 20th ACM International Conference on Multimedia, MM 2012 - Nara, Japan
Duration: 29 Oct 2012 – 2 Nov 2012

Publication series

Name: MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia

Conference

Conference: 20th ACM International Conference on Multimedia, MM 2012
Country/Territory: Japan
City: Nara
Period: 29/10/12 – 2/11/12

Keywords

  • cross-modal media retrieval
  • emotion recognition
