MOSA: Music Motion With Semantic Annotation Dataset for Cross-Modal Music Processing

Yu Fen Huang*, Nikki Moran, Simon Coleman, Jon Kelly, Shun Hwa Wei, Po Yin Chen, Yun Hsin Huang, Tsung Ping Chen, Yu Chia Kuo, Yu Chi Wei, Chih Hsuan Li, Da Yu Huang, Hsuan Kai Kao, Ting Wei Lin, Li Su

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the use of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrases, and expressive content from audio, video, and motion data, and the generation of musicians' body motion from given music audio.
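As a rough illustration of how the note-by-note annotations described above (pitch, beat, phrase, dynamic, articulation, harmony) might be represented alongside time alignment to the audio and motion-capture streams, the Python sketch below defines a hypothetical annotation record and a CSV loader. All field names and the file layout are assumptions made for illustration only; they do not reflect the dataset's actual schema or distribution format.

    # Minimal sketch of a note-level annotation record, assuming a hypothetical
    # CSV layout; field names are illustrative, not the MOSA dataset's schema.
    from dataclasses import dataclass
    import csv
    from typing import List


    @dataclass
    class NoteAnnotation:
        onset_sec: float      # note onset time, shared by audio and mocap streams
        duration_sec: float   # note duration in seconds
        pitch: int            # MIDI pitch number
        beat: float           # metrical position within the bar
        phrase_id: int        # index of the enclosing phrase
        dynamic: str          # e.g. "p", "mf", "f"
        articulation: str     # e.g. "legato", "staccato"
        harmony: str          # harmonic label, e.g. a Roman-numeral chord symbol


    def load_annotations(path: str) -> List[NoteAnnotation]:
        """Read note-level annotations from a hypothetical CSV file."""
        notes = []
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                notes.append(NoteAnnotation(
                    onset_sec=float(row["onset_sec"]),
                    duration_sec=float(row["duration_sec"]),
                    pitch=int(row["pitch"]),
                    beat=float(row["beat"]),
                    phrase_id=int(row["phrase_id"]),
                    dynamic=row["dynamic"],
                    articulation=row["articulation"],
                    harmony=row["harmony"],
                ))
        return notes

In such a layout, the shared onset times are what would allow note-level labels to be paired with audio frames and 3-D motion-capture frames for the cross-modal tasks mentioned in the abstract.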

Original language: English
Pages (from-to): 4157-4170
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
DOIs
State: Published - 2024

Keywords

  • Music information retrieval
  • artificial intelligence
  • cross-modal
  • motion capture
  • music semantics
