Learning from Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism

Jen Chun Lin, Wen Li Wei, Yen Yu Lin, Tyng Luh Liu, Hong Yuan Mark Liao

Research output: Conference contribution (peer-reviewed)

1 Citation (Scopus)

Abstract

Learning from music to visual storytelling of shots is an interesting and emerging task. It produces a coherent visual story in the form of a shot type sequence, which not only expands the storytelling potential of a song but also facilitates automatic concert video mashup and storyboard generation. In this study, we present a deep interactive learning (DIL) mechanism for building a compact yet accurate sequence-to-sequence model for this task. Unlike knowledge distillation (KD), where knowledge is transferred one way from a pre-trained teacher network (or ensemble network) to a student network, the proposed method enables collaborative learning between an ensemble teacher network and a student network; that is, the student network also teaches. Specifically, our method first learns a teacher network composed of several assistant networks, which generates a shot type sequence and, through KD, produces the corresponding soft target distribution over shot types. It then constructs a student network that learns from both the ground-truth labels (hard targets) and the soft target distribution, which eases optimization and improves generalization. As the student network improves, it feeds knowledge back to the assistant networks, thereby improving the teacher network in each iteration. Owing to this interactive design, the DIL mechanism bridges the gap between the teacher and student networks and improves the capability of both. Objective and subjective experimental results demonstrate that both the teacher and student networks can generate more attractive shot sequences from music, thereby enhancing the viewing and listening experience.
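The abstract describes the two directions of knowledge transfer only at a high level. The following is a minimal sketch, assuming standard Hinton-style distillation (a temperature-softened softmax, KL divergence to the soft target, scaled by T squared) and a plain average of the assistant networks' outputs as the ensemble teacher. The function names, the temperature, the weight `alpha`, and the averaging rule are illustrative assumptions, not the authors' published formulation. It computes the losses for a single time step of a shot type sequence; a sequence-level loss would presumably sum these over time steps.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(hard_label: int, probs: Sequence[float]) -> float:
    """Loss against the ground-truth shot type (hard target)."""
    return -math.log(probs[hard_label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_soft_target(assistant_logits: Sequence[Sequence[float]],
                         temperature: float) -> List[float]:
    """Hypothetical teacher: average the softened distributions of the assistants."""
    dists = [softmax(logits, temperature) for logits in assistant_logits]
    return [sum(col) / len(dists) for col in zip(*dists)]


def distillation_loss(logits: Sequence[float], hard_label: int,
                      soft_target: Sequence[float], temperature: float,
                      alpha: float) -> float:
    """alpha * hard-target loss + (1 - alpha) * T^2 * KL(soft target || softened prediction)."""
    hard = cross_entropy(hard_label, softmax(logits))
    soft = kl_divergence(soft_target, softmax(logits, temperature))
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


def interactive_losses(assistant_logits: Sequence[Sequence[float]],
                       student_logits: Sequence[float], hard_label: int,
                       temperature: float = 2.0,
                       alpha: float = 0.5) -> Tuple[float, List[float]]:
    """One hypothetical interactive round for a single time step.

    Teacher -> student: the student learns from the hard label and the ensemble soft target.
    Student -> teacher: each assistant learns from the hard label and the student's soft output.
    In a real training loop each soft target would be treated as a constant (no gradient).
    """
    teacher_soft = ensemble_soft_target(assistant_logits, temperature)
    student_loss = distillation_loss(student_logits, hard_label, teacher_soft,
                                     temperature, alpha)
    student_soft = softmax(student_logits, temperature)
    assistant_losses = [
        distillation_loss(logits, hard_label, student_soft, temperature, alpha)
        for logits in assistant_logits
    ]
    return student_loss, assistant_losses


if __name__ == "__main__":
    # Toy logits over four hypothetical shot types at one time step.
    assistants = [[2.0, 0.5, -1.0, 0.0], [1.5, 1.0, -0.5, -1.0], [1.0, 0.2, 0.3, -0.2]]
    student = [1.2, 0.8, -0.3, -0.4]
    s_loss, a_losses = interactive_losses(assistants, student, hard_label=0)
    print(f"student loss: {s_loss:.4f}")
    print("assistant losses:", [round(x, 4) for x in a_losses])
```

The T squared factor is the usual correction in temperature-based distillation that keeps the soft-target gradients at a comparable magnitude as the temperature changes. Reusing the same `distillation_loss` in both directions is one simple way to express "the student network also teaches"; the paper may weight or schedule the two directions differently.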

Original language: English
Host publication title: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 102-110
Number of pages: 9
ISBN (Electronic): 9781450379885
Publication status: Published - 12 Oct 2020
Event: 28th ACM International Conference on Multimedia, MM 2020 - Virtual, Online, United States
Duration: 12 Oct 2020 to 16 Oct 2020

Publication series

Name: MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia

Conference

Conference: 28th ACM International Conference on Multimedia, MM 2020
Country/Territory: United States
City: Virtual, Online
Period: 12/10/20 to 16/10/20
