Video summarization with frame index vision transformer

Tzu Chun Hsu, Yi Sheng Liao, Chun Rong Huang

Research output: Conference contribution (peer-reviewed)

5 Citations (Scopus)

Abstract

In this paper, we propose a novel frame index vision transformer for video summarization. Given training frames, we linearly project the content of each frame to obtain a frame embedding. By combining the frame embedding with an index embedding and a class embedding, the proposed frame index vision transformer can efficiently and effectively learn the importance of the input frames. Experimental results show that the proposed method outperforms state-of-the-art deep learning methods, including recurrent neural network (RNN) and convolutional neural network (CNN) based methods, on both the SumMe and TVSum datasets. In addition, our method achieves real-time computational efficiency during testing.
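The input construction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give dimensions, say how the three embeddings are combined, or describe the scoring head. The sketch assumes the index embedding is added to the frame embedding and the class embedding is prepended as an extra token (as in the standard vision transformer), followed by a single self-attention layer and a sigmoid output. All function names, sizes, and the random weights are hypothetical.

```python
import numpy as np


def build_tokens(frames, w_proj, index_emb, class_emb):
    """Assemble the transformer input sequence from flattened frames.

    frames:    (T, P) array, one flattened frame (or frame feature) per row
    w_proj:    (P, D) linear projection matrix (learned in practice)
    index_emb: (T_max, D) one embedding per frame index (assumed additive)
    class_emb: (D,) class embedding, prepended as an extra token (assumed)
    """
    num_frames = frames.shape[0]
    frame_emb = frames @ w_proj                         # (T, D) frame embedding
    frame_emb = frame_emb + index_emb[:num_frames]      # inject frame order
    return np.vstack([class_emb[None, :], frame_emb])   # (T + 1, D)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    logits = q @ k.T / np.sqrt(k.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


def frame_importance(tokens, w_q, w_k, w_v, w_out):
    """Map attended frame tokens to importance scores in (0, 1)."""
    attended = self_attention(tokens, w_q, w_k, w_v)
    frame_tokens = attended[1:]                          # drop the class token
    return 1.0 / (1.0 + np.exp(-(frame_tokens @ w_out)))  # sigmoid, shape (T,)


# Illustrative sizes and random (untrained) weights; assumptions only.
rng = np.random.default_rng(0)
T, P, D = 8, 48, 16
frames = rng.normal(size=(T, P))
w_proj = rng.normal(scale=0.1, size=(P, D))
index_emb = rng.normal(scale=0.1, size=(T, D))
class_emb = rng.normal(scale=0.1, size=D)
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
w_out = rng.normal(scale=0.1, size=D)

tokens = build_tokens(frames, w_proj, index_emb, class_emb)
scores = frame_importance(tokens, w_q, w_k, w_v, w_out)
print(tokens.shape, scores.shape)
```

With trained weights, the per-frame scores would be compared against annotated importance during training, and high-scoring frames would be selected for the summary at test time. How that selection is done is outside what the abstract states.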

Original language: English
Title of host publication: Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9784901122207
Publication status: Published - 25 Jul 2021
Event: 17th International Conference on Machine Vision Applications, MVA 2021 - Aichi, Japan
Duration: 25 Jul 2021 to 27 Jul 2021

Publication series

Name: Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications

Conference

Conference: 17th International Conference on Machine Vision Applications, MVA 2021
Country/Territory: Japan
City: Aichi
Period: 25/07/21 to 27/07/21
