Weakly-supervised video re-localization with multiscale attention model

Yung Han Huang, Kuang Jui Hsu, Shyh Kang Jeng, Yen Yu Lin

Research output: Conference contribution (peer-reviewed)

8 Citations (Scopus)

Abstract

Video re-localization aims to localize a sub-sequence, called the target segment, in an untrimmed reference video that is similar to a given query video. In this work, we propose an attention-based model to accomplish this task in a weakly supervised setting. Namely, we derive our CNN-based model without using the annotated locations of the target segments in reference videos. Our model contains three modules. First, it employs a pre-trained C3D network for feature extraction. Second, we design an attention mechanism to extract multiscale temporal features, which are then used to estimate the similarity between the query video and a reference video. Third, a localization layer detects where the target segment is in the reference video by determining whether each frame in the reference video is consistent with the query video. The resultant CNN model is derived based on the proposed coattention loss, which discriminatively separates the target segment from the reference video. This loss maximizes the similarity between the query video and the target segment while minimizing the similarity between the target segment and the rest of the reference video. Our model can be adapted to fully supervised re-localization. Our method is evaluated on a public dataset and achieves state-of-the-art performance under both weakly supervised and fully supervised settings.
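The abstract describes the coattention loss only in words, so the following is a minimal sketch of the idea, not the authors' formulation. It assumes a pooled query feature, per-frame reference features, and soft per-frame scores in [0, 1] from a localization layer. The function names, the cosine similarity, and the weighted-mean pooling are all illustrative assumptions.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def coattention_loss_sketch(query_feat, ref_feats, frame_scores, eps=1e-8):
    """Hypothetical loss: pull the attended target toward the query and
    push it away from the rest of the reference video.

    query_feat:   (D,)   pooled query feature (assumed input)
    ref_feats:    (T, D) per-frame reference features (assumed input)
    frame_scores: (T,)   soft scores in [0, 1]; high means "inside target"
    """
    w_in = frame_scores          # weights for the predicted target segment
    w_out = 1.0 - frame_scores   # weights for the rest of the reference

    # Weighted-mean pooling of reference frames inside and outside the target.
    target = (w_in[:, None] * ref_feats).sum(axis=0) / (w_in.sum() + eps)
    rest = (w_out[:, None] * ref_feats).sum(axis=0) / (w_out.sum() + eps)

    # Minimizing this maximizes sim(query, target) and minimizes sim(target, rest).
    return -cosine(query_feat, target) + cosine(target, rest)
```

Note that no annotated segment boundaries appear in this sketch: the only supervision is that the query and the reference are known to share a segment, which matches the weakly supervised setting described above. In a real model the frame scores would be produced by the network and the loss would be computed in a framework that supports automatic differentiation.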

Original language: English
Title of host publication: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Pages: 11077-11084
Number of pages: 8
ISBN (Electronic): 9781577358350
DOIs
Publication status: Published - 3 Apr 2020
Event: 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States
Duration: 7 Feb 2020 to 12 Feb 2020

Publication series

Name: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference: 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/Territory: United States
City: New York
Period: 7/02/20 to 12/02/20
