TY - JOUR
T1 - A multi-embedding neural model for incident video retrieval
AU - Chiang, Ting Hui
AU - Tseng, Yi Chun
AU - Tseng, Yu Chee
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/10
Y1 - 2022/10
N2 - Many internet search engines have been developed; however, the retrieval of video clips remains a challenge. This paper considers the retrieval of incident videos, which may contain more spatial and temporal semantics. We propose an encoder-decoder ConvLSTM model that explores multiple embeddings of a video to facilitate comparison of similarity between a pair of videos. The model is able to encode a video into an embedding that integrates both its spatial information and temporal semantics. Multiple video embeddings are then generated from coarse- and fine-grained features of a video to capture high- and low-level meanings. Subsequently, a learning-based comparative model is proposed to compare the similarity of two videos based on their embeddings. Extensive evaluations are presented and show that our model outperforms state-of-the-art methods for several video retrieval tasks on the FIVR-200K, CC_WEB_VIDEO, and EVVE datasets.
AB - Many internet search engines have been developed; however, the retrieval of video clips remains a challenge. This paper considers the retrieval of incident videos, which may contain more spatial and temporal semantics. We propose an encoder-decoder ConvLSTM model that explores multiple embeddings of a video to facilitate comparison of similarity between a pair of videos. The model is able to encode a video into an embedding that integrates both its spatial information and temporal semantics. Multiple video embeddings are then generated from coarse- and fine-grained features of a video to capture high- and low-level meanings. Subsequently, a learning-based comparative model is proposed to compare the similarity of two videos based on their embeddings. Extensive evaluations are presented and show that our model outperforms state-of-the-art methods for several video retrieval tasks on the FIVR-200K, CC_WEB_VIDEO, and EVVE datasets.
KW - Artificial intelligence
KW - Computer vision
KW - Deep metric learning
KW - Incident video retrieval
UR - http://www.scopus.com/inward/record.url?scp=85131077789&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2022.108807
DO - 10.1016/j.patcog.2022.108807
M3 - Article
AN - SCOPUS:85131077789
SN - 0031-3203
VL - 130
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108807
ER -