Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval

Cheng Huang*, Yi Lun Wu, Hong Han Shuai, Ching Chun Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Given an untrimmed video and a natural language query, video moment retrieval (VMR) aims to retrieve the video moments described by the query. However, most existing VMR methods assume a one-to-one mapping between the input query and the target video moment (single-target VMR), disregarding the possibility that a video may contain multiple target moments that match the query description (multi-target VMR). Previous methods tackle multi-target VMR by combining false-negative moments with the original target moment for multi-target training. However, these methods do not work properly when no false-negative moments exist in the video, or when the identified false-negative moments are noisy but are still used as pseudo-labels. In this paper, we propose to tackle multi-target VMR with Semantic Fusion Augmentation and Semantic Boundary Detection (SFABD). Specifically, we use feature-level augmentation to generate augmented target moments, along with an intra-video contrastive loss to ensure feature consistency. Meanwhile, we perform semantic boundary detection to adaptively remove all false negatives from the negative set of the contrastive loss, avoiding semantic confusion. Extensive experiments on Charades-STA, ActivityNet Captions, and QVHighlights show that our method achieves state-of-the-art performance on both multi-target and single-target metrics. The source code is available at https://github.com/basiclab/SFABD.
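The two ideas named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that augmentation is a mixup-style interpolation of two feature vectors and that false negatives are detected with a fixed cosine-similarity threshold, which stands in for the paper's semantic boundary detection. The names `fuse`, `masked_contrastive_loss`, `threshold`, and `tau` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fuse(target, other, lam=0.5):
    """Assumed stand-in for feature-level augmentation: a convex mix of
    a target-moment feature with another moment feature."""
    return [lam * t + (1.0 - lam) * o for t, o in zip(target, other)]


def masked_contrastive_loss(query, positives, negatives, threshold=0.8, tau=0.1):
    """InfoNCE-style loss averaged over the positives.

    Negatives whose similarity to the query reaches `threshold` are treated
    as likely false negatives and dropped from the negative set. The fixed
    threshold is an assumption made for this sketch only.
    Returns (loss, number_of_negatives_kept).
    """
    kept = [n for n in negatives if cosine(query, n) < threshold]
    neg_sum = sum(math.exp(cosine(query, n) / tau) for n in kept)
    loss = 0.0
    for p in positives:
        pos = math.exp(cosine(query, p) / tau)
        loss += -math.log(pos / (pos + neg_sum))
    return loss / len(positives), len(kept)


# Toy 2-D features: one query, one target moment, one augmented target,
# one true background moment, and one unlabeled moment that also matches.
query = [1.0, 0.0]
target = [1.0, 0.1]
augmented = fuse(target, [0.9, 0.3], lam=0.7)
background = [0.0, 1.0]
unlabeled_match = [0.95, 0.2]

masked_loss, kept = masked_contrastive_loss(
    query, [target, augmented], [background, unlabeled_match]
)
# A threshold above 1.0 disables the masking, because cosine similarity
# never exceeds 1.0, so every negative is kept.
unmasked_loss, kept_all = masked_contrastive_loss(
    query, [target, augmented], [background, unlabeled_match], threshold=1.1
)
```

In this toy setup the unlabeled matching moment is dropped from the negative set, so the masked loss no longer penalizes the query for being close to a moment that actually fits its description.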

Original language: English
Title of host publication: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6769-6778
Number of pages: 10
ISBN (Electronic): 9798350318920
State: Published - 3 Jan 2024
Event: 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024 - Waikoloa, United States
Duration: 4 Jan 2024 to 8 Jan 2024

Publication series

Name: Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024

Conference

Conference: 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Country/Territory: United States
City: Waikoloa
Period: 4/01/24 to 8/01/24

Keywords

  • Algorithms
  • Video recognition and understanding
  • Vision + language and/or other modalities
