Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos

Bei Liu, Sipeng Zheng, Jianlong Fu, Wen Huang Cheng

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The Natural Language Localization (NLL) task aims to localize a sentence query in a video by predicting its starting and ending timestamps, which requires a comprehensive understanding of both language and video. Much work has addressed third-person view videos, but the task remains under-explored for ego-centric videos, even though it is critical for understanding the growing volume of ego-centric video and for facilitating embodied AI tasks. Directly adapting existing NLL methods to ego-centric video datasets is challenging for two reasons. First, there is a gap in temporal duration between datasets. Second, queries in ego-centric videos usually require a better understanding of more complex and long-term temporal orders. For these reasons, we propose an anchor-based detection model for NLL in ego-centric videos.

Original language: English
Title of host publication: 2023 IEEE International Conference on Consumer Electronics, ICCE 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665491303
State: Published - 2023
Event: 2023 IEEE International Conference on Consumer Electronics, ICCE 2023 - Las Vegas, United States
Duration: 6 Jan 2023 to 8 Jan 2023

Publication series

Name: Digest of Technical Papers - IEEE International Conference on Consumer Electronics
Volume: 2023-January
ISSN (Print): 0747-668X

Conference

Conference: 2023 IEEE International Conference on Consumer Electronics, ICCE 2023
Country/Territory: United States
City: Las Vegas
Period: 6/01/23 to 8/01/23

Keywords

  • cross-modality
  • ego-centric video
  • Embodied AI
  • video understanding
