TrajPrompt: Aligning Color Trajectory with Vision-Language Representations

Li Wu Tsao*, Hao Tang Tsui, Yu Rou Tuan, Pei Chi Chen, Kuan Lin Wang, Jhih Ciang Wu, Hong Han Shuai, Wen Huang Cheng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Cross-modal learning shows promising potential to overcome the limitations of single-modality tasks. However, without proper design for representation alignment between different data sources, the external modality cannot fully exhibit its value. For example, recent trajectory prediction approaches incorporate the Bird’s-Eye-View (BEV) scene as an additional source but do not significantly improve performance compared to single-source strategies, indicating that the BEV scene and trajectory representations are not effectively combined. To overcome this problem, we propose TrajPrompt, a prompt-based approach that seamlessly incorporates trajectory representation into a vision-language framework, i.e., CLIP, for BEV scene understanding and future forecasting. We discover that CLIP can attend to the local area of the BEV scene when given our design of text prompts and colored lines. Comprehensive results demonstrate TrajPrompt’s effectiveness: it outperforms state-of-the-art trajectory predictors by a significant margin (over 35% improvement in the ADE and FDE metrics on the SDD and DroneCrowd datasets) while using fewer learnable parameters than previous trajectory modeling approaches that include scene information. Project page: https://trajprompt.github.io/.
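
The ADE and FDE metrics named in the abstract are the standard average and final displacement errors used in trajectory prediction: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the last step only. The following is a minimal sketch for a single 2D trajectory. The function name and the sample coordinates are illustrative assumptions and are not taken from the paper, and the evaluation protocol (for example, averaging over agents or sampling several predictions) is not specified in this record and is left out.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred, truth: equal-length sequences of (x, y) positions,
    one entry per predicted time step.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")

    # Per-step Euclidean distance between prediction and ground truth.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]

    ade = sum(dists) / len(dists)  # average over all predicted steps
    fde = dists[-1]                # error at the final step only
    return ade, fde


# Hypothetical sample data: the prediction stays on the x-axis
# while the true path drifts upward by one unit per step.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
```

Lower values are better for both metrics, so the "over 35% improvement" reported above corresponds to a reduction in these errors.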

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 275-292
Number of pages: 18
ISBN (Print): 9783031729393
State: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 2024 → 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15099 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 → 4/10/24

Keywords

  • Bird’s-Eye-View Scene
  • Cross-Modal Learning
  • Efficient Prompt Tuning
  • Trajectory Prediction
  • Vision-Language Understanding
