Enhanced Pedestrian Trajectory Prediction via the Cross-Modal Feature Fusion Transformer

Rashid Ali*, Hsu Feng Hsiao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We address the challenge of predicting pedestrian trajectories in videos, a task made difficult by the diversity of human motion and of pedestrians' interactions with their environment. Accurate trajectory prediction requires a holistic understanding of how past events in a video evolve over time. Existing methods often neglect to fuse critical features, such as human behavior, motion, and interaction, which limits their effectiveness. To overcome these limitations, we propose the Cross-Modal Feature Fusion Transformer, a novel approach for pedestrian trajectory prediction. Our model integrates multimodal features, including human behavior, position, speed, and interaction with surroundings, to capture the temporal progression of the observed frames. It consists of transformer-based cross-modal fusion encoder and decoder modules that model the interactions between the multimodal features through a multi-head co-attentional mechanism, and the fused representation is used to predict future trajectories. Additionally, we incorporate auxiliary self-supervised future prediction losses to learn the temporal evolution of past and future multimodal features. We evaluate our approach on the ETH/UCY and ActEV/VIRAT datasets and demonstrate superior performance compared to state-of-the-art methods.
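The multi-head co-attentional fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes two modalities with equal feature width, omits the learned query/key/value projections a real transformer layer would have, and simply lets each head attend over its own slice of the feature dimension. The function names (`cross_attention`, `co_attention`) and the example inputs (`motion`, `behavior`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, num_heads):
    """Scaled dot-product attention in which `queries` attends to `context`.

    Both arguments are lists of equal-width feature vectors (one per frame).
    Assumption: no learned projections; each head only sees its own slice
    of the feature dimension.
    """
    dim = len(queries[0])
    assert dim % num_heads == 0, "feature width must divide evenly into heads"
    head_dim = dim // num_heads
    fused = []
    for q in queries:
        out = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Similarity of this query slice to every context slice.
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], c[lo:hi])) / math.sqrt(head_dim)
                for c in context
            ]
            weights = softmax(scores)
            # Attention-weighted average of the context slices for this head.
            out.extend(
                sum(w * c[lo + j] for w, c in zip(weights, context))
                for j in range(head_dim)
            )
        fused.append(out)
    return fused


def co_attention(modality_a, modality_b, num_heads=2):
    """Symmetric co-attention: each modality queries the other."""
    a_given_b = cross_attention(modality_a, modality_b, num_heads)
    b_given_a = cross_attention(modality_b, modality_a, num_heads)
    return a_given_b, b_given_a


random.seed(0)
# Hypothetical inputs: 8 observed frames, 4-dimensional features per modality.
motion = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
behavior = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
motion_fused, behavior_fused = co_attention(motion, behavior)
```

Each output keeps the shape of its query modality, so `motion_fused` has one fused vector per observed motion frame. In a trained model the slices would be replaced by learned per-head projections, and the fused sequences would feed the decoder that predicts future positions.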

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359855
State: Published, 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023, Jeju, Korea, Republic of
Duration: 4 Dec 2023 to 7 Dec 2023

Keywords

  • Cross-Attention
  • Self-supervised learning
  • Trajectory prediction
  • Transformer
