Multi-scale Motion-Aware Module for Video Action Recognition

Huai Wei Peng*, Yu Chee Tseng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Because computing optical flow is slow, recent works have proposed using the correlation operation as an alternative way to extract motion features. Although correlation operations yield significant improvements with negligible FLOPs, they introduce much more latency per FLOP than convolution operations, and the latency grows noticeably as the searching patch gets larger. Shrinking the searching patch, however, inevitably degrades performance because larger displacements can no longer be captured. In this paper, we propose an effective and low-latency Multi-Scale Motion-Aware (MSMA) module. It uses smaller searching patches at different scales to efficiently extract motion features from large displacements. It can be installed into different CNN backbones and generalizes well across them. When installed into TSM ResNet-50, the MSMA module introduces ≈ 17.6% more latency on an NVIDIA Tesla V100 GPU, yet it achieves state-of-the-art performance on Something-Something V1 & V2 and Diving-48.
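
The trade-off described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' MSMA module. The function names `local_correlation` and `avg_pool2`, the use of 2x2 average pooling to build a coarser scale, and the toy single-pixel frames are all assumptions made for illustration. The sketch only shows why a small searching patch applied at a coarser scale can reach a displacement that the same patch misses at full resolution.

```python
import numpy as np

def local_correlation(feat_a, feat_b, max_disp):
    """Correlate every position of feat_a with a square neighbourhood of feat_b.

    feat_a, feat_b: arrays of shape (C, H, W).
    Returns shape (P*P, H, W) with P = 2*max_disp + 1; channel dy*P + dx
    holds the correlation for displacement (dy - max_disp, dx - max_disp).
    """
    c, h, w = feat_a.shape
    p = 2 * max_disp + 1
    # Zero-pad feat_b so that every shifted window stays in bounds.
    padded = np.pad(feat_b, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    out = np.empty((p * p, h, w), dtype=np.float64)
    for dy in range(p):
        for dx in range(p):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            out[dy * p + dx] = (feat_a * shifted).sum(axis=0) / c
    return out

def avg_pool2(feat):
    """2x2 average pooling (hypothetical way to obtain a coarser scale)."""
    c, h, w = feat.shape
    cropped = feat[:, :h - h % 2, :w - w % 2]
    return cropped.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

# Toy frames: a single active pixel that moves 4 pixels to the right.
frame_a = np.zeros((1, 16, 16))
frame_a[0, 8, 8] = 1.0
frame_b = np.zeros((1, 16, 16))
frame_b[0, 8, 12] = 1.0

# 3x3 patch at full resolution: the 4-pixel move is out of reach.
small_fine = local_correlation(frame_a, frame_b, max_disp=1)
# 9x9 patch at full resolution: reaches the move but needs 81 channels.
large_fine = local_correlation(frame_a, frame_b, max_disp=4)
# 3x3 patch at 1/4 resolution: the move shrinks to 1 coarse pixel.
coarse_a = avg_pool2(avg_pool2(frame_a))
coarse_b = avg_pool2(avg_pool2(frame_b))
small_coarse = local_correlation(coarse_a, coarse_b, max_disp=1)

print(small_fine.shape[0], large_fine.shape[0], small_coarse.shape[0])
print(small_fine[:, 8, 8].max(), small_coarse[:, 2, 2].argmax())
```

In this toy case the 3x3 patch at quarter resolution responds to the 4-pixel shift using 9 correlation channels, where the full-resolution search needs a 9x9 patch with 81 channels. The sketch says nothing about actual GPU latency, which the paper measures on real backbones.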

Original language: English
Title of host publication: Computer Vision – ECCV 2022 Workshops, Proceedings
Editors: Leonid Karlinsky, Tomer Michaeli, Ko Nishino
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 589-606
Number of pages: 18
ISBN (Print): 9783031250743
DOIs
State: Published - 2023
Event: 17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel
Duration: 23 Oct 2022 – 27 Oct 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13806 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th European Conference on Computer Vision, ECCV 2022
Country/Territory: Israel
City: Tel Aviv
Period: 23/10/22 – 27/10/22

Keywords

  • Correlation operations
  • Latency-performance trade-off
  • Motion features extracting
  • Video classification
