TY - GEN
T1 - A Unified SpatioTemporal Network with Structural Pruning for Video Action Recognition
AU - Chen, Yang Jie
AU - Ali, Rashid
AU - Hsiao, Hsu Feng
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Video action recognition poses significant challenges in capturing and integrating the complex spatiotemporal patterns and motion dynamics necessary for robust understanding. Despite recent advancements, existing deep learning approaches often struggle to efficiently model these interactions over extended temporal ranges. To address this, we propose the Unified SpatioTemporal Network (USTN), a novel framework that fuses segment-level spatiotemporal features with long-range temporal difference information. By strategically employing sparse frame sampling, USTN constructs a rich, coarse-grained representation encapsulating both spatial structure and temporal evolution. Furthermore, we introduce a structural pruning technique to identify and remove redundant parameters, mitigating overfitting and enhancing computational efficiency without compromising performance. Extensive evaluations on the challenging UCF101 and HMDB51 benchmarks, using USTN instantiated with ResNet backbones, demonstrate the superiority of our approach.
UR - https://www.scopus.com/pages/publications/105010577344
U2 - 10.1109/ISCAS56072.2025.11043896
DO - 10.1109/ISCAS56072.2025.11043896
M3 - Conference contribution
AN - SCOPUS:105010577344
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - ISCAS 2025 - IEEE International Symposium on Circuits and Systems, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Symposium on Circuits and Systems, ISCAS 2025
Y2 - 25 May 2025 through 28 May 2025
ER -