TY - JOUR
T1 - SMILEtrack
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Wang, Yu Hsiang
AU - Hsieh, Jun Wei
AU - Chen, Ping Yang
AU - Chang, Ming Ching
AU - So, Hung Hin
AU - Li, Xin
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - Despite recent progress in Multiple Object Tracking (MOT), several obstacles such as occlusions, similar objects, and complex scenes remain an open challenge. Meanwhile, a systematic study of the cost-performance tradeoff for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an innovative object tracker that effectively addresses these challenges by integrating an efficient object detector with a Siamese network-based Similarity Learning Module (SLM). The technical contributions of SMILETrack are twofold. First, we propose an SLM that calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models. The SLM incorporates a Patch Self-Attention (PSA) block inspired by the vision Transformer, which generates reliable features for accurate similarity matching. Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance. Together, these innovations help SMILETrack achieve an improved trade-off between the cost (e.g., running speed) and performance (e.g., tracking accuracy) over several existing state-of-the-art benchmarks, including the popular BYTETrack method. SMILETrack outperforms BYTETrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on MOT17 and MOT20 datasets. Code is available at https://github.com/pingyang1117/SMILEtrack Official.
AB - Despite recent progress in Multiple Object Tracking (MOT), several obstacles such as occlusions, similar objects, and complex scenes remain an open challenge. Meanwhile, a systematic study of the cost-performance tradeoff for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an innovative object tracker that effectively addresses these challenges by integrating an efficient object detector with a Siamese network-based Similarity Learning Module (SLM). The technical contributions of SMILETrack are twofold. First, we propose an SLM that calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models. The SLM incorporates a Patch Self-Attention (PSA) block inspired by the vision Transformer, which generates reliable features for accurate similarity matching. Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance. Together, these innovations help SMILETrack achieve an improved trade-off between the cost (e.g., running speed) and performance (e.g., tracking accuracy) over several existing state-of-the-art benchmarks, including the popular BYTETrack method. SMILETrack outperforms BYTETrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on MOT17 and MOT20 datasets. Code is available at https://github.com/pingyang1117/SMILEtrack Official.
UR - http://www.scopus.com/inward/record.url?scp=85189525076&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i6.28386
DO - 10.1609/aaai.v38i6.28386
M3 - Conference article
AN - SCOPUS:85189525076
SN - 2159-5399
VL - 38
SP - 5740
EP - 5748
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 6
Y2 - 20 February 2024 through 27 February 2024
ER -