Unsupervised Domain Adaptation for Spatio-Temporal Action Localization

Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming Hsuan Yang

研究成果: Paper同行評審

3 引文 斯高帕斯(Scopus)


Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling of both spatial and temporal features. This problem is typically formulated in the context of supervised learning, where the learned classifiers operate on the premise that both training and test data are sampled from the same underlying distribution. However, this assumption does not hold when there is a significant domain shift, leading to poor generalization performance on the test data. To address this, we focus on the hard and novel task of generalizing training models to test samples without access to any labels from the latter for spatio-temporal action localization by proposing an end-to-end unsupervised domain adaptation algorithm. We extend the state-of-the-art object detection framework to localize and classify actions. In order to minimize the domain shift, three domain adaptation modules at image level (temporal and spatial) and instance level (temporal) are designed and integrated. We design a new experimental setup and evaluate the proposed method and different adaptation modules on the UCF-Sports, UCF-101 and JHMDB benchmark datasets. We show that significant performance gain can be achieved when spatial and temporal features are adapted separately, or jointly for the most effective results.
出版狀態Published - 9月 2020
事件The 31st British Machine Vision Virtual Conference -
持續時間: 7 9月 202010 9月 2020


ConferenceThe 31st British Machine Vision Virtual Conference


深入研究「Unsupervised Domain Adaptation for Spatio-Temporal Action Localization」主題。共同形成了獨特的指紋。