TY - JOUR
T1 - A DeNoising FPN with Transformer R-CNN for Tiny Object Detection
AU - Liu, Hou I.
AU - Tseng, Yu Wen
AU - Chang, Kai Cheng
AU - Wang, Pin Jyun
AU - Shuai, Hong Han
AU - Cheng, Wen-Huang
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Despite notable advancements in the field of computer vision (CV), the precise detection of tiny objects continues to pose a significant challenge, largely due to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of applications ranging from urban planning to environmental monitoring. In this article, we propose a new framework, namely, DeNoising feature pyramid network (FPN) with Trans R-CNN (DNTR), to improve the performance of tiny object detection. DNTR consists of an easy plug-in design, DeNoising FPN (DN-FPN), and an effective Transformer-based detector, Trans region-based convolutional neural network (R-CNN). Specifically, feature fusion in the FPN is important for detecting multiscale objects. However, noisy features may be produced during the fusion process since there is no regularization between the features of different scales. Therefore, we introduce a DN-FPN module that utilizes contrastive learning to suppress noise in each level's features in the top-down path of FPN. Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention. The experimental results manifest that our DNTR outperforms the baselines by at least 17.4% in terms of AP_vt on the AI-TOD dataset and 9.6% in terms of average precision (AP) on the VisDrone dataset, respectively. Our code will be available at https://github.com/hoiliu-0801/DNTR.
KW - Aerial image
KW - contrastive learning
KW - noise reduction
KW - tiny object detection
KW - transformer-based detector
UR - http://www.scopus.com/inward/record.url?scp=85192192326&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3396489
DO - 10.1109/TGRS.2024.3396489
M3 - Article
AN - SCOPUS:85192192326
SN - 0196-2892
VL - 62
SP - 1
EP - 15
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 4704415
ER -