TY - GEN
T1 - Re-Attention Is All You Need
T2 - 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021
AU - Chang, Hsiang Chun
AU - Chen, Hung Jen
AU - Shen, Yu Chia
AU - Shuai, Hong-Han
AU - Cheng, Wen-Huang
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Scene text detection plays an important role in vision-based robot navigation by localizing potential landmarks such as nameplates, information signs, and floor buttons in elevators. Recently, segmentation-based scene text detection methods have been receiving increasing attention. Segmentation results can be used to efficiently predict scene text of various shapes, such as the irregular text found in most scene text images. However, two kinds of text remain unsolved: 1) tiny and 2) blurry instances. Moreover, annotations for tiny/blurry text are usually ignored during training, even though tiny/blurry text can still provide visual cues that help robots understand the world. Therefore, in this paper, we propose a new approach to effectively detect both clear and blurry text. Specifically, we propose a re-attention module that does not increase the number of learnable parameters: it first predicts text regions as candidate regions and then leverages the same network to detect those candidate regions again, reducing the required memory. Moreover, to prevent errors from the first detection from propagating to the re-attended area, we propose a new fusion module that learns to integrate the results of the re-attended regions with the first prediction. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on four challenging datasets.
AB - Scene text detection plays an important role in vision-based robot navigation by localizing potential landmarks such as nameplates, information signs, and floor buttons in elevators. Recently, segmentation-based scene text detection methods have been receiving increasing attention. Segmentation results can be used to efficiently predict scene text of various shapes, such as the irregular text found in most scene text images. However, two kinds of text remain unsolved: 1) tiny and 2) blurry instances. Moreover, annotations for tiny/blurry text are usually ignored during training, even though tiny/blurry text can still provide visual cues that help robots understand the world. Therefore, in this paper, we propose a new approach to effectively detect both clear and blurry text. Specifically, we propose a re-attention module that does not increase the number of learnable parameters: it first predicts text regions as candidate regions and then leverages the same network to detect those candidate regions again, reducing the required memory. Moreover, to prevent errors from the first detection from propagating to the re-attended area, we propose a new fusion module that learns to integrate the results of the re-attended regions with the first prediction. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on four challenging datasets.
UR - http://www.scopus.com/inward/record.url?scp=85124373433&partnerID=8YFLogxK
U2 - 10.1109/IROS51168.2021.9636510
DO - 10.1109/IROS51168.2021.9636510
M3 - Conference contribution
AN - SCOPUS:85124373433
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 452
EP - 459
BT - IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 September 2021 through 1 October 2021
ER -