TY - GEN
T1 - Residual Graph Attention Network and Expression-Respect Data Augmentation Aided Visual Grounding
AU - Wang, Jia
AU - Wu, Hung Yi
AU - Chen, Jun Cheng
AU - Shuai, Hong Han
AU - Cheng, Wen Huang
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Visual grounding aims to localize a target object in an image based on a given text description. Owing to the inherent complexity of language, reasoning over complex expressions and inferring the underlying relationship between an expression and the objects in an image remain challenging problems. To address these issues, we propose a residual graph attention network for visual grounding. The proposed approach first builds an expression-guided relation graph, then performs multi-step reasoning over it, and finally matches the target object. It enables better visual grounding of complex expressions by using deeper layers than other graph network approaches. Moreover, to increase the diversity of training data, we perform an expression-respect data augmentation that applies copy-paste operations to pairs of source and target images. Extensive experiments demonstrate the effectiveness of the proposed approach, which outperforms other state-of-the-art graph network-based approaches.
KW - Expression-respect data augmentation
KW - Residual graph attention network
KW - Visual grounding
UR - http://www.scopus.com/inward/record.url?scp=85146723566&partnerID=8YFLogxK
U2 - 10.1109/ICIP46576.2022.9897564
DO - 10.1109/ICIP46576.2022.9897564
M3 - Conference contribution
AN - SCOPUS:85146723566
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 326
EP - 330
BT - 2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings
PB - IEEE Computer Society
T2 - 29th IEEE International Conference on Image Processing, ICIP 2022
Y2 - 16 October 2022 through 19 October 2022
ER -