RESIDUAL GRAPH ATTENTION NETWORK AND EXPRESSION-RESPECT DATA AUGMENTATION AIDED VISUAL GROUNDING

Jia Wang, Hung Yi Wu, Jun Cheng Chen, Hong Han Shuai, Wen Huang Cheng*

*Corresponding author of this work

Research output: Conference contribution › peer-reviewed

2 Citations (Scopus)

Abstract

Visual grounding aims to localize a target object in an image based on a given text description. Due to the innate complexity of language, it remains challenging to reason over complex expressions and to infer the underlying relationship between the expression and the object in an image. To address these issues, we propose a residual graph attention network for visual grounding. The proposed approach first builds an expression-guided relation graph and then performs multi-step reasoning followed by matching the target object. The residual connections allow deeper layers than other graph network approaches, enabling better visual grounding for complex expressions. Moreover, to increase the diversity of training data, we perform expression-respect data augmentation that applies copy-paste operations to pairs of source and target images. Extensive experiments show that the proposed approach outperforms other state-of-the-art graph network-based approaches, demonstrating its effectiveness.
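As a rough illustration of the two ideas in the abstract, the following is a minimal sketch (not the authors' implementation) of a graph attention layer with a residual connection, stacked for multi-step reasoning over object nodes, followed by dot-product matching against the expression embedding. All class names, dimensions, and the toy inputs are illustrative assumptions.

import torch
import torch.nn as nn


class ResidualGraphAttentionLayer(nn.Module):
    """One reasoning step: expression-guided attention over object nodes,
    message passing along the relation graph, plus a residual connection."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)   # projects the expression embedding
        self.key = nn.Linear(dim, dim)     # projects node (object) features
        self.value = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, nodes, expr, adj):
        # nodes: (N, dim) object features; expr: (dim,) expression embedding;
        # adj:   (N, N) relation-graph adjacency (1 where an edge exists).
        q = self.query(expr)                                      # (dim,)
        scores = self.key(nodes) @ q / nodes.size(-1) ** 0.5      # (N,)
        attn = torch.softmax(scores, dim=0)                       # expression-guided node weights
        messages = adj @ (attn.unsqueeze(-1) * self.value(nodes)) # aggregate over neighbors
        return self.norm(nodes + messages)                        # residual keeps deep stacks stable


def ground(nodes, expr, adj, num_steps: int = 4, dim: int = 256):
    """Run multi-step reasoning, then match each node against the expression."""
    layers = nn.ModuleList(ResidualGraphAttentionLayer(dim) for _ in range(num_steps))
    for layer in layers:
        nodes = layer(nodes, expr, adj)
    scores = nodes @ expr      # dot-product matching against the expression
    return scores.argmax()     # index of the predicted target object


# Toy usage with random features: 5 candidate objects, 256-d features.
if __name__ == "__main__":
    torch.manual_seed(0)
    nodes = torch.randn(5, 256)
    expr = torch.randn(256)
    adj = (torch.rand(5, 5) > 0.5).float()
    print(ground(nodes, expr, adj))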

Original language: English
Title of host publication: 2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings
Publisher: IEEE Computer Society
Pages: 326-330
Number of pages: 5
ISBN (electronic): 9781665496209
DOIs
Publication status: Published - 2022
Event: 29th IEEE International Conference on Image Processing, ICIP 2022 - Bordeaux, France
Duration: 16 Oct 2022 - 19 Oct 2022

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (print): 1522-4880

Conference

Conference: 29th IEEE International Conference on Image Processing, ICIP 2022
Country/Territory: France
City: Bordeaux
Period: 16/10/22 - 19/10/22
