A Hybrid Convolutional and Transformer Network for Salient Object Detection

Bei Sin Li, Hsu Feng Hsiao

Research output: Conference contribution (peer-reviewed)

Abstract

We present a hybrid architecture that merges transformers and convolutional neural networks (CNNs) to improve RGB-D salient object detection. Transformer-based models have recently shown promise in this field because their self-attention mechanism encodes long-range information. Self-attention mirrors aspects of human visual perception: it captures long-distance dependencies and selectively focuses on the most relevant regions of the input image. CNNs, in contrast, generalize well and are easy to train, and they have proven valuable across a wide range of image-processing tasks. By fusing the strengths of the two, the proposed hybrid architecture outperforms either transformers or CNNs used in isolation. The architecture follows an encoder-decoder framework: the hybrid model serves as the feature encoder, and the decoder is a CNN with deep layer aggregation that merges the multi-resolution features produced by the transformer-based encoder. This design exploits the modeling strength of CNNs in dense prediction tasks such as saliency prediction while retaining the long-range dependency modeling of the hybrid encoder. The encoder is also arranged as a Siamese architecture with shared parameters, so that salient features are learned from RGB and depth data concurrently. By harnessing the complementary strengths of both model families, the proposed hybrid architecture achieves superior performance.
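The abstract relies on two mechanisms that are easy to conflate: self-attention, which lets every image token attend to every other token, and a Siamese encoder, in which the RGB and depth branches share one set of weights. The following is a minimal pure-Python sketch of both ideas, not the authors' implementation. The single-head attention layer, the names `SharedSelfAttentionEncoder` and `siamese_encode`, the identity projection weights, and the toy token values are all hypothetical, and the sketch assumes both modalities have already been converted to token sequences of the same width. Convolutional stages, multi-head attention, and the deep-layer-aggregation decoder are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


class SharedSelfAttentionEncoder:
    """Hypothetical single-head self-attention layer.

    One instance (one set of weights) is reused for both modalities,
    which is what makes the two branches a Siamese pair.
    """

    def __init__(self, w_q: Matrix, w_k: Matrix, w_v: Matrix):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        self.last_attention: Matrix = []

    def __call__(self, tokens: Matrix) -> Matrix:
        q = matmul(tokens, self.w_q)
        k = matmul(tokens, self.w_k)
        v = matmul(tokens, self.w_v)
        d_k = len(k[0])
        # Every token scores every other token, regardless of spatial distance.
        scores = matmul(q, transpose(k))
        attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
        self.last_attention = attn
        # Each output token is an attention-weighted mix of all value vectors.
        return matmul(attn, v)


def siamese_encode(encoder: SharedSelfAttentionEncoder,
                   rgb_tokens: Matrix, depth_tokens: Matrix):
    """Run RGB and depth tokens through the *same* encoder weights."""
    return encoder(rgb_tokens), encoder(depth_tokens)


# Toy usage (hypothetical values): 3 tokens of width 2 per modality.
identity = [[1.0, 0.0], [0.0, 1.0]]
enc = SharedSelfAttentionEncoder(identity, identity, identity)
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
rgb_feat, depth_feat = siamese_encode(enc, rgb, depth)
```

Because `siamese_encode` calls a single encoder object for both inputs, training signals from the RGB and depth branches would update the same weights, and identical inputs to the two branches produce identical features.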

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350359855
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, South Korea
Duration: 4 Dec 2023 - 7 Dec 2023

Publication series

Name: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

Conference

Conference: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Country/Territory: South Korea
City: Jeju
Period: 4/12/23 - 7/12/23
