A Hybrid Convolutional and Transformer Network for Salient Object Detection

Bei Sin Li, Hsu Feng Hsiao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

We present a hybrid architecture that combines transformers and convolutional neural networks to improve RGB-D salient object detection. Transformer-based models have recently shown promise in this field because the self-attention mechanism can encode long-range information. This mechanism resembles human visual perception in that it captures long-distance dependencies and selectively focuses on the most relevant regions of the input image. Convolutional neural networks, in contrast, generalize well and are straightforward to train, which has made them valuable for a wide range of image processing tasks. By combining the strengths of the two, the proposed hybrid architecture outperforms transformers or convolutional neural networks used in isolation. The architecture follows an encoder-decoder framework: the hybrid model serves as the feature encoder, while the decoder is a convolutional neural network with deep layer aggregation that merges the features of varying resolutions produced by the transformer-based encoder. This design exploits the modeling capacity of convolutional neural networks in tasks such as saliency prediction while retaining the long-range dependency modeling offered by the hybrid model. A Siamese architecture with shared encoder parameters learns salient features from RGB and depth data concurrently.
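The ideas named in the abstract (local convolutional mixing, global self-attention, and a Siamese encoder whose parameters are shared across the RGB and depth streams) can be illustrated with a toy, standard-library-only sketch. This is not the authors' implementation: the 1-D scalar features, the names `conv1d_same`, `self_attention`, `HybridEncoder` and `siamese_encode`, the ordering of convolution before attention, and the additive fusion of the two streams are all assumptions made for illustration. The deep layer aggregation decoder is not shown.

```python
import math


def conv1d_same(x, kernel):
    """Local mixing: each output depends only on len(kernel) neighbouring inputs (zero padded)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Global mixing: single-head attention on scalar features (d_k = 1, so no extra scaling)."""
    q = [wq * t for t in x]
    k = [wk * t for t in x]
    v = [wv * t for t in x]
    outputs, weights = [], []
    for qi in q:
        row = softmax([qi * kj for kj in k])  # one weight per position, near or far
        weights.append(row)
        outputs.append(sum(a * vj for a, vj in zip(row, v)))
    return outputs, weights


class HybridEncoder:
    """Toy hybrid block (assumed design): a convolution followed by self-attention."""

    def __init__(self, kernel, wq, wk, wv):
        self.kernel = kernel
        self.wq, self.wk, self.wv = wq, wk, wv

    def __call__(self, x):
        local = conv1d_same(x, self.kernel)
        return self_attention(local, self.wq, self.wk, self.wv)


def siamese_encode(encoder, rgb, depth):
    """Siamese use: the same encoder object (shared parameters) processes both streams.

    Element-wise addition is a hypothetical fusion choice, not taken from the paper.
    """
    rgb_feat, _ = encoder(rgb)
    depth_feat, _ = encoder(depth)
    return [r + d for r, d in zip(rgb_feat, depth_feat)]


# Example usage with made-up 1-D "images".
encoder = HybridEncoder(kernel=[0.25, 0.5, 0.25], wq=1.0, wk=1.0, wv=1.0)
rgb_row = [0.0, 0.2, 0.9, 1.0, 0.8, 0.1, 0.0]
depth_row = [0.1, 0.1, 0.7, 0.9, 0.7, 0.2, 0.1]
fused = siamese_encode(encoder, rgb_row, depth_row)
print(fused)
```

In this sketch the convolution touches only a three-element neighbourhood, whereas every attention row assigns a non-zero weight to every position, which is the long-range behaviour the abstract attributes to self-attention. Passing one encoder object to both streams is what makes the arrangement Siamese.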

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350359855
State: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, Korea, Republic of
Duration: 4 Dec 2023 to 7 Dec 2023

Publication series

Name: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023

Conference

Conference: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Country/Territory: Korea, Republic of
City: Jeju
Period: 4/12/23 to 7/12/23

Keywords

  • RGB-D salient object detection
  • Siamese network
  • transformers
