TY - JOUR
T1 - Deep Co-Saliency Detection via Stacked Autoencoder-Enabled Fusion and Self-Trained CNNs
AU - Tsai, Chung-Chi
AU - Hsu, Kuang-Jui
AU - Lin, Yen-Yu
AU - Qian, Xiaoning
AU - Chuang, Yung-Yu
PY - 2020/4
Y1 - 2020/4
N2 - Image co-saliency detection via fusion-based or learning-based methods faces cross-cutting issues. Fusion-based methods often combine saliency proposals using a majority voting rule; their performance therefore depends heavily on the quality and coherence of the individual proposals. Learning-based methods typically require ground-truth annotations for training, which are not available for co-saliency detection. In this work, we present a two-stage approach that addresses these issues jointly. In the first stage, an unsupervised deep learning model with a stacked autoencoder (SAE) is proposed to evaluate the quality of saliency proposals. It employs latent representations of image foregrounds and auto-encodes foreground consistency and foreground-background distinctiveness in a discriminative way. The resultant model, SAE-enabled fusion (SAEF), combines multiple saliency proposals to yield a more reliable saliency map. In the second stage, motivated by the observation that fusion often leads to over-smoothed saliency maps, we develop self-trained convolutional neural networks (STCNN) to alleviate this negative effect. STCNN takes the saliency maps produced by SAEF as inputs and propagates information from regions of high confidence to those of low confidence. During propagation, feature representations are distilled, resulting in sharper and better co-saliency maps. Our approach is comprehensively evaluated on three benchmarks, MSRC, iCoseg, and Cosal2015, and performs favorably against state-of-the-art methods. In addition, we demonstrate that our method can be applied to object co-segmentation and object co-localization, achieving state-of-the-art performance in both applications.
KW - adaptive fusion
KW - CNNs
KW - Co-saliency detection
KW - optimization
KW - reconstruction residual
KW - self-paced learning
KW - stacked autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85082883637&partnerID=8YFLogxK
U2 - 10.1109/TMM.2019.2936803
DO - 10.1109/TMM.2019.2936803
M3 - Article
AN - SCOPUS:85082883637
SN - 1520-9210
VL - 22
SP - 1016
EP - 1031
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 4
M1 - 8809285
ER -