Object co-segmentation aims to segment the common objects in images. This paper presents a CNN-based method that is unsupervised and end-to-end trainable to better solve this task. Our method is unsupervised in the sense that it does not require any training data in the form of object masks but merely a set of images jointly covering objects of a specific class. Our method comprises two collaborative CNN modules, a feature extractor and a co-attention map generator. The former module extracts the features of the estimated objects and backgrounds, and is derived based on the proposed co-attention loss which minimizes inter-image object discrepancy while maximizing intra-image figure-ground separation. The latter module is learned to generated co-attention maps by which the estimated figure-ground segmentation can better fit the former module. Besides, the co-attention loss, the mask loss is developed to retain the whole objects and remove noises. Experiments show that our method achieves superior results, even outperforming the state-of-the-art, supervised methods.
|Original language||American English|
|Title of host publication||IJCAI'18: Proceedings of the 27th International Joint Conference on Artificial Intelligence|
|Publisher||International Joint Conferences on Artificial Intelligence|
|Number of pages||9|
|State||Published - Jul 2018|
|Name||IJCAI International Joint Conference on Artificial Intelligence|