Top-down saliency detection aims to highlight the regions of a specific object category, and typically relies on pixel-wise annotated training data. In this paper, we address the high cost of collecting such training data by presenting a weakly supervised approach to object saliency detection, where only image-level labels, indicating the presence or absence of a target object in an image, are available. The proposed framework is composed of two deep modules: an image-level classifier and a pixel-level map generator. While the former distinguishes images containing objects of interest from the rest, the latter learns to generate saliency maps such that the training images masked by those maps can be better predicted by the classifier. Beyond the top-down guidance from class labels, the map generator is derived by also referring to other image information, including the background prior, area balance, and spatial consensus. This information strongly regularizes the training process and reduces the risk of overfitting, especially when learning deep models from few training data. In the experiments, we show that our method achieves superior results, even outperforming many strongly supervised methods.
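The interaction between the two modules can be sketched as follows. This is a minimal NumPy illustration of the core idea only, not the paper's implementation: the function names, the linear classifier, and the exact form of the area-balance term are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_map(x, w):
    """Hypothetical pixel-level map generator: a per-pixel linear score
    squashed to [0, 1] with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x * w)))

def classify(x_masked, v):
    """Hypothetical image-level classifier: a mean-pooled linear score
    squashed to a class probability."""
    return 1.0 / (1.0 + np.exp(-v * x_masked.mean()))

x = rng.standard_normal((8, 8))   # a toy 8x8 "image"
m = generate_map(x, w=2.0)        # saliency map with values in [0, 1]
p = classify(x * m, v=3.0)        # probability that the target class is present

# Area-balance regularizer (assumed form): penalize maps whose mean
# activation drifts from a target foreground ratio rho, so the map
# collapses neither to all-ones nor to all-zeros.
rho = 0.3
area_penalty = (m.mean() - rho) ** 2

assert m.shape == x.shape
assert 0.0 < p < 1.0 and area_penalty >= 0.0
```

In training, the map generator would be updated so that the classifier's prediction on the masked image improves while regularizers such as the area-balance term above constrain the map.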
Original language: American English
Number of pages: 13
State: Published - 1 Sep 2017