TY - GEN
T1 - Collaborative Learning of Multiple-Discontinuous-Image Saliency Prediction for Drone Exploration
AU - Chu, Ting-Tsan
AU - Chen, Po-Heng
AU - Huang, Pin-Jie
AU - Chen, Kuan-Wen
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Most existing saliency prediction research focuses on either single images or videos (or, more precisely, sequences of images). However, applying saliency prediction to drone exploration, which must consider multiple images captured from different viewing angles or locations to determine the direction to explore, requires saliency prediction over multiple discontinuous images. In this paper, we propose a deep relative saliency model (MS-Net) for such an application. MS-Net first applies a single-image saliency feature extraction network to each image separately and then integrates the images with a GCN-based mechanism, called multi-image saliency fusion, that learns relative saliency information among all the images. Finally, it predicts the saliency of each image by taking this relative information into account. Because no existing saliency prediction datasets contain such multiple discontinuous images, we randomly cropped a large number of sub-images from the 360° images of existing 360° image saliency datasets to build our own dataset for both training and evaluation. Experimental results showed that the proposed MS-Net considerably outperformed both single-image and video saliency prediction methods and achieved performance comparable to that of 360° image saliency prediction even when only limited fields of view, i.e., five sub-images, were considered.
AB - Most existing saliency prediction research focuses on either single images or videos (or, more precisely, sequences of images). However, applying saliency prediction to drone exploration, which must consider multiple images captured from different viewing angles or locations to determine the direction to explore, requires saliency prediction over multiple discontinuous images. In this paper, we propose a deep relative saliency model (MS-Net) for such an application. MS-Net first applies a single-image saliency feature extraction network to each image separately and then integrates the images with a GCN-based mechanism, called multi-image saliency fusion, that learns relative saliency information among all the images. Finally, it predicts the saliency of each image by taking this relative information into account. Because no existing saliency prediction datasets contain such multiple discontinuous images, we randomly cropped a large number of sub-images from the 360° images of existing 360° image saliency datasets to build our own dataset for both training and evaluation. Experimental results showed that the proposed MS-Net considerably outperformed both single-image and video saliency prediction methods and achieved performance comparable to that of 360° image saliency prediction even when only limited fields of view, i.e., five sub-images, were considered.
UR - http://www.scopus.com/inward/record.url?scp=85125505651&partnerID=8YFLogxK
U2 - 10.1109/ICRA48506.2021.9561681
DO - 10.1109/ICRA48506.2021.9561681
M3 - Conference contribution
AN - SCOPUS:85125505651
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 11343
EP - 11349
BT - 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Y2 - 30 May 2021 through 5 June 2021
ER -