Supportive and Self Attentions for Image Caption

Jen Tzung Chien, Ting An Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Attention over an observed image or natural sentence operates by spotting or locating the region or position of interest for pattern classification. The attention parameter is treated as a latent variable, which is indirectly estimated by minimizing the classification loss. With such an attention mechanism, the target information may not be correctly identified. Therefore, in addition to minimizing the classification error, we can directly attend to the region of interest by minimizing the reconstruction error with respect to supporting data. Our idea is to learn how to attend through the so-called supportive attention when supporting information is available. A new attention mechanism is developed to conduct attentive learning with translation invariance, which is applied to image captioning. The derived information is helpful for generating a caption from an input image. Moreover, this paper presents an association network which not only implements word-to-image attention but also carries out image-to-image attention via self attention. The relations between image and text are thus sufficiently represented. Experiments on the MS-COCO task show the benefit of the proposed supportive and self attentions for image captioning with the key-value memory network.
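The abstract's core idea, minimizing a reconstruction error against supporting data in addition to the classification loss so that the latent attention weights are directly supervised, can be illustrated with a minimal NumPy sketch. All function names, weight matrices, and shapes below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def supportive_attention_loss(regions, query, labels_onehot, support, W_cls, W_rec):
    """Joint loss sketch: classification error plus reconstruction error
    on supporting data, both driven by the same attention weights.

    regions:       (R, D) image region features
    query:         (D,)   decoder query vector
    labels_onehot: (C,)   one-hot classification target
    support:       (S,)   supporting-data vector to reconstruct
    W_cls:         (D, C) hypothetical classification projection
    W_rec:         (D, S) hypothetical reconstruction projection
    """
    scores = regions @ query           # (R,) attention scores
    alpha = softmax(scores)            # latent attention weights
    context = alpha @ regions          # (D,) attended context vector

    # Classification branch: cross-entropy on the attended context
    probs = softmax(context @ W_cls)
    ce = -np.sum(labels_onehot * np.log(probs + 1e-12))

    # Supportive branch: reconstruct the supporting data from the context,
    # giving the attention weights a direct training signal
    recon = context @ W_rec
    mse = np.mean((recon - support) ** 2)

    return ce + mse, alpha
```

Because both loss terms depend on `alpha` through the shared context vector, gradients from the reconstruction term steer the attention toward regions that explain the supporting data, rather than relying on the classification loss alone.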

Original language: English
Title of host publication: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1713-1718
Number of pages: 6
ISBN (Electronic): 9789881476883
State: Published - 7 Dec 2020
Event: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Virtual, Auckland, New Zealand
Duration: 7 Dec 2020 - 10 Dec 2020

Publication series

Name: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings

Conference

Conference: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Country/Territory: New Zealand
City: Virtual, Auckland
Period: 7/12/20 - 10/12/20

Keywords

  • association network
  • attention mechanism
  • caption
  • encoder-decoder network

