Semantic Cross Attention for Few-shot Learning

Bin Xiao, Chien Liang Liu*, Wen Hoar Hsaio

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review


Abstract

Few-shot learning (FSL) has attracted considerable attention recently. Among existing approaches, metric-based methods aim to train an embedding network that pulls similar samples close together while pushing dissimilar samples as far apart as possible, and they achieve promising results. FSL is characterized by using only a few images to train a model that can generalize to novel classes in image classification problems, but this setting makes it difficult to learn visual features that capture the appearance variations of the images. Model training is likely to move in the wrong direction, as images in the same semantic class may have dissimilar appearances, whereas images in different semantic classes may share a similar appearance. We argue that FSL can benefit from additional semantic features to learn discriminative feature representations. Thus, this study proposes a multi-task learning approach that treats the semantic features of label text as an auxiliary task to boost the performance of the FSL task. Our proposed model uses word-embedding representations as semantic features to help train the embedding network, together with a semantic cross-attention module that bridges the semantic features into the visual modality. The proposed approach is simple but produces excellent results. We apply it to two previous metric-based FSL methods, both of which show substantial performance improvements.
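
The abstract describes the architecture only at a high level. The sketch below (plain PyTorch, not the authors' released code) shows one plausible form of a semantic cross-attention block, in which a projected word embedding of the class-label text reweights the spatial locations of the visual feature maps. All module names, dimensions, and the residual reweighting are illustrative assumptions rather than details taken from the paper.

    # Minimal sketch of a semantic cross-attention block; names and
    # dimensions are assumptions, not the authors' exact architecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SemanticCrossAttention(nn.Module):
        def __init__(self, visual_dim=640, semantic_dim=300):
            super().__init__()
            # Project the label word embedding into the visual feature space
            # so the two modalities can be compared directly.
            self.sem_proj = nn.Linear(semantic_dim, visual_dim)

        def forward(self, visual_feat, word_emb):
            # visual_feat: (B, C, H, W) feature maps from the embedding network
            # word_emb:    (B, D) word embedding of the class-label text
            B, C, H, W = visual_feat.shape
            query = self.sem_proj(word_emb)                  # (B, C)
            keys = visual_feat.flatten(2)                    # (B, C, H*W)
            # Affinity between the semantic query and each spatial location.
            attn = torch.einsum('bc,bcl->bl', query, keys) / C ** 0.5
            attn = F.softmax(attn, dim=-1).view(B, 1, H, W)  # (B, 1, H, W)
            # Reweight the visual features by semantic relevance (residual form).
            return visual_feat * (1.0 + attn)

In the multi-task setting described in the abstract, a branch over the same word embeddings would additionally supply an auxiliary loss during training; that part is omitted from this sketch.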

Original language: English
Pages (from-to): 1165-1180
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 189
State: Published - 2022
Event: 14th Asian Conference on Machine Learning, ACML 2022 - Hyderabad, India
Duration: 12 Dec 2022 - 14 Dec 2022

Keywords

  • Cross attention
  • Few-shot learning
  • Metric-based method
  • Multi-task learning
