Latent Semantic and Disentangled Attention

Jen-Tzung Chien*, Yu-Han Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Sequential learning with transformers has achieved state-of-the-art performance in natural language tasks and many other domains. The key to this success is multi-head self-attention, which encodes and gathers features from the individual tokens of an input sequence. The mapping, or decoding, to an output sequence is then performed via cross attention. Such an attention framework has three weaknesses. First, because attention mixes the features of different tokens in the input and output sequences, the resulting sequence representation is likely to contain redundant information. Second, the patterns of attention weights tend to be similar across different heads, which bounds the model capacity. Third, the robustness of the encoder-decoder network against model uncertainty is disregarded. To address these weaknesses, this paper presents a Bayesian semantic and disentangled mask attention that learns latent disentanglement in multi-head attention, where the redundant features in the transformer are compensated with latent topic information. The attention weights are filtered by a mask that is optimized through semantic clustering. This attention mechanism is implemented with Bayesian learning for clustered disentanglement. Experiments on machine translation and speech recognition show the merit of Bayesian clustered disentanglement for mask attention.
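To make the general idea of filtering attention weights with a mask concrete, the PyTorch sketch below applies a learnable per-head soft gate to standard multi-head self-attention. This is only an illustrative assumption: the per-head sigmoid gate, the module name MaskedMultiHeadSelfAttention, and all hyperparameters are hypothetical, and the paper's semantic clustering and Bayesian treatment of the mask are not reproduced here.

```python
import torch
import torch.nn as nn


class MaskedMultiHeadSelfAttention(nn.Module):
    """Multi-head self-attention whose attention weights are filtered by a
    learnable per-head soft mask. Illustrative sketch only; it does not
    implement the paper's semantic clustering or Bayesian learning."""

    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learnable logit per head; a sigmoid turns it into a soft mask
        # that can down-weight redundant heads (hypothetical simplification).
        self.mask_logits = nn.Parameter(torch.zeros(num_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(z: torch.Tensor) -> torch.Tensor:
            # (batch, seq_len, d_model) -> (batch, heads, seq_len, d_head)
            return z.view(b, t, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)

        # Scaled dot-product attention weights: (batch, heads, seq_len, seq_len)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)

        # Filter the attention weights with the per-head soft mask.
        attn = attn * torch.sigmoid(self.mask_logits)

        ctx = attn @ v  # (batch, heads, seq_len, d_head)
        ctx = ctx.transpose(1, 2).reshape(b, t, self.num_heads * self.d_head)
        return self.out(ctx)


# Minimal usage check
layer = MaskedMultiHeadSelfAttention(d_model=64, num_heads=4)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

Note that after masking the rows of the attention matrix no longer sum to one; whether and how to renormalize, and how the mask is tied to semantic clusters, is exactly where the paper's Bayesian formulation differs from this simplification.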

Original language: English
Pages (from-to): 10047-10059
Number of pages: 13
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 46
Issue number: 12
DOIs
State: Published - 2024

Keywords

  • Bayesian learning
  • Sequential learning
  • disentangled representation
  • mask attention
  • transformer
