Latent Semantic and Disentangled Attention

Jen Tzung Chien*, Yu Han Huang

*Corresponding author of this work

Research output: Article, peer-reviewed

3 citations (Scopus)

Abstract

Sequential learning using the transformer has achieved state-of-the-art performance in natural language tasks and many others. The key to this success is the multi-head self attention, which encodes and gathers the features from individual tokens of an input sequence. The mapping or decoding is then performed via cross attention to produce an output sequence. There are three weaknesses in such an attention framework. First, since the attention mixes up the features of different tokens in the input and output sequences, redundant information is likely to exist in the sequence representation. Second, the patterns of attention weights among different heads tend to be similar, so the model capacity is bounded. Third, the robustness of an encoder-decoder network against model uncertainty is disregarded. To handle these weaknesses, this paper presents a Bayesian semantic and disentangled mask attention that learns latent disentanglement in multi-head attention, where the redundant features in the transformer are compensated with latent topic information. The attention weights are filtered by a mask which is optimized through semantic clustering. This attention mechanism is implemented according to Bayesian learning for clustered disentanglement. Experiments on machine translation and speech recognition show the merit of Bayesian clustered disentanglement for mask attention.
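As a rough illustration of the mechanism the abstract describes, the sketch below implements multi-head self attention whose attention weights are filtered by a learnable, content-dependent mask and then renormalized. This is only a minimal stand-in written in PyTorch: the class name MaskedMultiHeadAttention, the sigmoid gate computed from the keys, and the toy hyperparameters are assumptions for illustration, and the sketch does not implement the paper's semantic clustering or Bayesian learning of the mask.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedMultiHeadAttention(nn.Module):
    """Multi-head self attention with a learnable mask over the
    attention weights (illustrative sketch, not the paper's method)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Token-wise gate computed from the keys; a placeholder for the mask
        # that the paper optimizes through semantic clustering with Bayesian
        # learning.
        self.gate = nn.Linear(self.d_head, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(z):
            # (batch, seq_len, d_model) -> (batch, heads, seq_len, d_head)
            return z.view(b, t, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        # Standard scaled dot-product attention weights: (batch, heads, t, t)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        weights = F.softmax(scores, dim=-1)
        # Filter the weights with a sigmoid mask over key positions and
        # renormalize, so each head can suppress redundant tokens.
        mask = torch.sigmoid(self.gate(k)).transpose(-2, -1)  # (batch, heads, 1, t)
        filtered = weights * mask
        filtered = filtered / filtered.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        context = filtered @ v                                 # (batch, heads, t, d_head)
        context = context.transpose(1, 2).reshape(b, t, -1)
        return self.out(context)


# Example usage on a toy batch.
attn = MaskedMultiHeadAttention(d_model=64, num_heads=8)
y = attn(torch.randn(2, 10, 64))   # -> shape (2, 10, 64)
```

Gating the key positions before renormalizing keeps each row of the attention matrix a proper distribution, which is one plausible reading of "attention weights filtered by a mask"; how the mask is actually tied to latent topics and Bayesian clustered disentanglement is specified in the paper itself.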

Original language: English
Pages (from-to): 10047-10059
Number of pages: 13
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 46
Issue number: 12
DOIs
Publication status: Published - 2024
