Causal confusion reduction for robust multi-domain dialogue policy

Mahdin Rohmatillah, Jen Tzung Chien

Research output: Conference contribution (peer-reviewed)

6 Citations (Scopus)

Abstract

In a multi-domain dialogue system, the dialogue policy plays an important role because it determines suitable actions based on the user's goals. However, in many recent works, most dialogue policy optimizations, especially those that use reinforcement learning (RL) methods, do not perform well. The main problem is that the initial optimization step, which involves behavior cloning (BC), suffers from the causal confusion problem: the agent misidentifies the true cause of an expert action in the current state. This paper proposes a novel method to improve the performance of BC in dialogue systems. Instead of only predicting the correct action given a state from the dataset, we introduce auxiliary tasks that predict both the current belief state and the most recent user utterance. These tasks reduce causal confusion about the expert action in the dataset, since those features are important in every dialogue turn. Experiments on ConvLab-2 show that, with this method, all RL-based optimizations are improved. Furthermore, the agent based on proximal policy optimization shows a significant improvement with the help of the proposed BC agent weights, both in policy evaluation and in end-to-end system evaluation.
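
The idea of adding auxiliary prediction tasks to the BC objective can be illustrated with a minimal sketch. The abstract does not state the loss form, the task weights, or how the belief state and user utterance are encoded, so everything below is an assumption made for illustration: actions, belief state, and utterance are each treated as multi-label 0/1 vectors, each task uses binary cross-entropy, and the hypothetical function `bc_loss_with_auxiliary_tasks` combines them as a weighted sum. It is not the authors' implementation.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    assert len(probs) == len(targets), "probs and targets must have equal length"
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def bc_loss_with_auxiliary_tasks(action_probs, expert_action,
                                 belief_probs, belief_target,
                                 utterance_probs, utterance_target,
                                 belief_weight=1.0, utterance_weight=1.0):
    """Hypothetical joint objective: the usual BC action loss plus two
    auxiliary losses (belief state and recent user utterance).

    The weights and the multi-label encoding are assumptions, not values
    taken from the paper.
    """
    action_loss = binary_cross_entropy(action_probs, expert_action)
    belief_loss = binary_cross_entropy(belief_probs, belief_target)
    utterance_loss = binary_cross_entropy(utterance_probs, utterance_target)
    return (action_loss
            + belief_weight * belief_loss
            + utterance_weight * utterance_loss)


if __name__ == "__main__":
    # Toy example: 3 possible dialogue acts, 2 belief slots, 4 vocabulary items.
    loss = bc_loss_with_auxiliary_tasks(
        action_probs=[0.9, 0.1, 0.2], expert_action=[1, 0, 0],
        belief_probs=[0.8, 0.3], belief_target=[1, 0],
        utterance_probs=[0.7, 0.2, 0.1, 0.6], utterance_target=[1, 0, 0, 1],
    )
    print(f"joint BC loss: {loss:.4f}")
```

In this sketch, setting both weights to zero recovers plain BC. With nonzero weights, a shared encoder trained on this objective is penalized when its representation cannot recover the belief state or the latest user utterance, which is one plausible reading of how the auxiliary tasks discourage the agent from relying on spurious features.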

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 3761-3765
Number of pages: 5
ISBN (Electronic): 9781713836902
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
5
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 to 3/09/21
