TY - GEN
T1 - Causal confusion reduction for robust multi-domain dialogue policy
AU - Rohmatillah, Mahdin
AU - Chien, Jen Tzung
N1 - Publisher Copyright: © 2021 ISCA
PY - 2021
Y1 - 2021
N2 - In a multi-domain dialogue system, the dialogue policy plays an important role since it determines suitable actions based on the user's goals. However, in many recent works, most dialogue optimizations, especially those that use reinforcement learning (RL) methods, do not perform well. The main problem is that the initial step of optimization, which involves behavior cloning (BC), suffers from the causal confusion problem: the agent misidentifies the true cause of an expert action in the current state. This paper proposes a novel method to improve the performance of BC in dialogue systems. Instead of only predicting the correct action given a state from the dataset, we introduce auxiliary tasks that predict both the current belief state and the most recent user utterance, in order to reduce causal confusion about the expert action in the dataset, since those features are important in every dialogue turn. Experiments on ConvLab-2 show that, with this method, all RL-based optimizations are improved. Furthermore, the agent based on proximal policy optimization shows a very significant improvement with the help of the proposed BC agent weights, both in policy evaluation and in end-to-end system evaluation.
AB - In a multi-domain dialogue system, the dialogue policy plays an important role since it determines suitable actions based on the user's goals. However, in many recent works, most dialogue optimizations, especially those that use reinforcement learning (RL) methods, do not perform well. The main problem is that the initial step of optimization, which involves behavior cloning (BC), suffers from the causal confusion problem: the agent misidentifies the true cause of an expert action in the current state. This paper proposes a novel method to improve the performance of BC in dialogue systems. Instead of only predicting the correct action given a state from the dataset, we introduce auxiliary tasks that predict both the current belief state and the most recent user utterance, in order to reduce causal confusion about the expert action in the dataset, since those features are important in every dialogue turn. Experiments on ConvLab-2 show that, with this method, all RL-based optimizations are improved. Furthermore, the agent based on proximal policy optimization shows a very significant improvement with the help of the proposed BC agent weights, both in policy evaluation and in end-to-end system evaluation.
KW - Behavior cloning
KW - Causal confusion
KW - Multi-domain dialogue system
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85119210469&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-534
DO - 10.21437/Interspeech.2021-534
M3 - Conference contribution
AN - SCOPUS:85119210469
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 3761
EP - 3765
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -