Traditionally, task-oriented dialogue system is built by an autonomous agent which can be trained by reinforcement learning where the reward from environment is maximized. The agent is learned by updating the policy when the goal state is observed. However, in real world, the extrinsic reward is usually sparse or missing. The training efficiency is bounded. The system performance is degraded. It is challenging to tackle the issue of sample efficiency in sparse reward scenario for spoken dialogues. Accordingly, a dialogue agent needs additional information to update its policy even in the period when reward is absent in the environment. This paper presents a new dialogue agent which is learned by incorporating the intrinsic reward based on the information-theoretic approach via stochastic curiosity exploration. This agent encourages the exploration for future diversity based on a latent dynamic architecture which consists of encoder network, curiosity network, information network and policy network. The latent states and actions are drawn to predict stochastic transition for future. The curiosity learning are implemented with intrinsic reward in a metric of mutual information and prediction error in the predicted states and actions. Experiments on dialogue management using PyDial demonstrate the benefit by using the stochastic curiosity exploration.
|頁（從 - 到）||3885-3889|
|期刊||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|出版狀態||Published - 2020|
|事件||21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China|
持續時間: 25 十月 2020 → 29 十月 2020