Stochastic Curiosity Maximizing Exploration

Jen-Tzung Chien, Po Chien Hsu

Research output: Conference contribution, peer-reviewed

6 Citations (Scopus)


Deep reinforcement learning (RL) is an emerging research trend in machine learning for autonomous systems. In real-world scenarios, the extrinsic rewards that the environment provides for training an agent are usually missing or extremely sparse. Sparse reward constrains the learning capability of the agent because the agent only updates its policy when the goal state is successfully attained. Implementing efficient exploration in RL algorithms remains challenging. To tackle sparse reward and inefficient exploration, the agent needs other helpful information to update its policy even when there is no interaction with the environment. This paper proposes stochastic curiosity maximizing exploration (SCME), a learning strategy designed to let the agent act as a human would. We cope with the sparse reward problem by encouraging the agent to explore future diversity. To do so, a latent dynamic system is developed to acquire the latent states and latent actions that predict the variations in future conditions. The mutual information and the prediction error in the predicted states and actions are calculated as the intrinsic rewards. The SCME-based agent is therefore trained by maximizing these rewards to improve sample efficiency for exploration. Experiments on PyDial and Super Mario Bros show the benefits of the proposed SCME in a dialogue system and a computer game, respectively.
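The abstract names two intrinsic reward signals, a prediction error and a mutual information term, without giving their exact form. The following is a minimal illustrative sketch of how such signals could be combined with a sparse extrinsic reward; it is not the authors' SCME implementation. The function names, the Gaussian assumption behind the mutual information estimate, and the weights `beta` and `eta` are all assumptions introduced here for illustration.

```python
import numpy as np


def prediction_error_reward(predicted_next_state, actual_next_state):
    """Half the squared L2 distance between a predicted and an observed latent state.

    Assumption: a larger error means the transition was less predictable,
    so it is treated as more "curious" and rewarded more.
    """
    diff = (np.asarray(predicted_next_state, dtype=float)
            - np.asarray(actual_next_state, dtype=float))
    return 0.5 * float(np.sum(diff ** 2))


def gaussian_mutual_information(x, y):
    """Mutual information (in nats) between two 1-D samples.

    Assumption: x and y are jointly Gaussian, in which case
    I(x; y) = -0.5 * log(1 - rho^2), with rho the Pearson correlation.
    This is a stand-in estimator, not the one used in the paper.
    """
    rho = np.corrcoef(x, y)[0, 1]
    rho_sq = min(float(rho) ** 2, 1.0 - 1e-12)  # keep the log argument positive
    return -0.5 * float(np.log(1.0 - rho_sq))


def total_reward(extrinsic, pred_error, mutual_info, beta=0.1, eta=0.1):
    """Extrinsic reward plus weighted intrinsic terms (beta, eta are hypothetical)."""
    return extrinsic + beta * pred_error + eta * mutual_info


# Usage: a step with no extrinsic reward still yields a learning signal.
err = prediction_error_reward([1.0, 2.0], [1.0, 0.0])
mi = gaussian_mutual_information([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
reward = total_reward(0.0, err, mi)
```

With zero extrinsic reward, the combined reward stays positive whenever the transition is surprising or the two latent variables are correlated, which is the general mechanism by which intrinsic rewards keep an agent exploring under sparse feedback.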

Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication status: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: 19 Jul 2020 to 24 Jul 2020


Name: Proceedings of the International Joint Conference on Neural Networks


Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow

