Model-Based Soft Actor-Critic

Jen Tzung Chien, Shu Hsiang Yang

Research output: Conference contribution, peer-reviewed

3 Citations (Scopus)

Abstract

Deep reinforcement learning has been successfully developed for many challenging applications. However, collecting new data in a real environment is costly, which makes the agent learn slowly when states and actions are high-dimensional. It is therefore crucial to improve sample efficiency and to learn with long-term planning. To tackle these issues, this study presents a stochastic agent driven by a new model-based soft actor-critic (MSAC). The dynamics of the environment and the reward function are represented by a learnable world model, which allows the agent to explore a latent representation of the environment and to carry out stochastic prediction and foresight planning. An off-policy method is proposed by combining it with online learning of the world model. The actor, critic, and world model are jointly trained to perform multi-step foresight imagination. To further enhance performance, an overshooting scheme is incorporated for long-term planning, and multi-step rollout is applied for stochastic prediction. Experiments on various tasks with continuous actions show the merit of the proposed MSAC for data efficiency in reinforcement learning.
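
The abstract gives no equations, so the following is only a minimal sketch of what a multi-step imagined rollout with an entropy-regularized (soft) return could look like. It is not the authors' implementation. The toy one-dimensional dynamics, the fixed Gaussian policy, the placeholder critic, and all names and constants (`world_model`, `policy_sample`, `critic`, `gamma`, `alpha`) are assumptions made for illustration, and the overshooting scheme is not shown.

```python
import math
import random

# All names, dynamics, and constants below are hypothetical illustrations;
# none of them come from the paper.

def world_model(state, action, rng):
    """Toy stochastic dynamics plus reward (stand-in for a learned world model)."""
    next_state = 0.9 * state + 0.5 * action + rng.gauss(0.0, 0.05)
    reward = -(state ** 2) - 0.1 * (action ** 2)
    return next_state, reward

def policy_sample(state, rng, std=0.3):
    """Gaussian policy: returns a sampled action and its log-probability."""
    mean = -0.8 * state
    action = rng.gauss(mean, std)
    log_prob = (-0.5 * ((action - mean) / std) ** 2
                - math.log(std) - 0.5 * math.log(2.0 * math.pi))
    return action, log_prob

def critic(state):
    """Placeholder value estimate used to bootstrap beyond the horizon."""
    return -2.0 * state ** 2

def imagined_soft_return(state, horizon, rng, gamma=0.99, alpha=0.2):
    """Roll the model forward `horizon` steps, accumulate the soft return
    (reward minus alpha times log-probability), then bootstrap with the critic."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action, log_prob = policy_sample(state, rng)
        state, reward = world_model(state, action, rng)
        total += discount * (reward - alpha * log_prob)
        discount *= gamma
    return total + discount * critic(state)

rng = random.Random(0)
# Average several stochastic rollouts to obtain a Monte Carlo target.
targets = [imagined_soft_return(1.0, horizon=5, rng=rng) for _ in range(64)]
soft_target = sum(targets) / len(targets)
```

In this sketch the rollout never touches a real environment: every transition comes from the model, which is the sense in which imagined rollouts can reduce the amount of real data needed. With `horizon=0` the function reduces to the critic's estimate alone.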

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2028-2035
Number of pages: 8
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 14/12/21 to 17/12/21
