TY - GEN
T1 - Model-Based Soft Actor-Critic
AU - Chien, Jen-Tzung
AU - Yang, Shu-Hsiang
N1 - Publisher Copyright:
© 2021 APSIPA.
PY - 2021
Y1 - 2021
N2 - Deep reinforcement learning has been successfully developed for many challenging applications. However, collecting new data in the actual environment is costly, which makes the agent learn slowly for high-dimensional states and actions. It is crucial to enhance sample efficiency and learn with long-term planning. To tackle these issues, this study presents a stochastic agent driven by a new model-based soft actor-critic (MSAC). The dynamics of the environment, as well as the reward function, are represented by a learnable world model, which allows the agent to explore a latent representation of the environment for stochastic prediction and foresight planning. An off-policy method is proposed in combination with online learning of the world model. The actor, critic, and world model are jointly trained to fulfill multi-step foresight imagination. To further enhance performance, an overshooting scheme is incorporated for long-term planning, and a multi-step rollout is applied for stochastic prediction. Experiments on various tasks with continuous actions show the merit of the proposed MSAC for data efficiency in reinforcement learning.
AB - Deep reinforcement learning has been successfully developed for many challenging applications. However, collecting new data in the actual environment is costly, which makes the agent learn slowly for high-dimensional states and actions. It is crucial to enhance sample efficiency and learn with long-term planning. To tackle these issues, this study presents a stochastic agent driven by a new model-based soft actor-critic (MSAC). The dynamics of the environment, as well as the reward function, are represented by a learnable world model, which allows the agent to explore a latent representation of the environment for stochastic prediction and foresight planning. An off-policy method is proposed in combination with online learning of the world model. The actor, critic, and world model are jointly trained to fulfill multi-step foresight imagination. To further enhance performance, an overshooting scheme is incorporated for long-term planning, and a multi-step rollout is applied for stochastic prediction. Experiments on various tasks with continuous actions show the merit of the proposed MSAC for data efficiency in reinforcement learning.
UR - http://www.scopus.com/inward/record.url?scp=85126648661&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126648661
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 2028
EP - 2035
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -