TY - JOUR
T1 - Solving hard-exploration problems with counting and replay approach
AU - Huang, Bo Ying
AU - Tsai, Shi Chun
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/4
Y1 - 2022/4
N2 - Reinforcement learning agents have been very successful in many Atari 2600 games. However, when applied to more complex and challenging environments, it is crucial to avoid falling into local optima, especially in games with many traps, a large action space, challenging scenarios, and only sporadic successful episodes. In such settings, intrinsic-motivation methods easily get stuck in local optima, while methods that rely heavily on domain knowledge fail to generalize to different game designs. Therefore, to enhance the agent's ability to explore and to avoid catastrophic forgetting as intrinsic motivation fades, we develop a Trajectory Evaluation Module that integrates ideas from Count-Based Exploration and Trajectory Replay. Moreover, our approach combines well with Self-Imitation Learning and works effectively for hard-exploration video games. We evaluate our policy on two video games: Super Mario Bros and Sonic the Hedgehog. Experimental results show that our Trajectory Evaluation Module helps the agent pass through various obstacles and scenarios and successfully complete all levels of Super Mario Bros.
AB - Reinforcement learning agents have been very successful in many Atari 2600 games. However, when applied to more complex and challenging environments, it is crucial to avoid falling into local optima, especially in games with many traps, a large action space, challenging scenarios, and only sporadic successful episodes. In such settings, intrinsic-motivation methods easily get stuck in local optima, while methods that rely heavily on domain knowledge fail to generalize to different game designs. Therefore, to enhance the agent's ability to explore and to avoid catastrophic forgetting as intrinsic motivation fades, we develop a Trajectory Evaluation Module that integrates ideas from Count-Based Exploration and Trajectory Replay. Moreover, our approach combines well with Self-Imitation Learning and works effectively for hard-exploration video games. We evaluate our policy on two video games: Super Mario Bros and Sonic the Hedgehog. Experimental results show that our Trajectory Evaluation Module helps the agent pass through various obstacles and scenarios and successfully complete all levels of Super Mario Bros.
KW - Count-Based Exploration
KW - Hard-exploration video games
KW - Sonic the Hedgehog
KW - Sparse reward
KW - Super Mario Bros
KW - Deep reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85124318292&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2022.104701
DO - 10.1016/j.engappai.2022.104701
M3 - Article
AN - SCOPUS:85124318292
SN - 0952-1976
VL - 110
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 104701
ER -