Gumbel MuZero for the Game of 2048

Chih Yu Kao, Hung Guei, Ti Rong Wu, I. Chen Wu*

*Corresponding author for this work

Research output: Conference contribution, peer-reviewed

2 citations (Scopus)

Abstract

In recent years, AlphaZero and MuZero have achieved remarkable success in a broad range of applications. AlphaZero masters game playing without human knowledge, while MuZero also learns the game rules and the environment's dynamics without access to a simulator during planning, which makes it applicable to complex environments. Both algorithms adopt Monte Carlo tree search (MCTS) during self-play, usually using hundreds of simulations for one move. To handle stochasticity, Stochastic MuZero was proposed; it learns a stochastic model and uses the learned model to perform the tree search. Recently, Gumbel MuZero was proposed to guarantee policy improvement and can thus learn reliably with a small number of simulations. However, Gumbel MuZero uses a deterministic model as in MuZero, which limits its performance in stochastic environments. In this paper, we propose to combine Gumbel MuZero and Stochastic MuZero, which is the first attempt to apply Gumbel MuZero to a stochastic environment. Our experiment on the stochastic puzzle game 2048 demonstrates that the combined algorithm performs well, achieving an average score of 394,645 with only 3 simulations during training and greatly reducing the computational resources needed for training.
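The abstract does not describe how Gumbel MuZero chooses which moves to search when the simulation budget is very small. As a rough illustration only, the sketch below shows the Gumbel-Top-k trick, a standard way to sample distinct actions without replacement from a softmax policy, which is the sampling idea Gumbel MuZero builds on. This is not the authors' implementation: the function name `gumbel_top_k`, the logit values, the move ordering, and the choice of k = 3 (one candidate per simulation) are all assumptions made for the example.

```python
import math
import random


def gumbel_top_k(logits, k, rng):
    """Return k distinct action indices sampled with the Gumbel-Top-k trick.

    Adding independent Gumbel(0, 1) noise to each logit and keeping the k
    largest perturbed values samples k actions without replacement from
    softmax(logits).
    """
    perturbed = []
    for action, logit in enumerate(logits):
        u = rng.random()
        # rng.random() lies in [0.0, 1.0); redraw on 0.0 because log(0) is undefined.
        while u == 0.0:
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, action))
    # Highest perturbed logit first.
    perturbed.sort(reverse=True)
    return [action for _, action in perturbed[:k]]


# Hypothetical policy logits for the four moves in 2048 (up, right, down, left).
logits = [1.2, 0.3, -0.5, 0.0]
rng = random.Random(0)  # fixed seed so repeated runs give the same sample
chosen = gumbel_top_k(logits, k=3, rng=rng)
print(chosen)
```

With four legal moves and k = 3, each call returns three different move indices, so a budget of three simulations never visits the same root action twice.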

Original language: English
Title of host publication: Proceedings - 2022 International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 42-47
Number of pages: 6
ISBN (electronic): 9798350399509
Publication status: Published - 2022
Event: 27th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2022 - Tainan, Taiwan
Duration: 1 Dec 2022 to 3 Dec 2022

Publication series

Name: Proceedings - 2022 International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2022

Conference

Conference: 27th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2022
Country/Territory: Taiwan
City: Tainan
Period: 1/12/22 to 3/12/22
