Learning to Stop: Dynamic Simulation Monte Carlo Tree Search

Li Cheng Lan, Ti Rong Wu, I. Chen Wu, Cho Jui Hsieh

Research output: Conference contribution, peer-reviewed

Abstract

Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). Executing more simulations lets MCTS achieve higher performance, but it also requires enormous amounts of CPU and GPU resources. However, not all states require a long search to identify the best action that the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for 2 minutes. This implies that a significant amount of resources can be saved if we stop the search early when we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether to stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero by 2.5 times while maintaining a similar winning rate, which is critical for training and conducting experiments. Also, under the same average simulation count, our method achieves a 61% winning rate against the original program.
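
The stopping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the checkpoint interval, the threshold, and the visit-share confidence measure are all hypothetical stand-ins chosen for illustration. In particular, `visit_share_confidence` is a simple heuristic used here in place of the learned uncertainty predictor that the abstract refers to.

```python
from typing import Callable, Dict, List, Tuple

Visits = Dict[int, int]  # action id -> visit count at the root


def visit_share_confidence(visits: Visits) -> float:
    """Hypothetical stand-in for a learned uncertainty predictor:
    the fraction of all root visits given to the most-visited action."""
    total = sum(visits.values())
    return max(visits.values()) / total if total else 0.0


def dynamic_simulation_search(
    simulate: Callable[[Visits], None],     # runs one simulation, updates visits
    confidence: Callable[[Visits], float],  # estimates confidence in [0, 1]
    actions: List[int],
    max_simulations: int = 800,             # assumed budget, not from the paper
    check_every: int = 100,                 # assumed checkpoint interval
    threshold: float = 0.9,                 # assumed stopping threshold
) -> Tuple[int, int]:
    """Search until the budget is spent or a checkpoint reports enough
    confidence. Returns (best action, simulations actually used)."""
    visits: Visits = {a: 0 for a in actions}
    used = 0
    while used < max_simulations:
        simulate(visits)
        used += 1
        # Only consult the confidence estimate at periodic checkpoints.
        if used % check_every == 0 and confidence(visits) >= threshold:
            break
    best = max(visits, key=visits.get)
    return best, used


def one_sided(visits: Visits) -> None:
    """Toy simulation in which every rollout favors action 0."""
    visits[0] += 1


best, used = dynamic_simulation_search(one_sided, visit_share_confidence, [0, 1, 2])
print(best, used)  # 0 100
```

In this toy run the search stops at the first checkpoint because all visits go to one action, so only 100 of the 800 budgeted simulations are spent. A state whose visits stay evenly split would never cross the threshold and would use the full budget.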

Original language: English
Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 259-267
Number of pages: 9
ISBN (electronic): 9781713835974
Publication status: Published - 2021
Event: 35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration: 2 Feb 2021 to 9 Feb 2021

Publication series

Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
1

Conference

Conference: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
City: Virtual, Online
Period: 2/02/21 to 9/02/21
