Multiple policy value Monte Carlo tree search

Li-Cheng Lan, Wei Li, Ting-Han Wei, I-Chen Wu

Research output: Conference contribution, peer-reviewed

4 Citations (Scopus)

Abstract

Many of the strongest game-playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNNs), where the DNNs serve as policy or value evaluators. Given a limited budget, such as during online play or the self-play phase of AlphaZero (AZ) training, a balance must be struck between accurate state estimation and a larger number of MCTS simulations, both of which are critical for a strong game-playing agent. Typically, larger DNNs generalize better and evaluate states more accurately, while smaller DNNs are less costly and therefore allow more MCTS simulations and bigger search trees within the same budget. This paper introduces a new method called multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain the advantages of each network; in this paper, two PV-NNs, f_S and f_L, are used. We show through experiments on the game NoGo that MPV-MCTS with the combined f_S and f_L outperforms MCTS with a single PV-NN, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.
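The core idea lends itself to a brief illustration. The Python sketch below is not the authors' implementation: the PV-NN interface (net.evaluate), the game.apply helper, the budget split, and the visit-plus-value-bonus rule for combining the two trees at the root are all simplifying assumptions made for illustration. In the paper, each network maintains its own search tree with its own simulation budget; here the two searches are simply interleaved, with the small network f_S receiving many cheap simulations and the large network f_L few accurate ones.

```python
import math

class Node:
    """One search-tree node; a minimal AlphaZero-style statistics record."""
    def __init__(self, prior):
        self.prior = prior          # P(s, a) from the policy head
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}          # action -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.5):
    """PUCT selection over a node's children, as in AlphaZero-style MCTS."""
    total = math.sqrt(sum(ch.visits for ch in node.children.values()) + 1)
    return max(node.children.items(),
               key=lambda kv: kv[1].q()
               + c_puct * kv[1].prior * total / (kv[1].visits + 1))

def simulate(root, state, net, game):
    """One MCTS simulation using a single PV-NN `net` for leaf evaluation.
    Terminal-state handling is omitted for brevity."""
    path, node = [root], root
    while node.children:                       # select down to a leaf
        action, node = puct_select(node)
        state = game.apply(state, action)
        path.append(node)
    policy, value = net.evaluate(state)        # expand leaf with the network
    for action, p in policy.items():
        node.children[action] = Node(prior=p)
    for n in reversed(path):                   # backpropagate
        n.visits += 1
        n.value_sum += value
        value = -value                         # two-player, alternating sign

def mpv_mcts(state, f_S, f_L, game, budget_S=800, budget_L=100):
    """Simplified MPV-MCTS: f_S builds a large tree cheaply while f_L builds
    a small, more accurate one; simulations are interleaved in proportion to
    the budgets, and the move is read from the small-network tree with a
    value bonus from the large-network tree."""
    root_S, root_L = Node(1.0), Node(1.0)
    ratio = budget_S // max(budget_L, 1)
    for i in range(budget_S):
        simulate(root_S, state, f_S, game)
        if i % ratio == 0:
            simulate(root_L, state, f_L, game)

    def score(action, child):
        # The bonus weighting below is an assumption for illustration only.
        bonus = root_L.children[action].q() if action in root_L.children else 0.0
        return child.visits + 0.1 * budget_S * bonus

    return max(root_S.children.items(), key=lambda kv: score(*kv))[0]
```

The budget asymmetry (budget_S much larger than budget_L) reflects the paper's motivation: the small network buys tree breadth, the large network buys evaluation accuracy, and combining them aims to capture both within one fixed budget.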

Original language: English
Title of host publication: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Editors: Sarit Kraus
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4704-4710
Number of pages: 7
ISBN (electronic): 9780999241141
DOIs
Publication status: Published - 2019
Event: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
Duration: 10 Aug 2019 → 16 Aug 2019

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2019-August
ISSN (print): 1045-0823

Conference

Conference: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Country/Territory: China
City: Macao
Period: 10/08/19 → 16/08/19
