ALPHAZERO-BASED PROOF COST NETWORK TO AID GAME SOLVING

Ti Rong Wu, Chung Chin Shih, Ting Han Wei, Meng Yu Tsai, Wei Yuan Hsu, I. Chen Wu

Research output: Paper, peer-reviewed

4 citations (Scopus)

Abstract

The AlphaZero algorithm learns and plays games without hand-crafted expert knowledge. However, since its objective is to play well, we hypothesize that a better objective can be defined for the related but separate task of solving games. This paper proposes a novel approach to solving problems by modifying the training target of the AlphaZero algorithm, such that it prioritizes solving the game quickly, rather than winning. We train a Proof Cost Network (PCN), where proof cost is a heuristic that estimates the amount of work required to solve problems. This matches the general concept of the so-called proof number from proof number search, which has been shown to be well-suited for game solving. We propose two specific training targets. The first finds the shortest path to a solution, while the second estimates the proof cost. We conduct experiments on solving 15x15 Gomoku and 9x9 Killall-Go problems with both MCTS-based and focused depth-first proof number search solvers. Comparisons between using AlphaZero networks and PCN as heuristics show that PCN can solve more problems.
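For readers unfamiliar with the proof number that the abstract refers to, the following is a minimal sketch of the standard proof and disproof number bookkeeping used in proof number search on an AND/OR tree. It is not the paper's PCN or either of its training targets. The `Node` class, its field names, and the `proof_numbers` function are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INF = math.inf


@dataclass
class Node:
    """Hypothetical AND/OR tree node (illustrative, not from the paper)."""
    is_or: bool                    # True: OR node (prover to move); False: AND node
    value: Optional[bool] = None   # Leaf status: True = proven, False = disproven, None = unknown
    children: List["Node"] = field(default_factory=list)


def proof_numbers(node: Node) -> Tuple[float, float]:
    """Return (proof number, disproof number) using the standard PNS rules."""
    if not node.children:
        if node.value is True:
            return 0, INF          # already proven: nothing left to prove
        if node.value is False:
            return INF, 0          # already disproven: can never be proven
        return 1, 1                # unknown leaf: one node to expand either way
    pairs = [proof_numbers(child) for child in node.children]
    pns = [pn for pn, _ in pairs]
    dns = [dn for _, dn in pairs]
    if node.is_or:
        # Proving an OR node needs one proven child; disproving needs all children disproven.
        return min(pns), sum(dns)
    # Proving an AND node needs all children proven; disproving needs one disproven child.
    return sum(pns), min(dns)


# Assumed toy tree: an OR root with an AND child (two unknown leaves) and one unknown leaf.
root = Node(is_or=True, children=[
    Node(is_or=False, children=[Node(is_or=True), Node(is_or=True)]),
    Node(is_or=False),
])
print(proof_numbers(root))  # (1, 2)
```

A smaller proof number means less estimated work remains to prove the node. This is the sense in which the abstract relates proof cost to proof numbers.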

Original language: English
Publication status: Published - 2022
Event: 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Duration: 25 Apr 2022 to 29 Apr 2022

Conference

Conference: 10th International Conference on Learning Representations, ICLR 2022
City: Virtual, Online
Period: 25/04/22 to 29/04/22
