Recently, Szubert and Jaskowski successfully applied TD learning together with n-tuple networks to the game 2048. In this paper, we first improve on their result by modifying the n-tuple networks. However, we observe that programs based on TD learning still rarely reach large tiles, such as 32768-tiles (tiles with value 32768). We therefore propose a new learning method, named multi-stage TD learning, which effectively improves performance, especially the maximum score and the rate of reaching 32768-tiles. After incorporating shallow expectimax search, our 2048 program reaches 32768-tiles with probability 10.9%, and obtains a maximum score of 605752 and an average score of 328946. To the best of our knowledge, our program outperforms all known 2048 programs to date, except for the program developed by the programmers nicknamed nneonneo and xificurk, which relies heavily on manually tuned deep-search heuristics. That program reaches 32768-tiles with probability 32%, but ours runs about 100 times faster. Interestingly, our new learning method can also be easily applied to other 2048-like games, such as Threes; our program for Threes outperforms all known Threes programs to date.
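To make the underlying technique concrete, the following is a minimal sketch of TD(0) afterstate learning with an n-tuple network, in the spirit of the Szubert and Jaskowski approach the abstract builds on. The tuple shapes, table sizes, learning rate, and function names are illustrative assumptions, not the paper's actual configuration (which uses larger tuples covering the full 4x4 board).

```python
# Hypothetical sketch: TD(0) afterstate learning with an n-tuple network.
# Board cells hold tile exponents (0 = empty, 1 = a 2-tile, 2 = a 4-tile, ...).
# The tuples, table sizes, and learning rate below are toy assumptions.

TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7)]   # two horizontal-row tuples (toy choice)
NUM_VALUES = 16                          # exponents 0..15 (up to the 32768-tile)
ALPHA = 0.0025                           # learning rate

# One lookup table per tuple; an index encodes the covered cells in base 16.
weights = [[0.0] * (NUM_VALUES ** len(t)) for t in TUPLES]

def index(board, cells):
    """Encode the values of the given cells as a single table index."""
    idx = 0
    for c in cells:
        idx = idx * NUM_VALUES + board[c]
    return idx

def value(board):
    """N-tuple network value: the sum of one table entry per tuple."""
    return sum(w[index(board, t)] for w, t in zip(weights, TUPLES))

def td_update(afterstate, reward, next_afterstate):
    """TD(0) afterstate update: V(s') <- V(s') + alpha * (r + V(s'') - V(s'))."""
    delta = reward + value(next_afterstate) - value(afterstate)
    step = ALPHA * delta / len(TUPLES)   # spread the correction over the tables
    for w, t in zip(weights, TUPLES):
        w[index(afterstate, t)] += step
```

The multi-stage variant proposed in the paper can be viewed as training a separate set of such tables per game stage (e.g. before and after the first 16384-tile appears) and switching between them, so each network specializes in one phase of the game.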
Number of pages: 13
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
State: Published - 1 Jan 2014
- Stochastic puzzle game
- Temporal difference learning