Multi-stage temporal difference learning for 2048

I-Chen Wu*, Kun Hao Yeh, Chao Chin Liang, Chia Chuan Chang, Han Chiang

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

8 Scopus citations


Recently, Szubert and Jaskowski successfully used TD learning together with n-tuple networks for playing the game 2048. In this paper, we first improve their result by modifying the n-tuple networks. However, we observe a phenomenon that the programs based on TD learning still hardly reach large tiles, such as 32768-tiles (the tiles with value 32768). In this paper, we propose a new learning method, named multi-stage TD learning, to effectively improve the performance, especially for maximum scores and the reaching ratio of 32768-tiles. After incorporating shallow expectimax search, our 2048 program can reach 32768-tiles with probability 10.9%, and obtain the maximum score 605752 and the averaged score 328946. To the best of our knowledge, our program outperforms all the known 2048 programs up to date, except for the program developed by the programmers, nicknamed nneonneo and xificurk, which heavily relies on deep search heuristics tuned manually. The program can reach 32768-tiles with probability 32%, but ours runs about 100 times faster. Also interestingly, our new learning method can be easily applied to other 2048-like games, such as Threes. Our program for Threes outperforms all the known Threes programs up to date.


  • 2048
  • Expectimax
  • Stochastic puzzle game
  • Temporal difference learning
  • Threes!


Dive into the research topics of 'Multi-stage temporal difference learning for 2048'. Together they form a unique fingerprint.

Cite this