\section{Discussions}
The maze game is cyclic among non-absorbing states: when an agent deviates from the optimal path during exploration, it can quickly steer back. Therefore, in a maze, a policy combined with $\epsilon$-greedy exploration remains an $\epsilon$-greedy policy. The 2048 game, in contrast, is acyclic among non-absorbing states, so any erroneous choice made during exploration persists until the end of the game. Consequently, in the 2048 game, a policy combined with $\epsilon$-greedy exploration is no longer an $\epsilon$-greedy policy. This is why explicit exploration strategies such as $\epsilon$-greedy and soft-max do not work when training an AI for the 2048 game; exploration can only be encouraged through optimistic initialization.

As early as 1996, Boyan and Moore proposed ROUT, a backward algorithm with function approximation, to improve learning in large acyclic domains \cite{boyan1996learning}. In 2017, Matsuzaki pointed out that the 2048 game has two important characteristics that distinguish it from conventional board games: (1) ``It has a long sequence of moves''; (2) ``The difficulty increases toward the end of the game'' \cite{matsuzaki2017developing}. He then applied backward learning and restart to improve learning. We further argue that it is the acyclic nature of the 2048 game that makes backward learning efficient. Finally, acyclic MDPs (e.g., games like Connect-Four) and MDPs with acyclic structures can benefit from the algorithmic insights that have led to the success of the 2048 AI.
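
To make the role of optimistic initialization concrete, the following minimal sketch (not the implementation used in this work; the environment interface and the constant \texttt{OPTIMISTIC\_INIT} are assumptions for illustration) shows tabular learning that acts purely greedily: untried actions keep their large initial value and are therefore tried without any explicit randomization such as $\epsilon$-greedy or soft-max.

\begin{verbatim}
# Minimal sketch: tabular Q-learning where exploration comes solely
# from optimistic initial values. The env interface (reset/step/
# legal_actions) and OPTIMISTIC_INIT are illustrative assumptions.
from collections import defaultdict

OPTIMISTIC_INIT = 1e5   # larger than any achievable return, so every
                        # untried action looks attractive under greedy play
ALPHA, GAMMA = 0.1, 1.0

def greedy(Q, state, actions):
    # Pure greedy selection: no epsilon, no soft-max.
    return max(actions, key=lambda a: Q[(state, a)])

def train_episode(env, Q):
    state, done = env.reset(), False
    while not done:
        action = greedy(Q, state, env.legal_actions(state))
        next_state, reward, done = env.step(action)
        target = reward if done else reward + GAMMA * max(
            Q[(next_state, a)] for a in env.legal_actions(next_state))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

Q = defaultdict(lambda: OPTIMISTIC_INIT)
\end{verbatim}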
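
Similarly, the benefit of backward learning in an acyclic domain can be illustrated by the sketch below, which assumes a hypothetical \texttt{transitions} table with deterministic moves (2048 itself has stochastic tile spawns, but the ordering argument is the same): because every successor of a state is processed before the state itself, a single reverse-topological sweep of the Bellman optimality update already yields exact values.

\begin{verbatim}
# Sketch of a single backward sweep over an acyclic MDP.
# transitions[s][a] = (reward, next_state), next_state = None if terminal.
def backward_values(states_topological, transitions):
    V = {}
    for s in reversed(states_topological):   # successors already solved
        if not transitions.get(s):           # absorbing/terminal state
            V[s] = 0.0
            continue
        V[s] = max(r + (0.0 if s2 is None else V[s2])
                   for r, s2 in transitions[s].values())
    return V
\end{verbatim}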