\documentclass[lettersize,journal]{IEEEtran}
\usepackage{amsmath,amsfonts}
\usepackage{nicematrix}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{array}
\usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
\usepackage{textcomp}
\usepackage{stfloats}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
%\usepackage{natbib}
\newtheorem{theorem}{Theorem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{assumption}[theorem]{Assumption}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{remark}[theorem]{Remark}
\usepackage{cite}
\usepackage{xeCJK}
\usepackage{tikz}
\usetikzlibrary{automata, positioning}
\usetikzlibrary{decorations.markings}
\usepackage{multicol}
% \usepackage{cuted}
% \usepackage{widetext}
\hyphenation{op-tical net-works semi-conduc-tor IEEE-Xplore}
% updated with editorial comments 8/9/2021
\newcommand{\highlight}[1]{\textcolor{red}{#1}}

\begin{document}

\title{Acyclicity of Game 2048}

\author{Xingguo Chen, Xinwen Li, Shangdong Yang, and Wenhao Wang
\thanks{Manuscript received XXXX; revised XXXX; accepted XXXX. Date of publication XXXX; date of current version XXXX. This work was supported by the National Natural Science Foundation of China (Nos. 62276142, 62206133, and 62202240).}
\IEEEcompsocitemizethanks{
\IEEEcompsocthanksitem X. Chen, X. Li, and S. Yang are with the Jiangsu Key Laboratory of Big Data Security \& Intelligent Processing, Nanjing University of Posts and Telecommunications, P.R. China \protect (e-mail: chenxg@njupt.edu.cn).}%
\IEEEcompsocitemizethanks{
\IEEEcompsocthanksitem W. Wang is with the College of Electronic Engineering, National University of Defense Technology, P.R. China \protect (e-mail: wangwenhao11@nudt.edu.cn). \emph{(Corresponding author: W.
Wang.)}%
}
}

% The paper headers
\markboth{IEEE Transactions on Games,~Vol.~14, No.~8, August~202X}%
{Chen \MakeLowercase{\textit{et al.}}: Acyclicity of Game 2048}

%\IEEEpubid{0000--0000/00\$00.00~\copyright~2024 IEEE}
% Remember, if you use this you must call \IEEEpubidadjcol in the second
% column for its text to clear the IEEEpubid mark.

\maketitle

\begin{abstract}
In reinforcement learning for the game 2048, we observe that existing successful approaches do not explicitly use exploration strategies such as $\epsilon$-greedy or softmax. Szubert and Ja{\'s}kowski argued that the intrinsic randomness of 2048 makes explicit exploration unnecessary. Our experiments show, however, that adding $\epsilon$-greedy exploration to 2048 leads to very poor learning outcomes. This suggests that exploration strategies are not merely unnecessary but cannot be used effectively. By combining near-optimal policies with $\epsilon$-greedy exploration and comparing 2048 with a maze game, we find that in the maze game $\epsilon$-greedy exploration yields an $\epsilon$-greedy policy, whereas in 2048 it does not. This observation leads us to a crucial property of 2048: acyclicity. We prove that 2048 is acyclic between non-absorbing states, which is the fundamental reason why explicit exploration cannot be employed in 2048. Compared with explicit exploration strategies, backward learning, restarts, and optimistic initialization are better suited to acyclic MDPs or MDPs with acyclic structures.
\end{abstract}

\begin{IEEEkeywords}
Acyclicity, 2048 game, ergodicity, exploration, backward learning.
\end{IEEEkeywords}

\input{main/introduction}
\input{main/background}
\input{main/acyclic}
\input{main/2048isAcyclic}
\input{main/discussion}
%\input{main/nonergodicity}
%\input{main/paradox}
%\input{main/theorem}
%\input{main/2048prove}

\bibliographystyle{IEEEtran}
\bibliography{template/IEEEabrv,references}

\input{main/biography}

\end{document}