diff --git a/document.tex b/document.tex
index 8b6a1e9..171700a 100644
--- a/document.tex
+++ b/document.tex
@@ -48,7 +48,7 @@ Intelligent Processing, Nanjing University of Posts and Telecommunications,
 chenxg@njupt.edu.cn).}%
 \IEEEcompsocitemizethanks{ \IEEEcompsocthanksitem W. Wang is
-with College of Electronic Engineering, National University of Defense Technology,
+with the College of Electronic Engineering, National University of Defense Technology,
 P.R., China \protect
 (e-mail:
 wangwenhao11@nudt.edu.cn).
 \emph{(Corresponding author: W. Wang.)}%
@@ -67,17 +67,30 @@ wangwenhao11@nudt.edu.cn).
 \maketitle
 
 \begin{abstract}
-In reinforcement learning of 2048 game,
-we are intrigued by the absence of successful cases
-involving explicit exploration, e.g., $\epsilon-$greedy,
-softmax.
-Through experiments comparing the 2048 game and maze,
-we argue that explicit exploration strategies
- cannot be effectively combined to learn in the 2048 game,
- and demonstrate the acyclic nature of the 2048 game.
- The successful experiences in the 2048 game AI
- will contribute to solving acyclic MDPs and
- MDPs with acyclic structures.
+In the reinforcement learning of the 2048 game,
+we observed that existing successful cases
+do not explicitly use exploration strategies
+such as $\epsilon$-greedy and softmax.
+Szubert and Ja{\'s}kowski argued that
+the intrinsic randomness of the 2048 game
+makes explicit exploration strategies unnecessary.
+However, through experiments,
+we found that incorporating the $\epsilon$-greedy
+exploration strategy into the 2048 game
+leads to very poor learning outcomes.
+This suggests that exploration strategies
+are not unnecessary, but rather
+cannot be used effectively.
+By combining near-optimal policies with an
+$\epsilon$-greedy exploration strategy
+and comparing the 2048 game with a maze game,
+we discovered that in the maze game the $\epsilon$-greedy
+exploration led to an $\epsilon$-greedy policy,
+whereas this was not the case in the 2048 game.
+This led us to uncover a crucial property of 2048: its acyclic nature.
+We proved that the 2048 game is acyclic between non-absorbing states.
+This is the fundamental reason why explicit exploration cannot be
+effectively employed in the 2048 game.
 \end{abstract}
 
 \begin{IEEEkeywords}