Commit 0a7f9c76 by Lenovo

Modified the abstract

parent 00d7d8d4
......@@ -48,7 +48,7 @@ Intelligent Processing, Nanjing University of Posts and Telecommunications,
chenxg@njupt.edu.cn).}%
\IEEEcompsocitemizethanks{
\IEEEcompsocthanksitem W. Wang is
with College of Electronic Engineering, National University of Defense Technology,
with the College of Electronic Engineering, National University of Defense Technology,
P.R. China \protect (e-mail:
wangwenhao11@nudt.edu.cn).
\emph{(Corresponding author: W. Wang.)}%
......@@ -67,17 +67,30 @@ wangwenhao11@nudt.edu.cn).
\maketitle
\begin{abstract}
In reinforcement learning of the 2048 game,
we are intrigued by the absence of successful cases
involving explicit exploration, e.g., $\epsilon$-greedy
or softmax.
Through experiments comparing the 2048 game and a maze,
we argue that explicit exploration strategies
cannot be effectively combined with learning in the 2048 game,
and demonstrate the acyclic nature of the 2048 game.
The successful experiences in 2048 game AI
will contribute to solving acyclic MDPs and
MDPs with acyclic structures.
In the reinforcement learning of the 2048 game,
we observed that existing successful cases
do not explicitly employ exploration strategies
such as $\epsilon$-greedy and softmax.
Szubert and Ja{\'s}kowski argued that
the intrinsic randomness of the 2048 game does
not necessitate the use of exploration strategies.
However, through experiments,
we found that incorporating the $\epsilon$-greedy
exploration strategy into the 2048 game
leads to very poor learning outcomes.
This suggests not that exploration
strategies are unnecessary, but rather that
they cannot be used effectively.
By combining near-optimal policies with an
$\epsilon$-greedy exploration strategy
and comparing the 2048 game with a maze game,
we discovered that in the maze game the $\epsilon$-greedy
exploration led to an $\epsilon$-greedy policy,
whereas this was not the case for the 2048 game.
This led us to uncover a crucial property of 2048: its acyclic nature.
We prove that the 2048 game is acyclic between non-absorbing states.
This is the fundamental reason why explicit exploration cannot be
employed in the 2048 game.
\end{abstract}
\begin{IEEEkeywords}
......