Commit 0a7f9c76 by Lenovo

Modified the abstract

parent 00d7d8d4
......@@ -48,7 +48,7 @@ Intelligent Processing, Nanjing University of Posts and Telecommunications,
chenxg@njupt.edu.cn).}%
\IEEEcompsocitemizethanks{
\IEEEcompsocthanksitem W. Wang is
with College of Electronic Engineering, National University of Defense Technology,
with the College of Electronic Engineering, National University of Defense Technology,
P.R. China \protect (e-mail:
wangwenhao11@nudt.edu.cn).
\emph{(Corresponding author: W. Wang.)}%
......@@ -67,17 +67,30 @@ wangwenhao11@nudt.edu.cn).
\maketitle
\begin{abstract}
In reinforcement learning of the 2048 game,
we are intrigued by the absence of successful cases
involving explicit exploration, e.g., $\epsilon$-greedy
or softmax.
Through experiments comparing the 2048 game and a maze,
we argue that explicit exploration strategies
cannot be effectively combined with learning in the 2048 game,
and demonstrate the acyclic nature of the 2048 game.
The successful experiences in 2048 game AI
will contribute to solving acyclic MDPs and
MDPs with acyclic structures.
In the reinforcement learning of the 2048 game,
we observed that existing successful cases
do not explicitly employ exploration strategies
such as $\epsilon$-greedy and softmax.
Szubert and Ja{\'s}kowski argued that
the intrinsic randomness of the 2048 game does
not necessitate the use of exploration strategies.
However, through experiments,
we found that incorporating the $\epsilon$-greedy
exploration strategy into the 2048 game
leads to very poor learning outcomes.
This suggests not that exploration
strategies are unnecessary, but rather that
they cannot be used effectively.
By combining near-optimal policies with an
$\epsilon$-greedy exploration strategy
and comparing the 2048 game with a maze game,
we discovered that in the maze game the $\epsilon$-greedy
exploration led to an $\epsilon$-greedy policy,
whereas this was not the case for the 2048 game.
This led us to uncover a crucial property of 2048: its acyclic nature.
We prove that the 2048 game is acyclic between non-absorbing states.
This is the fundamental reason why explicit exploration cannot be
employed in the 2048 game.
\end{abstract}
\begin{IEEEkeywords}
......