Commit 0a7f9c76 by Lenovo

Modified the abstract

parent 00d7d8d4
...
@@ -48,7 +48,7 @@ Intelligent Processing, Nanjing University of Posts and Telecommunications,
chenxg@njupt.edu.cn).}%
\IEEEcompsocitemizethanks{
\IEEEcompsocthanksitem W. Wang is
-with College of Electronic Engineering, National University of Defense Technology,
+with the College of Electronic Engineering, National University of Defense Technology,
P.R., China \protect (e-mail:
wangwenhao11@nudt.edu.cn).
\emph{(Corresponding author: W. Wang.)}%
...
@@ -67,17 +67,30 @@ wangwenhao11@nudt.edu.cn).
\maketitle
\begin{abstract}
-In reinforcement learning of 2048 game,
-we are intrigued by the absence of successful cases
-involving explicit exploration, e.g., $\epsilon-$greedy,
-softmax.
-Through experiments comparing the 2048 game and maze,
-we argue that explicit exploration strategies
-cannot be effectively combined to learn in the 2048 game,
-and demonstrate the acyclic nature of the 2048 game.
-The successful experiences in the 2048 game AI
-will contribute to solving acyclic MDPs and
-MDPs with acyclic structures.
+In the reinforcement learning of the 2048 game,
+we observed that existing successful cases
+do not explicitly utilize exploration strategies
+such as $\epsilon$-greedy and softmax.
+Szubert and Ja{\'s}kowski argued that
+the intrinsic randomness of the 2048 game does
+not necessitate the use of exploration strategies.
+However, through experiments,
+we found that incorporating the $\epsilon$-greedy
+exploration strategy into the 2048 game
+leads to very poor learning outcomes.
+This suggests that it is not that exploration
+strategies are unnecessary, but rather that
+they cannot be used effectively.
+By combining near-optimal policies with an
+$\epsilon$-greedy exploration strategy
+and comparing the 2048 game with a maze game,
+we discovered that in the maze game, the $\epsilon$-greedy
+exploration led to an $\epsilon$-greedy policy,
+whereas this was not the case for the 2048 game.
+This led us to uncover a crucial property of 2048: its acyclic nature.
+We proved that the 2048 game is acyclic between non-absorbing states.
+This is the fundamental reason why explicit exploration cannot be
+employed in the 2048 game.
\end{abstract}
\begin{IEEEkeywords}
...
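For context on the exploration strategy the revised abstract discusses, here is a minimal sketch of $\epsilon$-greedy action selection over the four 2048 moves. This is an editorial illustration, not the paper's implementation; the action names and value estimates are hypothetical.

```python
import random

def epsilon_greedy(q_values, legal_actions, epsilon=0.1):
    """With probability epsilon pick a uniformly random legal action
    (explore); otherwise pick the highest-valued action (exploit)."""
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_values.get(a, 0.0))

# Hypothetical value estimates for the four 2048 moves.
q = {"up": 1.2, "down": 0.4, "left": 0.9, "right": 0.7}
print(epsilon_greedy(q, ["up", "down", "left", "right"], epsilon=0.1))
```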
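A worked justification of the acyclicity claim (editorial note assuming standard 2048 rules, not part of the commit): merging two tiles of value $v$ yields one tile of value $2v$, so merges preserve the total tile sum, while every legal move spawns a fresh 2- or 4-tile. The board sum is therefore a strictly increasing potential, and no non-absorbing state can recur:

```latex
% S(s) denotes the total tile sum of board state s.
% Merges preserve S (v + v -> 2v); each move then spawns a 2- or 4-tile:
S(s_{t+1}) = S(s_t) + \delta_t, \qquad \delta_t \in \{2, 4\}.
% Hence S(s_0) < S(s_1) < \dots, so the transition graph over
% non-absorbing states contains no cycles.
```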