XingguoChen / 20240414IEEETG

Commit 0a7f9c76, authored Jun 03, 2024 by Lenovo
Modified the abstract
parent 00d7d8d4
Showing 1 changed file with 25 additions and 12 deletions:

document.tex (+25, -12) @ 0a7f9c76
...
@@ -48,7 +48,7 @@ Intelligent Processing, Nanjing University of Posts and Telecommunications,
 chenxg@njupt.edu.cn).}
 %
 \IEEEcompsocitemizethanks{\IEEEcompsocthanksitem W. Wang is
-with College of Electronic Engineering, National University of Defense Technology,
+with the College of Electronic Engineering, National University of Defense Technology,
 P.R. China \protect (e-mail: wangwenhao11@nudt.edu.cn).
 \emph{(Corresponding author: W. Wang.)}
 %
...
@@ -67,17 +67,30 @@ wangwenhao11@nudt.edu.cn).
 \maketitle
 \begin{abstract}
-In reinforcement learning of 2048 game,
-we are intrigued by the absence of successful cases
-involving explicit exploration, e.g., $\epsilon$-greedy, softmax.
-Through experiments comparing the 2048 game and maze,
-we argue that explicit exploration strategies
-cannot be effectively combined to learn in the 2048 game,
-and demonstrate the acyclic nature of the 2048 game.
-The successful experiences in the 2048 game AI
-will contribute to solving acyclic MDPs and
-MDPs with acyclic structures.
+In the reinforcement learning of the 2048 game,
+we observed that existing successful cases
+do not explicitly utilize exploration strategies
+such as $\epsilon$-greedy and softmax.
+Szubert and Ja{\'s}kowski argued that
+the intrinsic randomness of the 2048 game does
+not necessitate the use of exploration strategies.
+However, through experiments,
+we found that incorporating the $\epsilon$-greedy
+exploration strategy into the 2048 game
+leads to very poor learning outcomes.
+This suggests that it is not that exploration
+strategies are unnecessary, but rather that
+they cannot be used effectively.
+By combining near-optimal policies with an
+$\epsilon$-greedy exploration strategy
+and comparing the 2048 game with a maze game,
+we discovered that in the maze game, the
+$\epsilon$-greedy exploration led to an
+$\epsilon$-greedy policy,
+whereas this was not the case for the 2048 game.
+This led us to uncover a crucial property of 2048: its acyclic nature.
+We proved that the 2048 game is acyclic between non-absorbing states.
+This is the fundamental reason why explicit exploration cannot be
+employed in the 2048 game.
 \end{abstract}
 \begin{IEEEkeywords}
...
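For reference, the $\epsilon$-greedy rule discussed in the revised abstract picks a uniformly random action with probability $\epsilon$ and the greedy (highest-valued) action otherwise. A minimal sketch in Python, where the function name, the `q_values` table, and the `legal_actions` list are illustrative assumptions, not taken from the paper's code:

import random

def epsilon_greedy_action(q_values, legal_actions, epsilon):
    # Illustrative sketch of the rule named in the abstract; names and
    # interface are our assumptions, not the paper's implementation.
    if random.random() < epsilon:
        # Explore: uniformly random legal action with probability epsilon.
        return random.choice(legal_actions)
    # Exploit: legal action with the highest estimated value.
    return max(legal_actions, key=lambda a: q_values[a])

# Hypothetical example: 2048's four moves indexed 0..3, with only
# moves 0, 1, and 3 legal in the current board position.
q = {0: 1.2, 1: 0.4, 2: -0.3, 3: 0.9}
action = epsilon_greedy_action(q, legal_actions=[0, 1, 3], epsilon=0.1)

The abstract's finding is that plugging this rule into 2048 learning degrades results sharply, whereas in the maze benchmark it yields the expected $\epsilon$-greedy policy.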
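The acyclicity claim also admits a short monotonicity argument; the following is a sketch from the game's rules, with $V$ and $v_{\text{spawn}}$ as our illustrative notation rather than the paper's proof. Sliding and merging preserve the total tile value (two tiles of value $v$ merge into one of value $2v$), and every successful move then spawns a new 2 or 4 tile:

\[
V(s_{t+1}) = V(s_t) + v_{\text{spawn}}, \qquad v_{\text{spawn}} \in \{2, 4\},
\]

where $V(s)$ denotes the sum of all tile values on board $s$. Hence $V(s_0) < V(s_1) < \cdots$ strictly increases along any trajectory, no non-absorbing state can be revisited, and the induced MDP is acyclic.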