XingguoChen / 20240414IEEETG / Commits / 2e608808
Commit 2e608808 authored Jun 02, 2024 by Lenovo
Add a bit of discussion
parent 8b8222aa
Showing 1 changed file with 4 additions and 3 deletions
main/discussion.tex
@@ -5,7 +5,7 @@ When an agent deviates from the optimal
 path during exploration, it can quickly adjust back.
 Therefore, in a maze, a policy
 combined with an $\epsilon$-greedy exploration
-remains $\epsilon$-greedy policy.
+remains an $\epsilon$-greedy policy.
 However, 2048 game is acyclic between non-absorbing states.
 Any choice error caused by exploration will persist until the end of the game.
@@ -34,11 +34,12 @@ board games: (1) ``It has a long sequence of moves'';
 Then, he applied backward learning and restart to improve
 learning.
-We declare that the acyclic nature of
+We further declare that the acyclic nature of
 game 2048 leads to the efficient performance of backward learning.
-Finally, MDPs with acyclic structures can benefit from
+Finally, acyclic MDPs (e.g. games like
+Connect-Four) and MDPs with acyclic structures can benefit from
 the algorithmic insights that have led to the success of the 2048 AI.
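For readers unfamiliar with the $\epsilon$-greedy exploration named in the first hunk, the following is a minimal sketch of the selection rule in Python. The function name `epsilon_greedy` and the `q_values` list are illustrative assumptions and do not come from this repository; the sketch shows only the generic rule, not the authors' 2048 agent.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is a hypothetical list of estimated action values for one state.
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the largest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the rule is purely greedy. Any positive `epsilon` occasionally yields a non-greedy move, which is the kind of exploratory choice the discussion says cannot be undone in an acyclic game such as 2048.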