Commit 2e608808 by Lenovo

Add a bit of discussion

parent 8b8222aa
@@ -5,7 +5,7 @@ When an agent deviates from the optimal
path during exploration, it can quickly adjust back.
Therefore, in a maze, a policy
combined with an $\epsilon$-greedy exploration
-remains $\epsilon$-greedy policy.
+remains an $\epsilon$-greedy policy.
However, 2048 game is acyclic between non-absorbing states.
Any choice error caused by exploration will persist until the end of the game.
@@ -34,11 +34,12 @@ board games: (1) ``It has a long sequence of moves'';
Then, he applied backward learning and restart to improve
learning.
-We declare that the acyclic nature of
+We further declare that the acyclic nature of
game 2048 leads to the efficient performance of backward learning.
-Finally, MDPs with acyclic structures can benefit from
+Finally, acyclic MDPs (e.g. games like
+Connect-Four) and MDPs with acyclic structures can benefit from
the algorithmic insights that have led to the success of the 2048 AI.