Add some discussion

Commit 2e608808 authored Jun 02, 2024 by Lenovo

Showing 1 changed file with 4 additions and 3 deletions

main/discussion.tex
+4 -3

--- a/main/discussion.tex
+++ b/main/discussion.tex
@@ -5,7 +5,7 @@ When an agent deviates from the optimal
 path during exploration, it can quickly adjust back.
 Therefore, in a maze,  a policy 
 combined with an $\epsilon$-greedy exploration  
-remains $\epsilon$-greedy  policy.
+remains an $\epsilon$-greedy  policy.
 
 However, 2048 game is acyclic between non-absorbing states.
 Any choice error caused by exploration will persist until the end of the game.
@@ -34,11 +34,12 @@ board games: (1) ``It has a long sequence of moves'';
 Then, he applied backward learning and restart to improve
 learning.

-We declare that the acyclic nature of
+We further declare that the acyclic nature of
 game 2048 leads to the efficient performance of backward learning.


-Finally, MDPs with acyclic structures can benefit from 
+Finally, acyclic MDPs (e.g. games like
+Connect-Four) and MDPs with acyclic structures can benefit from 
 the algorithmic insights that have led to the success of the 2048 AI.