From 2e60880827189a6c6b410fd460bd21d663098013 Mon Sep 17 00:00:00 2001
From: Lenovo
Date: Sun, 2 Jun 2024 11:42:09 +0800
Subject: [PATCH] Add some discussion

---
 main/discussion.tex | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/main/discussion.tex b/main/discussion.tex
index 4a6628d..06e4821 100644
--- a/main/discussion.tex
+++ b/main/discussion.tex
@@ -5,7 +5,7 @@ When an agent deviates from the optimal path
 during exploration, it can quickly adjust back.
 Therefore, in a maze, a policy combined
 with an $\epsilon$-greedy exploration
-remains $\epsilon$-greedy policy.
+remains an $\epsilon$-greedy policy.
 However, 2048 game is acyclic between non-absorbing states.
 Any choice error caused by exploration
 will persist until the end of the game.
@@ -34,11 +34,12 @@ board games:
 (1) ``It has a long sequence of moves'';
 Then, he applied backward learning and restart
 to improve learning.
-We declare that the acyclic nature of
+We further argue that the acyclic nature of
 game 2048 leads to the efficient performance of
 backward learning.
-Finally, MDPs with acyclic structures can benefit from
+Finally, MDPs with acyclic structures (e.g. games
+like Connect-Four) can benefit from
 the algorithmic insights that have led to
 the success of the 2048 AI.
--
libgit2 0.26.0
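
Note on the first hunk: the claim that the mixed policy "remains an $\epsilon$-greedy
policy" refers to the standard definition, sketched below in LaTeX as a reference for
reviewers. The symbols $\pi_\epsilon$, $Q$, and $\mathcal{A}$ are illustrative and are
not taken from the paper's own notation; the snippet assumes amsmath is loaded.

% Sketch under assumed notation: the $\epsilon$-greedy policy derived from
% action-value estimates $Q(s,a)$ over a finite action set $\mathcal{A}$.
% With probability $1-\epsilon$ the agent acts greedily; with probability
% $\epsilon$ it picks an action uniformly at random.
\begin{equation*}
\pi_\epsilon(a \mid s) =
  \begin{cases}
    1 - \epsilon + \epsilon / |\mathcal{A}| & \text{if } a = \arg\max_{a' \in \mathcal{A}} Q(s, a'), \\
    \epsilon / |\mathcal{A}| & \text{otherwise.}
  \end{cases}
\end{equation*}

Under this definition, the contrast the hunk draws is that in a cyclic MDP (the maze)
an occasional uniform action is recoverable, whereas in an acyclic MDP such as 2048
each such action moves the trajectory to a part of the state graph that can never be
revisited, so the error persists until the game ends.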