From 2e60880827189a6c6b410fd460bd21d663098013 Mon Sep 17 00:00:00 2001
From: Lenovo
Date: Sun, 2 Jun 2024 11:42:09 +0800
Subject: [PATCH] Add some discussion

---
 main/discussion.tex | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/main/discussion.tex b/main/discussion.tex
index 4a6628d..06e4821 100644
--- a/main/discussion.tex
+++ b/main/discussion.tex
@@ -5,7 +5,7 @@ When an agent deviates from the optimal path
 during exploration, it can quickly adjust back.
 Therefore, in a maze, a policy combined
 with an $\epsilon$-greedy exploration
-remains $\epsilon$-greedy policy.
+remains an $\epsilon$-greedy policy.
 However, 2048 game is acyclic between non-absorbing states.
 Any choice error caused by exploration
 will persist until the end of the game.
@@ -34,11 +34,12 @@ board games:
 (1) ``It has a long sequence of moves'';
 Then, he applied backward learning and restart
 to improve learning.
-We declare that the acyclic nature of
+We further argue that the acyclic nature of
 game 2048 leads to the efficient performance of
 backward learning.
-Finally, MDPs with acyclic structures can benefit from
+Finally, MDPs with acyclic structures (e.g. games
+like Connect-Four) can benefit from
 the algorithmic insights that have led to
 the success of the 2048 AI.
--
libgit2 0.26.0
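
Note on the first hunk: the claim that the mixed policy "remains an $\epsilon$-greedy
policy" refers to the standard definition, sketched below in LaTeX as a reference for
reviewers. The symbols $\pi_\epsilon$, $Q$, and $\mathcal{A}$ are illustrative and are
not taken from the paper's own notation; the snippet assumes amsmath is loaded.

% Sketch under assumed notation: the $\epsilon$-greedy policy derived from
% action-value estimates $Q(s,a)$ over a finite action set $\mathcal{A}$.
% With probability $1-\epsilon$ the agent acts greedily; with probability
% $\epsilon$ it picks an action uniformly at random.
\begin{equation*}
\pi_\epsilon(a \mid s) =
  \begin{cases}
    1 - \epsilon + \epsilon / |\mathcal{A}| & \text{if } a = \arg\max_{a' \in \mathcal{A}} Q(s, a'), \\
    \epsilon / |\mathcal{A}| & \text{otherwise.}
  \end{cases}
\end{equation*}

Under this definition, the contrast the hunk draws is that in a cyclic MDP (the maze)
an occasional uniform action is recoverable, whereas in an acyclic MDP such as 2048
each such action moves the trajectory to a part of the state graph that can never be
revisited, so the error persists until the game ends.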