diff --git a/main/2048isAcyclic.tex b/main/2048isAcyclic.tex
index 6bc127f..94458ce 100644
--- a/main/2048isAcyclic.tex
+++ b/main/2048isAcyclic.tex
@@ -1,10 +1,10 @@
-\section{Non-ergodicity of the 2048 game}
+\section{Acyclicity of the 2048 game}
 
-The purpose of this section is to prove the non-ergodicity of the 2048 game
+The purpose of this section is to prove the acyclicity of the 2048 game
 and give some discussions.
 
 
-\subsection{Non-ergodicity of the 2048 game}
+\subsection{Acyclicity of the 2048 game}
 
 The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
 At the beginning of the game, two squares are randomly filled
@@ -24,7 +24,7 @@ The game ends when all squares are filled, and no valid merge operations can be
 
 
 \begin{theorem}
-2048 game is non-ergodic between non-absorbing states.
+2048 game is acyclic between non-absorbing states.
 \end{theorem}
 \begin{IEEEproof}
 To apply Theorem \ref{judgmentTheorem}, what we need
@@ -99,26 +99,6 @@ the claim follows by applying Theorem \ref{judgmentTheorem}.
 
 \subsection{Discussions}
 
-Behavior policy sampling
-$\langle s_t,a_t,r_{t+1},a_{t+1},s_{t+1} \rangle$, with corresponding features
-$\langle \phi_t,r_{t+1},\phi_{t+1} \rangle$
-
-Target policy sampling
-$\langle s_t,a_t,r_{t+1},a',s_{t+1} \rangle$, with corresponding features
-$\langle \phi_t,r_{t+1},\phi' \rangle$
-
-\begin{equation}
-\theta_{t+1}=\theta_t+\alpha F_t (\rho_tR_t+\gamma \theta_t^{\top}\phi_t'-\theta_t^{\top}\phi_t-\mathbb{E}_{\pi}[\delta])\phi_t
-\end{equation}
-Written more simply, this is
-
-\begin{equation}
-\theta_{t+1}=\theta_t+\alpha F_t \rho_t(\delta_t-\mathbb{E}_{\mu}[\rho_t\delta_t])\phi_t,
-\end{equation}
-where
-$\delta_t=R_t+\gamma \theta_t^{\top}\phi_{t+1}-\theta_t^{\top}\phi_t$
-
-