Commit ec700129 by Lenovo

acyclic

parent cded4336
\section{Non-ergodicity of the 2048 game} \section{Acyclicity of the 2048 game}
The purpose of this section is to prove the non-ergodicity of the 2048 game The purpose of this section is to prove the acyclicity of the 2048 game
and to discuss the result. and to discuss the result.
\subsection{Non-ergodicity of the 2048 game} \subsection{Acyclicity of the 2048 game}
The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares. The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
At the beginning of the game, two squares are randomly filled At the beginning of the game, two squares are randomly filled
...@@ -24,7 +24,7 @@ The game ends when all squares are filled, and no valid merge operations can be ...@@ -24,7 +24,7 @@ The game ends when all squares are filled, and no valid merge operations can be
\begin{theorem} \begin{theorem}
2048 game is non-ergodic between non-absorbing states. The 2048 game is acyclic between non-absorbing states.
\end{theorem} \end{theorem}
\begin{IEEEproof} \begin{IEEEproof}
To apply Theorem \ref{judgmentTheorem}, what we need To apply Theorem \ref{judgmentTheorem}, what we need
...@@ -99,26 +99,6 @@ the claim follows by applying Theorem \ref{judgmentTheorem}. ...@@ -99,26 +99,6 @@ the claim follows by applying Theorem \ref{judgmentTheorem}.
\subsection{Discussions} \subsection{Discussions}
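Sampling under the behavior policy yields
$\langle s_t,a_t,r_{t+1},a_{t+1},s_{t+1} \rangle$, with corresponding features
$\langle \phi_t,r_{t+1},\phi_{t+1} \rangle$.
Sampling under the target policy yields
$\langle s_t,a_t,r_{t+1},a',s_{t+1} \rangle$, with corresponding features
$\langle \phi_t,r_{t+1},\phi' \rangle$.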
\begin{equation}
\theta_{t+1}=\theta_t+\alpha F_t (\rho_tR_t+\gamma \theta_t^{\top}\phi_t'-\theta_t^{\top}\phi_t-\mathbb{E}_{\pi}[\delta])\phi_t
\end{equation}
Written more simply, this is
\begin{equation}
\theta_{t+1}=\theta_t+\alpha F_t \rho_t(\delta_t-\mathbb{E}_{\mu}[\rho_t\delta_t])\phi_t,
\end{equation}
where
$\delta_t=R_t+\gamma \theta_t^{\top}\phi_{t+1}-\theta_t^{\top}\phi_t$
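
The simplified update can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name \texttt{td\_update}, the step sizes, and in particular the use of an exponential moving average (rate \texttt{beta}) to track $\mathbb{E}_{\mu}[\rho_t\delta_t]$ are hypothetical choices, and $F_t$ and $\rho_t$ are taken as given inputs because their definitions are not stated here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def td_update(theta, phi, phi_next, reward, rho, F, mean_rho_delta,
              alpha=0.1, gamma=0.9, beta=0.05):
    """One step of the simplified update (illustrative sketch).

    theta          : current weight vector theta_t
    phi, phi_next  : feature vectors phi_t and phi_{t+1}
    reward         : R_t
    rho, F         : importance ratio rho_t and factor F_t (supplied by caller)
    mean_rho_delta : running estimate of E_mu[rho_t * delta_t]
                     (ASSUMPTION: maintained by a moving average below)
    """
    # TD error: delta_t = R_t + gamma * theta^T phi_{t+1} - theta^T phi_t
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)

    # theta_{t+1} = theta_t + alpha * F_t * rho_t * (delta_t - E[rho delta]) * phi_t
    step = alpha * F * rho * (delta - mean_rho_delta)
    theta_new = [w + step * x for w, x in zip(theta, phi)]

    # ASSUMPTION: exponential moving average as the estimator of E_mu[rho_t delta_t]
    mean_new = mean_rho_delta + beta * (rho * delta - mean_rho_delta)
    return theta_new, mean_new, delta
```

For example, calling \texttt{td\_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], 1.0, 1.0, 1.0, 0.0)} performs a single step from zero weights with $\rho_t = F_t = 1$ and an initial centering term of zero.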
......