Commit ff9efea9 by Lenovo

The proof is roughly complete; what remains is the discussion of non-ergodicity, and deciding whether to add the improvement to the expectimax search algorithm.

parent 6111d3dd
\section{Non-ergodicity of 2048}
The purpose of this section is to prove the non-ergodicity of the 2048 game.
\begin{theorem}
The 2048 game is non-ergodic between non-absorbing states.
\end{theorem}
\begin{IEEEproof}
To apply Theorem \ref{judgmentTheorem}, we
assign a countable index to every 2048 game board
and then establish the required property of the
state transition probabilities of the 2048 game.
In the 2048 game, each tile takes one of 16 possible values:
empty or $2^k$, $k\in\{1,2,3,\ldots,15\}$.
Encoding each tile with 4 bits, the game board is a 4$\times$4 matrix
$B$ of exponents. The corresponding tile value is computed as follows:
\begin{equation}
tile_{mn} =
\begin{cases}
0, & \text{if } B_{mn}=0, \\
2^{B_{mn}}, & \text{otherwise,}
\end{cases}
\qquad 1\leq m, n \leq 4.
\label{equationTile}
\end{equation}
The sum of all tiles in the game board is
\begin{equation}
sum(B) = \sum_{m=1}^4\sum_{n=1}^4 tile_{mn}.
\end{equation}
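As a concrete illustration of (\ref{equationTile}), the following Python sketch decodes the 4-bit exponents and computes $sum(B)$; the helper names \texttt{tile} and \texttt{board\_sum} are ours, and indices run from 0 rather than 1:
\begin{verbatim}
# B is a 4x4 list of 4-bit exponents; 0 encodes an empty square.
def tile(B, m, n):
    # Case distinction from the tile equation: empty -> 0,
    # otherwise 2 raised to the stored exponent.
    return 0 if B[m][n] == 0 else 2 ** B[m][n]

def board_sum(B):
    # sum(B): the total value of all tiles on the board.
    return sum(tile(B, m, n) for m in range(4) for n in range(4))
\end{verbatim}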
A 64-bit long integer can uniquely represent any game board state.
\begin{equation}
long(B)= \sum_{m=1}^4\sum_{n=1}^4 16^{4(m-1)+(n-1)}\cdot B_{mn}.
\end{equation}
Since every entry satisfies $B_{mn}\leq 15$, we have
\begin{equation}
long(B)<2^{64}.
\label{size}
\end{equation}
The size of the board space $\mathcal{B}$ is
$|\mathcal{B}|=16^{16}=2^{64}$.
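A minimal sketch of this 64-bit encoding, assuming the same matrix representation as above (the name \texttt{board\_long} is illustrative):
\begin{verbatim}
def board_long(B):
    # long(B): each 4-bit entry B_mn occupies one hexadecimal
    # digit (nibble) of the 64-bit code, so the code is unique.
    code = 0
    for m in range(4):
        for n in range(4):
            code += (16 ** (4 * m + n)) * B[m][n]
    return code  # always < 2^64, since every entry is at most 15
\end{verbatim}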
Define a utility function on boards,
\begin{equation}
u(B) = 2^{64}\cdot sum(B)+long(B).
\label{utility}
\end{equation}
It is easy to verify that
$\forall B_1, B_2\in \mathcal{B}$,
if $B_1\neq B_2$, then $u(B_1)\neq u(B_2)$:
since $long(B)<2^{64}$, the value $u(B)$ determines both
$sum(B)$ and $long(B)$, and $long$ itself is injective.
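This decomposition can be spot-checked numerically; a sketch building on the helpers above:
\begin{verbatim}
def utility(B):
    # u(B) = 2^64 * sum(B) + long(B).  Python integers are
    # arbitrary-precision, so values above 2^64 are exact.
    return (2 ** 64) * board_sum(B) + board_long(B)

# Since long(B) < 2^64, u(B) determines both components:
#   sum(B)  = u(B) // 2**64
#   long(B) = u(B) %  2**64
# and long is injective, so distinct boards get distinct u-values.
\end{verbatim}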
For every board
$B\in \mathcal{B}$, compute the utility value
$u(B)$ and sort the boards by $u(B)$ in ascending order.
Let $I(B)$ be the index of board $B$ after sorting;
then
\begin{equation}
\forall B_1, B_2\in \mathcal{B}, u(B_1)<u(B_2) \iff
I(B_1)<I(B_2).
\label{basis}
\end{equation}
For any transition $\langle B_1, a, B_1', B_2\rangle$ in the 2048 game,
where $B_1'$ denotes the board after move $a$ is applied to $B_1$
and $B_2$ the board after the new tile is spawned,
we have
$sum(B_1)=sum(B_1')$ regardless of whether any tiles merge,
since merging two $2^k$-tiles simply replaces them by a single $2^{k+1}$-tile.
Because a new 2-tile or 4-tile is spawned in board $B_2$,
$sum(B_2)>sum(B_1')$, that is, $sum(B_2)>sum(B_1)$.
Based on (\ref{size}) and (\ref{utility}),
we have $u(B_2)>u(B_1)$.
That means $I(B_2)>I(B_1)$.
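The inequalities used here can be checked on a small, arbitrary example (the boards below are invented for illustration and reuse the helpers above):
\begin{verbatim}
# B1 holds two 4-tiles; B1p is B1 after a left move merges them
# into one 8-tile; B2 additionally holds a spawned 2-tile.
empty3 = [[0] * 4 for _ in range(3)]
B1  = [[2, 2, 0, 0]] + empty3   # exponents: tiles 4 and 4
B1p = [[3, 0, 0, 0]] + empty3   # tile 8
B2  = [[3, 1, 0, 0]] + empty3   # tiles 8 and 2

assert board_sum(B1) == board_sum(B1p)  # merging preserves the sum
assert board_sum(B2) >  board_sum(B1)   # the spawn increases it
assert utility(B2)   >  utility(B1)     # hence I(B2) > I(B1)
\end{verbatim}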
The transition probability between non-absorbing states therefore satisfies (\ref{condition}),
and the claim follows by applying Theorem \ref{judgmentTheorem}.
\end{IEEEproof}
%\input{material/2048prove}
......
\section{Background}
\subsection{2048 game rules}
\subsection{MDP and 2048 game}
Consider a Markov decision process (MDP)
$\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}\rangle$, where
$\mathcal{S}=\{1,2,\ldots,n\}$ is a finite state space with $|\mathcal{S}|=n$, $\mathcal{A}$ is an action space,
$\mathcal{R}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow \mathbb{R}$ is a reward function, and
$\mathcal{T}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow [0,1]$
is a transition function.
A policy $\pi:\mathcal{S}\times \mathcal{A}\rightarrow [0,1]$
selects action $a$ in state $s$
with probability $\pi(a|s)$.
The state value function under policy $\pi$, denoted $V^{\pi}:\mathcal{S}\rightarrow
\mathbb{R}$, is the expected sum of rewards in
the MDP under policy $\pi$:
$V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}r_t \,\middle|\, s_0=s\right]$.
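For concreteness, here is a minimal policy-evaluation sketch for $V^{\pi}$ on a toy three-state chain with an absorbing terminal state; all numbers are invented for illustration:
\begin{verbatim}
import numpy as np

# P[s, s'] is the chain induced by a fixed policy pi; state 2 is
# absorbing.  r[s] is the expected one-step reward from state s.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 1.0, 0.0])

# Iterate V <- r + P V (undiscounted); this converges because
# every episode eventually reaches the zero-reward absorbing state.
V = np.zeros(3)
for _ in range(200):
    V = r + P @ V
print(V)  # approximately [4, 2, 0]
\end{verbatim}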
The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
At the beginning of the game, two squares are randomly filled
with tiles of either 2 or 4.
@@ -14,21 +30,9 @@
Each tile can only participate in one merge operation per move.
After each move, a new tile appears on a random empty square.
The new tile is 2 with probability 0.9, and 4 with probability 0.1.
The game ends when all squares are filled, and no valid merge operations can be made.
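To make the merge rule concrete, here is a hedged Python sketch of a single left move on one row (a simplified illustration, not the reference game implementation; \texttt{merge\_row\_left} is our own name):
\begin{verbatim}
def merge_row_left(row):
    # Slide the non-empty tiles left, merging equal neighbours;
    # each tile participates in at most one merge per move.
    tiles = [t for t in row if t != 0]
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(2 * tiles[i])  # merged tile is locked
            i += 2                    # consume both partners
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (4 - len(out))

assert merge_row_left([2, 2, 2, 2]) == [4, 4, 0, 0]  # not [8,0,0,0]
assert merge_row_left([0, 4, 0, 4]) == [8, 0, 0, 0]
\end{verbatim}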
\subsection{Ergodicity and Non-ergodicity of Markov Chains}
Given a stationary policy $\pi$, the MDP becomes a Markov chain on the state space
$\mathcal{S}$ with transition matrix
@@ -47,9 +51,14 @@
That is, $\forall s\in \mathcal{S}$, we have
\begin{equation}
\sum_{s'\in \mathcal{S}}P_{\pi}(s',s)d_{\pi}(s')=d_{\pi}(s).
\end{equation}
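A minimal numerical sketch of finding $d_{\pi}$ as a fixed point of this equation by power iteration (toy two-state chain, numbers invented; \texttt{P\_pi} is an illustrative name):
\begin{verbatim}
import numpy as np

# Transition matrix of the chain induced by a fixed policy pi.
P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])

# Power iteration on the state distribution: d <- P_pi^T d,
# matching the fixed-point equation above.
d = np.array([1.0, 0.0])
for _ in range(1000):
    d = P_pi.T @ d
print(d)  # approximately [5/6, 1/6]; both entries are positive,
          # so this toy chain is ergodic in the sense defined below
\end{verbatim}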
The ergodicity assumption for the MDP states that
$d_{\pi}(s)$ exists for every policy $\pi$ and is independent of the
initial state \cite{Sutton2018book}.
\begin{definition}[Ergodicity]
Assume that $d_{\pi}(s)$ exists for every policy $\pi$ and
is independent of the initial state.
The MDP is ergodic if $\forall s\in \mathcal{S}$,
$d_{\pi}(s)>0$.
\end{definition}
This means all states are reachable under any policy from the
@@ -59,7 +68,7 @@ A sufficient condition for this assumption is that
all other eigenvalues of $P_{\pi}$ have modulus $<1$.
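This eigenvalue condition can be checked numerically; a sketch for the toy chain used above:
\begin{verbatim}
import numpy as np

P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])
eig = np.linalg.eigvals(P_pi)   # here: 1.0 and 0.4
# Apart from the eigenvalue 1 (the stationary direction), all
# eigenvalues must have modulus strictly below 1.
others = [v for v in eig if not np.isclose(v, 1.0)]
assert all(abs(v) < 1 for v in others)
\end{verbatim}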
\input{pic/randomWalk}
......
@@ -90,6 +90,7 @@ is non-ergodic between non-absorbing states.
By examining the truncated St.\ Petersburg paradox,
it is easy to obtain a sufficient condition for non-ergodicity between non-absorbing states.
\begin{theorem}[A sufficient condition for non-ergodicity between non-absorbing states]
\label{judgmentTheorem}
Given a Markov chain with absorbing states,
suppose the number of non-absorbing states satisfies $|\mathcal{S}\setminus\{\text{T}\}|\geq 2$.
If the transition matrix $Q$ between non-absorbing states satisfies,
......