From ff9efea94cd95de1210db9d41bae55488d9aa82c Mon Sep 17 00:00:00 2001
From: Lenovo
Date: Tue, 28 May 2024 00:00:56 +0800
Subject: [PATCH] The proof is roughly complete; what remains is the discussion
 of non-ergodicity, and whether improvements to the expectimax search
 algorithm need to be added

---
 main/2048isNonergodic.tex | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 main/background.tex       | 49 +++++++++++++++++++++++++++++--------------------
 main/nonergodic.tex       |  1 +
 3 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/main/2048isNonergodic.tex b/main/2048isNonergodic.tex
index 47e7179..a555247 100644
--- a/main/2048isNonergodic.tex
+++ b/main/2048isNonergodic.tex
@@ -1,11 +1,75 @@
 \section{Non-ergodicity of 2048}
+The purpose of this section is to prove the non-ergodicity of the 2048 game.
+
 \begin{theorem}
 2048 game is non-ergodic between non-absorbing states.
 \end{theorem}
 \begin{IEEEproof}
-
+To apply Theorem \ref{judgmentTheorem}, we need to assign a countable
+index to every 2048 game board and establish the required property of
+the state transition probabilities of the 2048 game.
+
+In the 2048 game, each tile takes one of 16 potential values:
+empty, or $2^k$ with $k\in\{1,2,3,\ldots,15\}$.
+Using 4 bits per tile, the game board is a $4\times 4$ matrix $B$
+with entries $B_{mn}\in\{0,1,\ldots,15\}$.
+The corresponding tile value is
+\begin{equation}
+tile_{mn} =
+\begin{cases}
+0, & \text{if } B_{mn}=0; \\
+2^{B_{mn}}, & \text{otherwise,}
+\end{cases}
+\qquad 1\leq m,n\leq 4.
+\label{equationTile}
+\end{equation}
+The sum of all tiles on the game board is
+\begin{equation}
+sum(B) = \sum_{m=1}^4\sum_{n=1}^4 tile_{mn}.
+\end{equation}
+A 64-bit unsigned integer uniquely represents any game board state:
+\begin{equation}
+long(B)= \sum_{m=1}^4\sum_{n=1}^4 16^{4(m-1)+(n-1)}\cdot B_{mn}.
+\end{equation}
+We have
+\begin{equation}
+long(B)<2^{64}.
+\label{size}
+\end{equation}
+The size of the board space $\mathcal{B}$ is $|\mathcal{B}|=2^{64}$.
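The 4-bit-per-tile encoding above maps directly onto integers. The following is a minimal illustrative sketch (the function names are ours, not the paper's) of $tile$, $sum(B)$, and $long(B)$:

```python
# Illustrative sketch of the board encoding above; names are ours, not the paper's.
def tile(b, m, n):
    """Tile value of entry B_{mn} (1-indexed): 0 if empty, else 2**B_{mn}."""
    v = b[m - 1][n - 1]
    return 0 if v == 0 else 2 ** v

def board_sum(b):
    """sum(B): total of all tile values on the board."""
    return sum(tile(b, m, n) for m in range(1, 5) for n in range(1, 5))

def board_long(b):
    """long(B): 64-bit integer encoding of the board, 4 bits per cell."""
    return sum(16 ** (4 * (m - 1) + (n - 1)) * b[m - 1][n - 1]
               for m in range(1, 5) for n in range(1, 5))
```

Since each of the 16 cells occupies its own 4-bit field, `board_long` is injective and its maximum value, attained when every cell holds 15, is $16^{16}-1 = 2^{64}-1$, consistent with (\ref{size}).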
+Define a utility function on the board,
+\begin{equation}
+u(B) = 2^{64}\cdot sum(B)+long(B).
+\label{utility}
+\end{equation}
+It is easy to verify that
+$\forall B_1, B_2\in \mathcal{B}$,
+if $B_1\neq B_2$, then $u(B_1)\neq u(B_2)$,
+since $long(B)$ already determines $B$ uniquely.
+For every possible board $B\in \mathcal{B}$, calculate the utility
+value $u(B)$, and sort the boards by $u(B)$ in ascending order.
+Let $I(B)$ be the index of the board $B$ after sorting; we have
+\begin{equation}
+\forall B_1, B_2\in \mathcal{B},\ u(B_1)<u(B_2) \Leftrightarrow I(B_1)<I(B_2).
+\end{equation}
+Consider one step of the game from a non-absorbing board $B_1$.
+A move first slides and merges tiles, producing an intermediate board
+$B_1'$; merging two tiles of value $2^k$ into one tile of value
+$2^{k+1}$ leaves the total unchanged, so $sum(B_1')=sum(B_1)$.
+A new tile of value 2 or 4 then appears on a random empty square,
+producing the next board $B_2$ with $sum(B_2)>sum(B_1')$, that is $sum(B_2)>sum(B_1)$.
+
+Based on (\ref{size}) and (\ref{utility}),
+we have $u(B_2)>u(B_1)$,
+which means $I(B_2)>I(B_1)$.
+The transition probability between non-absorbing states satisfies (\ref{condition}), and
+the claim follows by applying Theorem \ref{judgmentTheorem}.
 \end{IEEEproof}
 %\input{material/2048prove}
diff --git a/main/background.tex b/main/background.tex
index 7e7e6f1..37527c4 100644
--- a/main/background.tex
+++ b/main/background.tex
@@ -1,5 +1,21 @@
 \section{Background}
-\subsection{2048 game rules}
+
+\subsection{MDP and 2048 game}
+Consider a Markov decision process (MDP)
+$\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}\rangle$, where
+$\mathcal{S}=\{1,2,\ldots,n\}$ is a finite state space with
+$|\mathcal{S}|=n$, $\mathcal{A}$ is an action space,
+$\mathcal{T}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow [0,1]$
+is a transition function, and
+$\mathcal{R}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow \mathbb{R}$
+is a reward function.
+A policy $\pi:\mathcal{S}\times \mathcal{A}\rightarrow [0,1]$
+selects an action $a$ in state $s$
+with probability $\pi(a|s)$.
+The state value function under policy $\pi$, denoted $V^{\pi}:\mathcal{S}\rightarrow
+\mathbb{R}$, represents the expected sum of rewards in
+the MDP under policy $\pi$:
+$V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}r_t|s_0=s\right]$.
+
+
 The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
 At the beginning of the game, two squares are randomly
 filled with tiles of either 2 or 4.
@@ -14,21 +30,9 @@ The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
 Each tile can only participate in one merge operation per move.
 After each move, a new tile appears on a random empty square.
 The new tile is 2 with probability 0.9, and 4 with probability 0.1.
-The game ends when all squares are filled, and no valid merge operations can be made.
-\subsection{MDP}
-Consider Markov decision process (MDP)
-$\langle \mathcal{S}$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{T}$$\rangle$, where
-$\mathcal{S}=\{1,2,3,\ldots\}$ is a finite state space, $|\mathcal{S}|=n$, $\mathcal{A}$ is an action space,
-$\mathcal{T}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow [0,1]$
-is a transition function,
-$\mathcal{R}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow \mathbb{R}$ is a reward function.
-Policy $\pi:S\times A\rightarrow [0,1]$
-selects an action $a$ in state $s$
-with probability $\pi(a|s)$.
-State value function under policy $\pi$, denoted $V^{\pi}:S\rightarrow
-\mathbb{R}$, represents the expected sum of rewards in
-the MDP under policy $\pi$:
-$V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}r_t|s_0=s\right]$.
+The game ends when all squares are filled, and no valid merge operations can be made.
+
+\subsection{Ergodicity and Non-ergodicity of Markov Chains}
 Given a stationary policy $\pi$, the MDP becomes
 a Markov chain on state space $\mathcal{S}$ with a matrix
@@ -47,9 +51,14 @@ That is $\forall s\in \mathcal{S}$, we have
 \sum_{s'\in \mathcal{S}}P_{\pi}(s',s)d_{\pi}(s')=d_{\pi}(s).
 \end{equation}
-Ergodicity assumption about the MDP assume that
-$d_{\pi}(s)$ exist for any policy $\pi$ and are independent of
-initial states \cite{Sutton2018book}.
+\begin{definition}[Ergodicity]
+Assume that $d_{\pi}(s)$ exists for every policy $\pi$ and is
+independent of the initial state \cite{Sutton2018book}.
+The MDP is ergodic if $\forall s\in \mathcal{S}$,
+$d_{\pi}(s)>0$.
+\end{definition}
+
+
 This means all states are reachable under any policy from the
@@ -59,7 +68,7 @@ A sufficient condition for this
 assumption is that
 all other eigenvalues of $P_{\pi}$ are of modulus $<1$.
-\subsection{Ergodicity and Non-ergodicity of Markov Chains}
+
 \input{pic/randomWalk}
diff --git a/main/nonergodic.tex b/main/nonergodic.tex
index 2a5f786..f54779f 100644
--- a/main/nonergodic.tex
+++ b/main/nonergodic.tex
@@ -90,6 +90,7 @@ is non-ergodic between non-absorbing states.
 By observing the truncated St. Petersburg paradox, it is easy to
 provide a sufficient condition for non-ergodicity between non-absorbing states.
 \begin{theorem}[A sufficient condition for non-ergodicity between non-absorbing states]
+\label{judgmentTheorem}
 Given a Markov chain with absorbing states,
 suppose the size of the non-absorbing states $|S\setminus\{\text{T}\}|\geq 2$.
 If the transition matrix $Q$ between non-absorbing states satisfies,
-- 
libgit2 0.26.0
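The ergodicity definition added in background.tex can be illustrated numerically. The sketch below uses a toy two-state chain of our own choosing (not from the paper): it computes the stationary distribution $d_{\pi}$ by power iteration and checks that every entry is positive, which is exactly the ergodicity condition above.

```python
# Toy illustration (our own example, not from the paper): a two-state
# ergodic chain. P[i][j] is the probability of moving from state i to j.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def stationary(P, iters=1000):
    """Power iteration: repeatedly apply d <- d P until it stabilizes."""
    d = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(len(P)))
             for j in range(len(P))]
    return d

d = stationary(P)
# d satisfies sum_i d[i] * P[i][j] = d[j] for every j, and every entry
# is strictly positive, matching the condition d_pi(s) > 0 for all s.
```

For this chain the fixed point is $d=(5/6,\,1/6)$, and the same limit is reached from any initial distribution, reflecting independence of the initial state.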
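The sufficient condition labeled `judgmentTheorem`, as used in the proof above, rests on transitions among non-absorbing states moving only to strictly larger indices. A toy sketch (our own example matrix, and our reading of the condition) of why such a chain can never revisit a non-absorbing state:

```python
# Toy illustration of the non-ergodicity mechanism: if transitions among
# non-absorbing states only move to strictly larger indices, Q is strictly
# upper triangular, so Q^n = 0 and no non-absorbing state is ever revisited.
Q = [[0.0, 0.5, 0.3],
     [0.0, 0.0, 0.6],
     [0.0, 0.0, 0.0]]  # remaining probability mass goes to absorbing states

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Q^n for an n x n strictly upper triangular Q is the zero matrix:
# the chain leaves the non-absorbing part within n steps, so the
# probability of returning to any non-absorbing state is zero.
Qn = Q
for _ in range(len(Q) - 1):
    Qn = matmul(Qn, Q)
```

In the 2048 proof, the role of the index is played by $I(B)$: every move strictly increases $sum(B)$, hence $u(B)$ and $I(B)$, so the non-absorbing part of the transition matrix has exactly this triangular structure.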