Commit ff9efea9 by Lenovo

The proof is roughly complete; what remains is the discussion of non-ergodicity, and deciding whether to add the improvement to the expectimax search algorithm.

parent 6111d3dd
\section{Non-ergodicity of 2048}
The purpose of this section is to prove the non-ergodicity of the 2048 game.
\begin{theorem}
The 2048 game is non-ergodic between non-absorbing states.
\end{theorem}
\begin{IEEEproof}
To apply Theorem \ref{judgmentTheorem}, we
assign a countable index to every 2048 game board
and then establish the required property of the
state transition probabilities of the 2048 game.
In the 2048 game, each tile takes one of 16 possible values:
empty or $2^k$, $k\in\{1,2,3,\ldots,15\}$.
Encoding each tile with 4 bits, the game board is a 4$\times$4 matrix
$B$ of exponents. The corresponding tile value is computed as follows:
\begin{equation}
tile_{mn} =
\begin{cases}
0, & \text{if } B_{mn}=0, \\
2^{B_{mn}}, & \text{otherwise,}
\end{cases}
\qquad 1\leq m, n \leq 4.
\label{equationTile}
\end{equation}
The sum of all tiles in the game board is
\begin{equation}
sum(B) = \sum_{m=1}^4\sum_{n=1}^4 tile_{mn}.
\end{equation}
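As a concrete illustration of (\ref{equationTile}), the following Python sketch decodes the 4-bit exponents and computes $sum(B)$; the helper names \texttt{tile} and \texttt{board\_sum} are ours, and indices run from 0 rather than 1:
\begin{verbatim}
# B is a 4x4 list of 4-bit exponents; 0 encodes an empty square.
def tile(B, m, n):
    # Case distinction from the tile equation: empty -> 0,
    # otherwise 2 raised to the stored exponent.
    return 0 if B[m][n] == 0 else 2 ** B[m][n]

def board_sum(B):
    # sum(B): the total value of all tiles on the board.
    return sum(tile(B, m, n) for m in range(4) for n in range(4))
\end{verbatim}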
A 64-bit long integer can uniquely represent any game board state.
\begin{equation}
long(B)= \sum_{m=1}^4\sum_{n=1}^4 16^{4(m-1)+(n-1)}\cdot B_{mn}.
\end{equation}
Since every entry satisfies $B_{mn}\leq 15$, we have
\begin{equation}
long(B)<2^{64}.
\label{size}
\end{equation}
The size of the board space $\mathcal{B}$ is
$|\mathcal{B}|=16^{16}=2^{64}$.
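A minimal sketch of this 64-bit encoding, assuming the same matrix representation as above (the name \texttt{board\_long} is illustrative):
\begin{verbatim}
def board_long(B):
    # long(B): each 4-bit entry B_mn occupies one hexadecimal
    # digit (nibble) of the 64-bit code, so the code is unique.
    code = 0
    for m in range(4):
        for n in range(4):
            code += (16 ** (4 * m + n)) * B[m][n]
    return code  # always < 2^64, since every entry is at most 15
\end{verbatim}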
Define a utility function on boards,
\begin{equation}
u(B) = 2^{64}\cdot sum(B)+long(B).
\label{utility}
\end{equation}
It is easy to verify that
$\forall B_1, B_2\in \mathcal{B}$,
if $B_1\neq B_2$, then $u(B_1)\neq u(B_2)$:
since $long(B)<2^{64}$, the value $u(B)$ determines both
$sum(B)$ and $long(B)$, and $long$ itself is injective.
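This decomposition can be spot-checked numerically; a sketch building on the helpers above:
\begin{verbatim}
def utility(B):
    # u(B) = 2^64 * sum(B) + long(B).  Python integers are
    # arbitrary-precision, so values above 2^64 are exact.
    return (2 ** 64) * board_sum(B) + board_long(B)

# Since long(B) < 2^64, u(B) determines both components:
#   sum(B)  = u(B) // 2**64
#   long(B) = u(B) %  2**64
# and long is injective, so distinct boards get distinct u-values.
\end{verbatim}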
For every board
$B\in \mathcal{B}$, compute the utility value
$u(B)$ and sort the boards by $u(B)$ in ascending order.
Let $I(B)$ be the index of board $B$ after sorting;
then
\begin{equation}
\forall B_1, B_2\in \mathcal{B}, u(B_1)<u(B_2) \iff
I(B_1)<I(B_2).
\label{basis}
\end{equation}
For any transition $\langle B_1, a, B_1', B_2\rangle$ in the 2048 game,
where $B_1'$ denotes the board after move $a$ is applied to $B_1$
and $B_2$ the board after the new tile is spawned,
we have
$sum(B_1)=sum(B_1')$ regardless of whether any tiles merge,
since merging two $2^k$-tiles simply replaces them by a single $2^{k+1}$-tile.
Because a new 2-tile or 4-tile is spawned in board $B_2$,
$sum(B_2)>sum(B_1')$, that is, $sum(B_2)>sum(B_1)$.
Based on (\ref{size}) and (\ref{utility}),
we have $u(B_2)>u(B_1)$.
That means $I(B_2)>I(B_1)$.
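The inequalities used here can be checked on a small, arbitrary example (the boards below are invented for illustration and reuse the helpers above):
\begin{verbatim}
# B1 holds two 4-tiles; B1p is B1 after a left move merges them
# into one 8-tile; B2 additionally holds a spawned 2-tile.
empty3 = [[0] * 4 for _ in range(3)]
B1  = [[2, 2, 0, 0]] + empty3   # exponents: tiles 4 and 4
B1p = [[3, 0, 0, 0]] + empty3   # tile 8
B2  = [[3, 1, 0, 0]] + empty3   # tiles 8 and 2

assert board_sum(B1) == board_sum(B1p)  # merging preserves the sum
assert board_sum(B2) >  board_sum(B1)   # the spawn increases it
assert utility(B2)   >  utility(B1)     # hence I(B2) > I(B1)
\end{verbatim}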
The transition probability between non-absorbing states therefore satisfies (\ref{condition}),
and the claim follows by applying Theorem \ref{judgmentTheorem}.
\end{IEEEproof}
%\input{material/2048prove}
......
\section{Background}
\subsection{2048 game rules}
\subsection{MDP and 2048 game}
Consider a Markov decision process (MDP)
$\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}\rangle$, where
$\mathcal{S}=\{1,2,\ldots,n\}$ is a finite state space with $|\mathcal{S}|=n$, $\mathcal{A}$ is an action space,
$\mathcal{R}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow \mathbb{R}$ is a reward function, and
$\mathcal{T}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow [0,1]$
is a transition function.
A policy $\pi:\mathcal{S}\times \mathcal{A}\rightarrow [0,1]$
selects action $a$ in state $s$
with probability $\pi(a|s)$.
The state value function under policy $\pi$, denoted $V^{\pi}:\mathcal{S}\rightarrow
\mathbb{R}$, is the expected sum of rewards in
the MDP under policy $\pi$:
$V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}r_t \,\middle|\, s_0=s\right]$.
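For concreteness, here is a minimal policy-evaluation sketch for $V^{\pi}$ on a toy three-state chain with an absorbing terminal state; all numbers are invented for illustration:
\begin{verbatim}
import numpy as np

# P[s, s'] is the chain induced by a fixed policy pi; state 2 is
# absorbing.  r[s] is the expected one-step reward from state s.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 1.0, 0.0])

# Iterate V <- r + P V (undiscounted); this converges because
# every episode eventually reaches the zero-reward absorbing state.
V = np.zeros(3)
for _ in range(200):
    V = r + P @ V
print(V)  # approximately [4, 2, 0]
\end{verbatim}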
The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
At the beginning of the game, two squares are randomly filled
with tiles of either 2 or 4.
@@ -14,21 +30,9 @@
Each tile can only participate in one merge operation per move.
After each move, a new tile appears on a random empty square.
The new tile is 2 with probability 0.9, and 4 with probability 0.1.
The game ends when all squares are filled, and no valid merge operations can be made.
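To make the merge rule concrete, here is a hedged Python sketch of a single left move on one row (a simplified illustration, not the reference game implementation; \texttt{merge\_row\_left} is our own name):
\begin{verbatim}
def merge_row_left(row):
    # Slide the non-empty tiles left, merging equal neighbours;
    # each tile participates in at most one merge per move.
    tiles = [t for t in row if t != 0]
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(2 * tiles[i])  # merged tile is locked
            i += 2                    # consume both partners
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (4 - len(out))

assert merge_row_left([2, 2, 2, 2]) == [4, 4, 0, 0]  # not [8,0,0,0]
assert merge_row_left([0, 4, 0, 4]) == [8, 0, 0, 0]
\end{verbatim}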
\subsection{Ergodicity and Non-ergodicity of Markov Chains}
Given a stationary policy $\pi$, the MDP becomes a Markov chain on the state space
$\mathcal{S}$ with transition matrix
@@ -47,9 +51,14 @@
That is, $\forall s\in \mathcal{S}$, we have
\begin{equation}
\sum_{s'\in \mathcal{S}}P_{\pi}(s',s)d_{\pi}(s')=d_{\pi}(s).
\end{equation}
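A minimal numerical sketch of finding $d_{\pi}$ as a fixed point of this equation by power iteration (toy two-state chain, numbers invented; \texttt{P\_pi} is an illustrative name):
\begin{verbatim}
import numpy as np

# Transition matrix of the chain induced by a fixed policy pi.
P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])

# Power iteration on the state distribution: d <- P_pi^T d,
# matching the fixed-point equation above.
d = np.array([1.0, 0.0])
for _ in range(1000):
    d = P_pi.T @ d
print(d)  # approximately [5/6, 1/6]; both entries are positive,
          # so this toy chain is ergodic in the sense defined below
\end{verbatim}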
The ergodicity assumption for the MDP states that
$d_{\pi}(s)$ exists for every policy $\pi$ and is independent of the
initial state \cite{Sutton2018book}.
\begin{definition}[Ergodicity]
Assume that $d_{\pi}(s)$ exists for every policy $\pi$ and
is independent of the initial state.
The MDP is ergodic if $\forall s\in \mathcal{S}$,
$d_{\pi}(s)>0$.
\end{definition}
This means all states are reachable under any policy from the
@@ -59,7 +68,7 @@ A sufficient condition for this assumption is that
all other eigenvalues of $P_{\pi}$ have modulus $<1$.
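This eigenvalue condition can be checked numerically; a sketch for the toy chain used above:
\begin{verbatim}
import numpy as np

P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])
eig = np.linalg.eigvals(P_pi)   # here: 1.0 and 0.4
# Apart from the eigenvalue 1 (the stationary direction), all
# eigenvalues must have modulus strictly below 1.
others = [v for v in eig if not np.isclose(v, 1.0)]
assert all(abs(v) < 1 for v in others)
\end{verbatim}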
\input{pic/randomWalk}
......
@@ -90,6 +90,7 @@ is non-ergodic between non-absorbing states.
By examining the truncated St.\ Petersburg paradox,
it is easy to obtain a sufficient condition for non-ergodicity between non-absorbing states.
\begin{theorem}[A sufficient condition for non-ergodicity between non-absorbing states]
\label{judgmentTheorem}
Given a Markov chain with absorbing states,
suppose the number of non-absorbing states satisfies $|\mathcal{S}\setminus\{\text{T}\}|\geq 2$.
If the transition matrix $Q$ between non-absorbing states satisfies,
......