\section{Background}
Consider a Markov decision process (MDP) $\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}\rangle$, where $\mathcal{S}=\{1,2,\ldots,n\}$ is a finite state space with $|\mathcal{S}|=n$, $\mathcal{A}$ is an action space, $\mathcal{T}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow [0,1]$ is a transition function, and $\mathcal{R}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow \mathbb{R}$ is a reward function. A policy $\pi:\mathcal{S}\times \mathcal{A}\rightarrow [0,1]$ selects action $a$ in state $s$ with probability $\pi(a|s)$. The state value function under policy $\pi$, denoted $V^{\pi}:\mathcal{S}\rightarrow \mathbb{R}$, is the expected sum of rewards collected in the MDP under policy $\pi$: $V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}r_t \,\middle|\, s_0=s\right]$.

Given a stationary policy $\pi$, the MDP induces a Markov chain on the state space $\mathcal{S}$ with transition matrix $P^{\pi}\in[0,1]^{n\times n}$, where $P^{\pi}(s_1,s_2)=\sum_{a\in \mathcal{A}}\pi(a|s_1)\mathcal{T}(s_1,a,s_2)$ is the transition probability from $s_1$ to $s_2$, and $\sum_{s'\in \mathcal{S}}P^{\pi}(s,s')=1$ for all $s\in \mathcal{S}$. A stationary distribution of $P^{\pi}$ is a probability distribution $d$ on $\mathcal{S}$ such that
\begin{equation}
d^{\top}=d^{\top}P^{\pi}.
\end{equation}
That is, for every $s\in \mathcal{S}$,
\begin{equation}
\sum_{s'\in \mathcal{S}}P^{\pi}(s',s)\,d(s')=d(s).
\end{equation}

In the remainder of this section, we define ergodicity of a Markov chain and give sufficient conditions for it. Using a random-walk example, we illustrate that a chain with an absorbing state is not ergodic, whereas the reinforcement-learning training setting with restarts is ergodic. This paper is concerned with ergodicity among the non-absorbing states once the absorbing states are removed. Through the St.~Petersburg example, we show that the St.~Petersburg game does not satisfy ergodicity among its non-absorbing states. We then state a theorem and use it to prove that the game 2048 likewise does not satisfy ergodicity among its non-absorbing states.
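As a minimal illustration of the stationary-distribution condition above and of the role played by absorbing states (a toy example added here only for concreteness; the random-walk, St.~Petersburg, and 2048 examples are treated formally below), consider a two-state chain in which state $2$ is absorbing,
\begin{equation}
P^{\pi}=\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2}\\[2pt] 0 & 1\end{pmatrix}.
\end{equation}
Solving $d^{\top}=d^{\top}P^{\pi}$ gives $d(1)=\tfrac{1}{2}\,d(1)$, hence $d(1)=0$ and $d=(0,1)^{\top}$ is the unique stationary distribution. All stationary mass concentrates on the absorbing state, so no stationary distribution assigns positive probability to the non-absorbing state; this degeneracy is precisely what motivates studying ergodicity restricted to the non-absorbing states.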