\section{Background}
Consider a Markov decision process (MDP) $\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{T}\rangle$, where $\mathcal{S}=\{1,2,\ldots,n\}$ is a finite state space with $|\mathcal{S}|=n$, $\mathcal{A}$ is an action space, $\mathcal{T}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow [0,1]$ is a transition function, and $\mathcal{R}:\mathcal{S}\times \mathcal{A}\times \mathcal{S}\rightarrow \mathbb{R}$ is a reward function. A policy $\pi:\mathcal{S}\times \mathcal{A}\rightarrow [0,1]$ selects action $a$ in state $s$ with probability $\pi(a|s)$. The state value function under policy $\pi$, denoted $V^{\pi}:\mathcal{S}\rightarrow \mathbb{R}$, is the expected sum of rewards collected in the MDP under policy $\pi$: $V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty}r_t \,\middle|\, s_0=s\right]$.

Given a stationary policy $\pi$, the MDP induces a Markov chain on the state space $\mathcal{S}$ with transition matrix $P^{\pi}\in[0,1]^{n\times n}$, where $P^{\pi}(s_1,s_2)=\sum_{a\in \mathcal{A}}\pi(a|s_1)\mathcal{T}(s_1,a,s_2)$ is the transition probability from $s_1$ to $s_2$, and $\sum_{s'\in \mathcal{S}}P^{\pi}(s,s')=1$ for all $s\in \mathcal{S}$. A stationary distribution of $P^{\pi}$ is a probability distribution $d$ on $\mathcal{S}$ such that
\begin{equation}
d^{\top}=d^{\top}P^{\pi}.
\end{equation}
That is, for every $s\in \mathcal{S}$,
\begin{equation}
\sum_{s'\in \mathcal{S}}P^{\pi}(s',s)\,d(s')=d(s).
\end{equation}

In the remainder of this section, we define ergodicity of a Markov chain and give sufficient conditions for it. Using a random-walk example, we illustrate that a chain with an absorbing state is not ergodic, whereas the reinforcement-learning training setting with restarts is ergodic. This paper is concerned with ergodicity among the non-absorbing states once the absorbing states are removed. Through the St.~Petersburg example, we show that the St.~Petersburg game does not satisfy ergodicity among its non-absorbing states. We then state a theorem and use it to prove that the game 2048 likewise does not satisfy ergodicity among its non-absorbing states.
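As a minimal illustration of the stationary-distribution condition above and of the role played by absorbing states (a toy example added here only for concreteness; the random-walk, St.~Petersburg, and 2048 examples are treated formally below), consider a two-state chain in which state $2$ is absorbing,
\begin{equation}
P^{\pi}=\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2}\\[2pt] 0 & 1\end{pmatrix}.
\end{equation}
Solving $d^{\top}=d^{\top}P^{\pi}$ gives $d(1)=\tfrac{1}{2}\,d(1)$, hence $d(1)=0$ and $d=(0,1)^{\top}$ is the unique stationary distribution. All stationary mass concentrates on the absorbing state, so no stationary distribution assigns positive probability to the non-absorbing state; this degeneracy is precisely what motivates studying ergodicity restricted to the non-absorbing states.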