From 76a6eacc5879ddc1c26e2063e678212961c5b665 Mon Sep 17 00:00:00 2001 From: Lenovo Date: Sun, 26 May 2024 07:16:49 +0800 Subject: [PATCH] Complete the non-ergodicity proof for the St. Petersburg paradox --- document.tex | 1 + main/2048prove.tex | 54 ------------------------------------------------------ main/background.tex | 71 ++++++++++++++++++++++++++++++----------------------------------------- main/introduction.tex | 2 +- main/nonergodic.tex | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ main/nonergodicity.tex | 110 -------------------------------------------------------------------------------------------------------------- main/paradox.tex | 120 ------------------------------------------------------------------------------------------------------------------------ main/theorem.tex | 109 ------------------------------------------------------------------------------------------------------------- material/2048prove.tex | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ material/nonergodicity.tex | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ material/paradox.tex | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ material/theorem.tex | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ pic/paradox.tex | 31 +++++++++++++++++++++++++++++++ 13 files changed, 552 insertions(+), 435 deletions(-) delete mode 100644 main/2048prove.tex create mode 100644 main/nonergodic.tex delete mode 100644 main/nonergodicity.tex delete mode 100644 main/paradox.tex delete mode 100644 main/theorem.tex create mode 100644 material/2048prove.tex create mode 100644 material/nonergodicity.tex create mode 100644 material/paradox.tex create mode 100644 material/theorem.tex create mode 100644 pic/paradox.tex diff --git a/document.tex b/document.tex index ac8548c..739e89b 
100644 --- a/document.tex +++ b/document.tex @@ -74,6 +74,7 @@ wangwenhao11@nudt.edu.cn). \input{main/introduction} \input{main/background} +\input{main/nonergodic} %\input{main/nonergodicity} %\input{main/paradox} diff --git a/main/2048prove.tex b/main/2048prove.tex deleted file mode 100644 index 9039eb3..0000000 --- a/main/2048prove.tex +++ /dev/null @@ -1,54 +0,0 @@ -\section{2048游戏的非遍历性证明} -\subsection{2048游戏编码规则} -为了完成从游戏到马尔可夫决策过程的转化,首先需要对局面进行排序, -需要给局面一一对应一个可以进行比较的值,通过这个值对局面进行排序。 -需要保证的是,局面和大的排序靠后,如果局面和一样,则按照局面编码大小排序 -2048的游戏棋盘是$4×4$的,每一个格子上都可以是${空格,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768}$ -这些数字,为了便于计算机内的保存本文将其一一对应为{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}, -因为在游戏中不存在面值为2^0=1的方块,于是在这里使用0来特别地对应上原先格子中空格的情况。这个游戏的状态是有限的, -有不超过16^16=2^64个状态,对于每一个棋盘局面本文可以执行 “上”,“下”,“左”,“右”这四个动作。 -执行动作之后将会把方块往动作方向移动,如果有两个相同幂次的方块碰撞会合并成为一个幂次加一的方块, -并且在一个空格位置随机生成一个2或者4的方块。本文将这个棋盘用一个1×16的数组B进行表示, -其中B中存放的是方块的幂次,空格用0表示,m表示数组下标。 -$B_m∈{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},0≤m≤15$。编码规则如下: - -\begin{equation} -% 2048游戏局面编码 -p=2^{64} \cdot \sum_{m=0}^{15} I(B_m \neq 0) \cdot 2^{B_m} + \sum_{m=0}^{15} (1 \ll 4m) \cdot B_m -\end{equation} - -其中$\mathbb{I}(B_m≠0)$是指示函数,当B_m的值不为0的时候这个函数返回1,也就是说不统计棋盘中的空格格子,这个编码的含义是将棋盘映射成一个长整型的变量, -本文将这个结果放在比64bit更高的位置上,也就是 64-84bit的位置。这个编码的主要含义是,将局面所有数字之和放在高bit位置上,排序时局面之和大的排在后面, -状态转移时就是从小的下标转移到大的下标上。另外后面64bit就是局面的编码,来保证这个值的唯一性,一个局面会对应一个唯一的值。 - -\input{pic/2048encode} - -上面的图中的这个局面的编码$p=(1≪64)∙30784+0x FEDC 5432 0000 0020$。 -本文是按照从下往上,从右往左的顺序给格子进行排列,右下角的格子是最低位,左上角的格子是最高位。 -由此本文获得了一个有关状态的大小关系,还可以了解,对于两个不同的局面,通过这个排序可以获取每一个状态的排列后相应的下标。 -特别的,本文将所有的死亡状态从序列中抽出放在状态转移的最大下标位置。 - -\subsection{2048游戏非终结状态的非遍历性证明} -首先能够很快得到2048游戏是不满足遍历性的这一结论,因为2048游戏本身具有众多的吸收态,因此根据定理3.2一定是不满足遍历性的。但是我们此处考虑非终结状态之间的转移关系的遍历性。 -推论: 2048游戏非终结状态转移矩阵是非遍历性的。 -证明:首先记2048游戏的马尔可夫决策过程在策略为π的情况下的状态转移矩阵为P_π,状态先后关系通过上面的编码方式确定,因为有吸收态存在于是可以将这个状态转移矩阵写成标准矩阵的形式: -\begin{equation} -% 带策略的马尔可夫链标准形式 -[ P_\pi = \begin{pmatrix} Q_\pi & R_\pi \ 0 & I \end{pmatrix} ] -\end{equation} - 
-根据游戏规则,两个相同幂次的方块碰撞会合并成为一个幂次加一的方块, -然后会在一个空格位置随机生成一个2或者4的方块,这一过程本文记为$S_i\to S_(i^')\to S_j$。 -\input{pic/2048example-p} - -如图3.5所示根据我们的规则可以保证,状态在后的排序也靠后。也就是说在$S_i\to S_j$的过程中,能够保证$p_ip_i ;p_j>p_{i^{,}}$。 -通过这种转移关系我们可以认定,不存在向前转移的情况,因此,与圣彼得堡悖论类似,2048游戏的n步转移的$Q_π^{n}$矩阵是一个上三角矩阵。 - -因此根据本文的编码2048游戏的状态转移过程一直是满足从小的下标转移到大的下标上这一情况。 -实际上任何不会向之前状态转移的过程都满足这个条件。在本文设计的状态转移下,状态对应下标只增不减,$q_{ij}>0$在$j-i>0$的条件下, -$j≤i$位置$q_{ij}$都是0,其中,i,j都是正整数。根据定理3.3,可以得到2048游戏的非终结状态之间的转移过程是非遍历的。 \ No newline at end of file diff --git a/main/background.tex b/main/background.tex index e5295c4..68dc1c5 100644 --- a/main/background.tex +++ b/main/background.tex @@ -31,11 +31,10 @@ That is $\forall s\in \mathcal{S}$, we have \sum_{s'\in \mathcal{S}}P_{\pi}(s',s)d_{\pi}(s')=d_{\pi}(s). \end{equation} -\begin{definition}[Ergodicity] Ergodicity assumption about the MDP assume that $d_{\pi}(s)$ exist for any policy $\pi$ and are independent of initial states \cite{Sutton2018book}. -\end{definition} + This mean all states are reachable under any policy from the current state after sufficiently many steps \cite{majeed2018q}. @@ -67,6 +66,7 @@ P_{\text{absorbing}}\dot{=}\begin{array}{c|ccccccc} \text{E} & \frac{1}{2} & 0 & 0 & 0 & \frac{1}{2} & 0 \end{array} \] +Note that absorbing states can be combined into one. According to (\ref{invariance}), the distribution $d_{\text{absorbing}}=\{1$, $0$, $0$, $0$, $0$, $0$\}. @@ -99,7 +99,7 @@ the distribution $d_{\text{restart}}=\{0.1$, Since the probability of T, A, B, C, D, E are non-zeros, random walk with restarts is ergodic. 
-\subsection{Ergodicity and Non-ergodicity between non-absorbing states} +\subsection{Ergodicity between non-absorbing states} For Markov chains with absorbing states, we usually decompose the transition matrix $P$ into the following form: \[ @@ -124,23 +124,18 @@ where $Q$ is the matrix of transition probabilities between N\dot{=} \sum_{i=0}^{\infty}Q^i=(I_{n-1}-Q)^{-1}, \end{equation} where $I_{n-1}$ is the $(n-1)\times(n-1)$ identity matrix. -Note that absorbing states can be combined into one. It is now easy to define whether the non-absorbing states are ergodic. \begin{definition}[Ergodicity between non-absorbing states] -Assume that $N$ exist for any policy $\pi$ - and are independent of initial states. +Assume that $N$ exists for any policy $\pi$ + and is independent of initial states. $\forall i,j \in S\setminus\{\text{T}\}$, $N_{ij}>0$, MDP is ergodic between non-absorbing states. + \label{definition2} \end{definition} -\begin{definition}[Non-ergodicity between non-absorbing states] -Assume that $N$ exist for any policy $\pi$ - and are independent of initial states. - $\exists i,j \in S\setminus\{\text{T}\}$, - $N_{ij}=0$, MDP is non-ergodic between non-absorbing states. 
-\end{definition} + For random walk with absorbing states, \[ @@ -161,26 +156,26 @@ Q_{\text{absorbing}}\dot{=}\begin{array}{c|ccccc} \text{E} & 0 & 0 & 0 & \frac{1}{2} & 0 \end{array} \] +%\[ +% R_{\text{absorbing}}\dot{=}\begin{array}{c|c} +% &\text{T} \\\hline +% \text{A} & \frac{1}{2} \\ +% \text{B} & 0 \\ +% \text{C} & 0 \\ +% \text{D} & 0 \\ +% \text{E} & \frac{1}{2} +% \end{array} +% \] +% \[ +% I_{\text{absorbing}}\dot{=}\begin{array}{c|c} +% &\text{T} \\\hline +% \text{T} & 1 +% \end{array} +% \] + +Then, \[ -R_{\text{absorbing}}\dot{=}\begin{array}{c|c} -&\text{T} \\\hline -\text{A} & \frac{1}{2} \\ -\text{B} & 0 \\ -\text{C} & 0 \\ -\text{D} & 0 \\ -\text{E} & \frac{1}{2} -\end{array} -\] -\[ -I_{\text{absorbing}}\dot{=}\begin{array}{c|c} -&\text{T} \\\hline -\text{T} & 1 -\end{array} -\] - -Then,{ -\[ -N_{\text{absorbing}}\dot{=}\begin{array}{c|ccccc} +N_{\text{absorbing}}=(I_5-Q_{\text{absorbing}})^{-1}=\begin{array}{c|ccccc} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\\hline \text{A} & \frac{5}{3} & \frac{4}{3} & 1 & \frac{2}{3} & \frac{1}{3} \\ \text{B} & \frac{4}{3} & \frac{8}{3} & 2 & \frac{4}{3} & \frac{2}{3} \\ @@ -188,15 +183,9 @@ N_{\text{absorbing}}\dot{=}\begin{array}{c|ccccc} \text{D} & \frac{2}{3} & \frac{4}{3} & 2 & \frac{8}{3} & \frac{4}{3} \\ \text{E} & \frac{1}{3} & \frac{2}{3} & 1 & \frac{4}{3} & \frac{5}{3} \\ \end{array} -\], -} -\highlight{昕闻帮我算这个矩阵} - -通过圣彼得堡例子说明,圣彼得堡不满足非吸收态之间的遍历性。 -给出定理,同样证明2048游戏不满足非吸收态之间的遍历性。 - - - - +\] +Bases on Definition \ref{definition2}, +random walk with absorbing states +is ergodic between non-absorbing states. diff --git a/main/introduction.tex b/main/introduction.tex index 3fc13e7..c7329a0 100644 --- a/main/introduction.tex +++ b/main/introduction.tex @@ -108,7 +108,7 @@ The comparison in this set of experiments indicates that while in the 2048 game, when the agent deviates from the optimal state, it may never have the chance to return to the previous state. 
- This relates to the game's property of traversability. + This relates to the game's property of ergodicity. In this paper, we proved that the game 2048 is non-ergodic. diff --git a/main/nonergodic.tex b/main/nonergodic.tex new file mode 100644 index 0000000..2a04d68 --- /dev/null +++ b/main/nonergodic.tex @@ -0,0 +1,96 @@ +\section{Non-ergodicity between non-absorbing states} +\begin{definition}[Non-ergodicity between non-absorbing states] +Assume that $N$ exists for any policy $\pi$ + and is independent of initial states. + If $\exists i,j \in S\setminus\{\text{T}\}$ such that + $N_{ij}=0$, the MDP is non-ergodic between non-absorbing states. + \label{definition3} +\end{definition} + + + + +\subsection{St. Petersburg paradox} + + + +The St. Petersburg paradox is a paradox associated +with gambling and decision theory. It is named after the city +of St. Petersburg in Russia and was initially introduced + by the mathematician Daniel Bernoulli in 1738. + +The paradox involves a gambling game with the following rules: +\begin{itemize} + \item Participants must pay a fixed entry fee to join the game. + \item The game continues until a coin lands heads up. +Each toss determines the prize, with the first heads + appearing on the $t$-th toss resulting in a prize of $2^t$. +\end{itemize} + + +%\input{pic/FigureParadox} + +The expected return of all possibilities is +\begin{equation} +\begin{split} +\mathbb{E}(x)&=\lim_{n\rightarrow \infty}\sum_{t=1}^n p(t)\times V(t)\\ +&=\lim_{n\rightarrow \infty}\sum_{t=1}^n\frac{1}{2^t} 2^t\\ +&=\infty +\end{split} +\end{equation} + + +Although the prize can escalate +significantly, the expected value calculation +in probability theory implies that a rational +participant should be willing to pay an arbitrarily + large entry fee, because the prize's expected + value is infinite. Even though the probability of + the first heads appearing on the $t$-th toss shrinks as $2^{-t}$, the prize grows as $2^t$, + so every term of the sum equals $1$ and the expected value diverges. 
+ +This paradox challenges individuals' intuitions and +decision-making regarding gambling. Although the expected + value of participating in this gambling game is infinite, + few people would pay more than a small entry fee, + because large prizes occur only with + very small probability. + +\input{pic/paradox} + +Figure \ref{TruncatedPetersburg} shows a truncated version +of the St. Petersburg paradox. The transition probabilities between +non-absorbing states are as follows: +\[ +Q_{\text{truncated}}\dot{=}\begin{array}{c|ccccc} + & \text{S}_1 & \text{S}_2 & \text{S}_3 & \text{S}_4 & \text{S}_5 \\\hline +\text{S}_1 & 0 & \frac{1}{2} & 0 & 0 & 0 \\ +\text{S}_2 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ +\text{S}_3 & 0 & 0 & 0 & \frac{1}{2} & 0 \\ +\text{S}_4 & 0 & 0 & 0 & 0 & \frac{1}{2} \\ +\text{S}_5 & 0 & 0 & 0 & 0 & 0 +\end{array} +\] +Then, +\[ +N_{\text{truncated}}=(I_5-Q_{\text{truncated}})^{-1}=\begin{array}{c|ccccc} +& \text{S}_1 & \text{S}_2 & \text{S}_3 & \text{S}_4 & \text{S}_5 \\\hline +\text{S}_1 & 1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} \\ +\text{S}_2 & 0 & 1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} \\ +\text{S}_3 & 0 & 0 & 1 & \frac{1}{2} & \frac{1}{4} \\ +\text{S}_4 & 0 & 0 & 0 & 1 & \frac{1}{2} \\ +\text{S}_5 & 0 & 0 & 0 & 0 & 1 \\ +\end{array} +\] +Since $N_{ij}=0$ for all $i>j$, by Definition \ref{definition3}, +the truncated St. Petersburg paradox +is non-ergodic between non-absorbing states. + + + + + + + + + diff --git a/main/nonergodicity.tex b/main/nonergodicity.tex deleted file mode 100644 index f031bd1..0000000 --- a/main/nonergodicity.tex +++ /dev/null @@ -1,110 +0,0 @@ -\section{Non-ergodicity} - -\cite{kaplan1979sufficient} - - -We assume that the state-process is ergodic — i.e. all states -are reachable under any policy from the current state after -sufficiently many steps. 
\cite{majeed2018q} - -% ABCDE的随机游走的状态矩阵 -\[ -P = \begin{pmatrix} -1 & 0 & 0 & 0 & 0 & \\ -\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\ -0 & \frac{1}{2} & 0 & \frac{1}{2} & 0\\ -0 & 0 & \frac{1}{2} & 0 & \frac{1}{2}\\ -0 & 0 & 0 & 0 & 1 -\end{pmatrix} -\] - -%可重启的随机游走矩阵 -\[ -P = \begin{pmatrix} -0 & 0 & 1 & 0 & 0 & \\ -\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\ -0 & \frac{1}{2} & 0 & \frac{1}{2} & 0\\ -0 & 0 & \frac{1}{2} & 0 & \frac{1}{2}\\ -0 & 0 & 1 & 0 & 0 -\end{pmatrix} -\] - -% 计算平稳分布 -\[ -\begin{cases} -\pi_1 = \pi_3 \\ -\frac{1}{2}\pi_1 + \frac{1}{2}\pi_3 = \pi_2 \\ -\frac{1}{2}\pi_2 + \frac{1}{2}\pi_4 = \pi_3 \\ -\frac{1}{2}\pi_3 + \frac{1}{2}\pi_5 = \pi_4 \\ -\pi_3 = \pi_5 \\ -\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 = 1 -\end{cases} -\] - -%随机游走pic -\input{pic/randomWalk} - -设两个上三角矩阵为( A ) 和 ( B ),它们的形式分别为: - -% 两个上三角矩阵乘积求和为上三角矩阵 -A = \begin{pmatrix} -a_{11} & a_{12} & \cdots & a_{1n} \\ -0 & a_{22} & \cdots & a_{2n} \\ -\vdots & \vdots & \ddots & \vdots \\ -0 & 0 & \cdots & a_{nn} -\end{pmatrix}, \quad - - -B = \begin{pmatrix} -b_{11} & b_{12} & \cdots & b_{1n} \\ -0 & b_{22} & \cdots & b_{2n} \\ -\vdots & \vdots & \ddots & \vdots \\ -0 & 0 & \cdots & b_{nn} -\end{pmatrix} - -[ -c_{ij}=\sum_{k=1}^{n} a_{ik}b_{kj} -] - -当 $i>j$ 时,有 $c_{ij}=0$,因为在此情况下,$a_{ik}=0$ 或 $b_{kj}=0$,乘积中至少有一项为 0。 -所以 $C$ 也是一个上三角矩阵。 -因此,证明了两个上三角矩阵的乘积还是一个上三角矩阵。 - -% N矩阵 -$N=1+Q^1+Q^2……$ - -% “重启”随机游走 pic -\input{pic/randomWalkRestart} - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/main/paradox.tex b/main/paradox.tex deleted file mode 100644 index 9056398..0000000 --- a/main/paradox.tex +++ /dev/null @@ -1,120 +0,0 @@ -\subsection{St. Petersburg paradox} -The St. Petersburg paradox is a paradox associated -with gambling and decision theory. It is named after the city -of St. Petersburg in Russia and was initially introduced - by the mathematician Daniel Bernoulli in 1738. 
- -The paradox involves a gambling game with the following rules: -\begin{itemize} - \item Participants must pay a fixed entry fee to join the game. - \item The game continues until a coin lands heads up. -Each toss determines the prize, with the first heads - appearing on the $t$-th toss resulting in a prize of $2^t$. -\end{itemize} - - -%\input{pic/FigureParadox} - -The expected return of all possibilities is -\begin{equation} -\begin{split} -\mathbb{E}(x)&=\lim_{n\rightarrow \infty}\sum_{t=1}^n p(x)\times V(x)\\ -&=\lim_{n\rightarrow \infty}\sum_{t=1}^n\frac{1}{2^t} 2^t\\ -&=\infty -\end{split} -\end{equation} - - -Despite the potential for the prize to escalate -significantly, the expected value calculation -in probability theory reveals that the average -participant in this gambling game would end up paying - an infinite fee. This is due to the prize's expected - value being infinite. Even though the probability of - winning is small with each toss, when multiplied, - it leads to an infinitely increasing expected value. - -This paradox challenges individuals' intuitions and -decision-making regarding gambling. Despite the allure -of a potentially substantial prize, the actual expected - value of participating in this gambling game is infinite. - Consequently, in the long run, participants could face - an infinite monetary loss. - -%圣彼得堡悖论期望 -[ -E(X)=\sum_{n}x(n)p(n) = \frac{1}{2}\times 2 + \frac{1}{4}\times 4 + \frac{1}{8}\times 8 + \cdots = \infty -] - -% 圣彼得堡悖论状态转移矩阵 - \[ - P = \begin{pmatrix} - 0 & \frac{1}{2} & 0 & 0 & 0 & ... & ... & \frac{1}{2} \\ - 0 & 0 & \frac{1}{2} & 0 & 0 & ... & ... & \frac{1}{2} \\ - \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ - 1 & 0 & 0 & 0 & 0 & ... & ... & 0 - \end{pmatrix} - \] - -% 圣彼得堡悖论Q矩阵 - -\[ -Q = \begin{pmatrix} -0 & \frac{1}{2} & 0 & 0 & 0 & ... & ... \\ -0 & 0 & \frac{1}{2} & 0 & 0 & ... & ... 
\\ -\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \ddots \\ -\end{pmatrix} -\] - -% N矩阵 -$N=1+Q^1+Q^2……$ - - -% 圣彼得堡悖论的N矩阵 -[ -N = \begin{pmatrix} -1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} & \dots \ -0 & 1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \dots \ -\vdots & \vdots & \vdots & \vdots & \vdots & \ddots -\end{pmatrix} -] - -% 带截断的圣彼得堡悖论 -\begin{table}[ht] - \centering - \begin{tabular}{|c|c|c|} - \hline - 截断长度(期望) & 100000次试验结果平均 & 偏差 \\ - \hline - 5 & 4.99 & -0.01 \\ - 10 & 9.89 & -0.11 \\ - 15 & 14.66 & -0.34 \\ - 20 & 16.83 & -3.17 \\ - 25 & 15.74 & -9.26 \\ - 30 & 186.15 & +156.15 \\ - \hline - \end{tabular} -\end{table} - - - - - - - - - - - - - - - - - - - - - - - diff --git a/main/theorem.tex b/main/theorem.tex deleted file mode 100644 index 8e49219..0000000 --- a/main/theorem.tex +++ /dev/null @@ -1,109 +0,0 @@ -\section{Ergodicity and nonergodicity of a Markov chain} - -\begin{assumption} -\label{assumption1} -In the sequel $\{X_n\}$ is a Markov chain with state space -$S=\{0,1,2,\ldots\}$, -$\{X_n\}$ -is aperiodic and irreducible, - and stationary transition probabilities -$\forall i,j\in S$, $P_{ij}\geq 0$. -\end{assumption} - - - - -\begin{theorem}(A sufficient condition for ergodicity \cite{pakes1969some,kaplan1979sufficient}) -Assume Assumption \ref{assumption1}, - and there exist constants -$N> 0$, $B> 0$, such that -\begin{equation} -\forall i\geq 0, \sum_{j\in S}(j-i)P_{ij}<\infty, -\end{equation} -\begin{equation} -\forall i\geq N, \sum_{j\in S}(j-i)P_{ij}<-B, -\end{equation} -$\{X_n\}$ is ergodic. 
-\end{theorem} - -请昕闻基于第一个定理完成 sutton 1998年书上 random walk 例子(书中图6.5)的遍历性证明。 - -\begin{theorem}(A sufficient condition for nonergodicity \cite{kaplan1979sufficient}) -Assume Assumption \ref{assumption1}, if for some integer $N\geq 0$ and constants $B\geq 0$, -$c\in[0,1]$ the following two conditions hold, then -$\{X_n\}$ is not ergodic: -\begin{equation} - \forall i\geq N, \sum_{j\in S} (j-i)P_{ij}>0, -\end{equation} -\begin{equation} - \forall i\geq N, \forall z\in[c,1], z^i-\sum_{j\in S}P_{ij}z^j\geq -B(1-z). - \end{equation} -\end{theorem} - -请昕闻基于第二个定理完成 sutton 1998年书上 cliff-walking task 例子(书中图6.13)的非遍历性证明。 -以及圣彼得堡悖论的非遍历性证明。 - - -\textcolor{red}{注意:证明过程应该是把Markov Chain写成N个状态(状态到底是第几个也需要明确定义),状态之间的转移概率是 -一个矩阵,需要把矩阵元素明确定义出来,然后基于两个定理,明确推导出两个公式是否满足} - - -\section{2024年5月4日晚10:49与李昕闻讨论} - -遍历性指的是任意状态两两之间都可以达,即两两之间的若干次转移概率大于0,并且具有稳定的分布。 - -具有吸收态的马尔科夫链是不满足遍历性的,因为它的稳定分布最终吸收态是1,非吸收态是0. - -我们的强化学习例子,包括迷宫、随机游走、2048等等,都包含吸收态,所以都不满足遍历性。 -但是并不影响强化学习,因为我们都有游戏结束后的restart设置。所以它从吸收态又重新开始了。 -因此,满足遍历性。 - -但是,从需求角度出发,我们真正想看到的是 除去吸收态,那些非吸收态相互之间是否能走通, -即去除吸收态,剩下的状态是否具有遍历性。因为,显然迷宫、随机游走是有遍历性的。 -圣彼得堡悖论、2048是没有遍历性的。 - - -根据遍历性定义, -P可以分解为Q R I 0,那么$N=(I-Q)^{-1}$,即描述了非吸收态之间的遍历关系, -但凡有一个是0,就说明这两两之间不可达。只要都大于0,就是可达的。 - -\textcolor{red}{如果这事可行,那么请李昕闻仔细对比概念,是否叫拟遍历性,还是有其它的概念? -一定要分得清!} - -这样的话,就可以计算随机游走、圣彼得堡悖论的N矩阵,看它们是否具有遍历性? -按照设想,随机游走应该每个值都大于0,而圣彼得堡悖论应该是上三角矩阵,甚至对角线都是0. - -基于这样的观察,2048,如何证明具有``非遍历性''? - -是否定义i,j,以及ij的转移概率即可?用构造性证明方法 -最终也是上三角,并且对角线为0? - -这样的话,相当于我们提出了一种满足非遍历性的充分条件吧? -似乎论文可以从这方面下手! 
- -% 2048游戏局面编码 -\begin{equation} -$p=2^{64} \cdot \sum_{m=0}^{15} I(B_m \neq 0) \cdot 2^{B_m} + \sum_{m=0}^{15} (1 \ll 4m) \cdot B_m$ -\end{equation} - -% 马尔可夫链标准形式 -\begin{equation} - P = \begin{bmatrix} - Q & R \\ - 0 & I -\end{bmatrix} -\end{equation} - - - -% 带策略的马尔可夫链标准形式 -\begin{equation} - P_\pi = \begin{pmatrix} Q_\pi & R_\pi \\ 0 & I \end{pmatrix} -\end{equation} - - - - - - - diff --git a/material/2048prove.tex b/material/2048prove.tex new file mode 100644 index 0000000..9039eb3 --- /dev/null +++ b/material/2048prove.tex @@ -0,0 +1,54 @@ +\section{2048游戏的非遍历性证明} +\subsection{2048游戏编码规则} +为了完成从游戏到马尔可夫决策过程的转化,首先需要对局面进行排序, +需要给局面一一对应一个可以进行比较的值,通过这个值对局面进行排序。 +需要保证的是,局面和大的排序靠后,如果局面和一样,则按照局面编码大小排序 +2048的游戏棋盘是$4×4$的,每一个格子上都可以是${空格,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768}$ +这些数字,为了便于计算机内的保存本文将其一一对应为{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}, +因为在游戏中不存在面值为2^0=1的方块,于是在这里使用0来特别地对应上原先格子中空格的情况。这个游戏的状态是有限的, +有不超过16^16=2^64个状态,对于每一个棋盘局面本文可以执行 “上”,“下”,“左”,“右”这四个动作。 +执行动作之后将会把方块往动作方向移动,如果有两个相同幂次的方块碰撞会合并成为一个幂次加一的方块, +并且在一个空格位置随机生成一个2或者4的方块。本文将这个棋盘用一个1×16的数组B进行表示, +其中B中存放的是方块的幂次,空格用0表示,m表示数组下标。 +$B_m∈{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},0≤m≤15$。编码规则如下: + +\begin{equation} +% 2048游戏局面编码 +p=2^{64} \cdot \sum_{m=0}^{15} I(B_m \neq 0) \cdot 2^{B_m} + \sum_{m=0}^{15} (1 \ll 4m) \cdot B_m +\end{equation} + +其中$\mathbb{I}(B_m≠0)$是指示函数,当B_m的值不为0的时候这个函数返回1,也就是说不统计棋盘中的空格格子,这个编码的含义是将棋盘映射成一个长整型的变量, +本文将这个结果放在比64bit更高的位置上,也就是 64-84bit的位置。这个编码的主要含义是,将局面所有数字之和放在高bit位置上,排序时局面之和大的排在后面, +状态转移时就是从小的下标转移到大的下标上。另外后面64bit就是局面的编码,来保证这个值的唯一性,一个局面会对应一个唯一的值。 + +\input{pic/2048encode} + +上面的图中的这个局面的编码$p=(1≪64)∙30784+0x FEDC 5432 0000 0020$。 +本文是按照从下往上,从右往左的顺序给格子进行排列,右下角的格子是最低位,左上角的格子是最高位。 +由此本文获得了一个有关状态的大小关系,还可以了解,对于两个不同的局面,通过这个排序可以获取每一个状态的排列后相应的下标。 +特别的,本文将所有的死亡状态从序列中抽出放在状态转移的最大下标位置。 + +\subsection{2048游戏非终结状态的非遍历性证明} +首先能够很快得到2048游戏是不满足遍历性的这一结论,因为2048游戏本身具有众多的吸收态,因此根据定理3.2一定是不满足遍历性的。但是我们此处考虑非终结状态之间的转移关系的遍历性。 +推论: 2048游戏非终结状态转移矩阵是非遍历性的。 
+证明:首先记2048游戏的马尔可夫决策过程在策略为π的情况下的状态转移矩阵为P_π,状态先后关系通过上面的编码方式确定,因为有吸收态存在于是可以将这个状态转移矩阵写成标准矩阵的形式: +\begin{equation} +% 带策略的马尔可夫链标准形式 +[ P_\pi = \begin{pmatrix} Q_\pi & R_\pi \ 0 & I \end{pmatrix} ] +\end{equation} + +根据游戏规则,两个相同幂次的方块碰撞会合并成为一个幂次加一的方块, +然后会在一个空格位置随机生成一个2或者4的方块,这一过程本文记为$S_i\to S_(i^')\to S_j$。 +\input{pic/2048example-p} + +如图3.5所示根据我们的规则可以保证,状态在后的排序也靠后。也就是说在$S_i\to S_j$的过程中,能够保证$p_ip_i ;p_j>p_{i^{,}}$。 +通过这种转移关系我们可以认定,不存在向前转移的情况,因此,与圣彼得堡悖论类似,2048游戏的n步转移的$Q_π^{n}$矩阵是一个上三角矩阵。 + +因此根据本文的编码2048游戏的状态转移过程一直是满足从小的下标转移到大的下标上这一情况。 +实际上任何不会向之前状态转移的过程都满足这个条件。在本文设计的状态转移下,状态对应下标只增不减,$q_{ij}>0$在$j-i>0$的条件下, +$j≤i$位置$q_{ij}$都是0,其中,i,j都是正整数。根据定理3.3,可以得到2048游戏的非终结状态之间的转移过程是非遍历的。 \ No newline at end of file diff --git a/material/nonergodicity.tex b/material/nonergodicity.tex new file mode 100644 index 0000000..f031bd1 --- /dev/null +++ b/material/nonergodicity.tex @@ -0,0 +1,110 @@ +\section{Non-ergodicity} + +\cite{kaplan1979sufficient} + + +We assume that the state-process is ergodic — i.e. all states +are reachable under any policy from the current state after +sufficiently many steps. 
\cite{majeed2018q} + +% ABCDE的随机游走的状态矩阵 +\[ +P = \begin{pmatrix} +1 & 0 & 0 & 0 & 0 & \\ +\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\ +0 & \frac{1}{2} & 0 & \frac{1}{2} & 0\\ +0 & 0 & \frac{1}{2} & 0 & \frac{1}{2}\\ +0 & 0 & 0 & 0 & 1 +\end{pmatrix} +\] + +%可重启的随机游走矩阵 +\[ +P = \begin{pmatrix} +0 & 0 & 1 & 0 & 0 & \\ +\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\ +0 & \frac{1}{2} & 0 & \frac{1}{2} & 0\\ +0 & 0 & \frac{1}{2} & 0 & \frac{1}{2}\\ +0 & 0 & 1 & 0 & 0 +\end{pmatrix} +\] + +% 计算平稳分布 +\[ +\begin{cases} +\pi_1 = \pi_3 \\ +\frac{1}{2}\pi_1 + \frac{1}{2}\pi_3 = \pi_2 \\ +\frac{1}{2}\pi_2 + \frac{1}{2}\pi_4 = \pi_3 \\ +\frac{1}{2}\pi_3 + \frac{1}{2}\pi_5 = \pi_4 \\ +\pi_3 = \pi_5 \\ +\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 = 1 +\end{cases} +\] + +%随机游走pic +\input{pic/randomWalk} + +设两个上三角矩阵为( A ) 和 ( B ),它们的形式分别为: + +% 两个上三角矩阵乘积求和为上三角矩阵 +A = \begin{pmatrix} +a_{11} & a_{12} & \cdots & a_{1n} \\ +0 & a_{22} & \cdots & a_{2n} \\ +\vdots & \vdots & \ddots & \vdots \\ +0 & 0 & \cdots & a_{nn} +\end{pmatrix}, \quad + + +B = \begin{pmatrix} +b_{11} & b_{12} & \cdots & b_{1n} \\ +0 & b_{22} & \cdots & b_{2n} \\ +\vdots & \vdots & \ddots & \vdots \\ +0 & 0 & \cdots & b_{nn} +\end{pmatrix} + +[ +c_{ij}=\sum_{k=1}^{n} a_{ik}b_{kj} +] + +当 $i>j$ 时,有 $c_{ij}=0$,因为在此情况下,$a_{ik}=0$ 或 $b_{kj}=0$,乘积中至少有一项为 0。 +所以 $C$ 也是一个上三角矩阵。 +因此,证明了两个上三角矩阵的乘积还是一个上三角矩阵。 + +% N矩阵 +$N=1+Q^1+Q^2……$ + +% “重启”随机游走 pic +\input{pic/randomWalkRestart} + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/material/paradox.tex b/material/paradox.tex new file mode 100644 index 0000000..9056398 --- /dev/null +++ b/material/paradox.tex @@ -0,0 +1,120 @@ +\subsection{St. Petersburg paradox} +The St. Petersburg paradox is a paradox associated +with gambling and decision theory. It is named after the city +of St. Petersburg in Russia and was initially introduced + by the mathematician Daniel Bernoulli in 1738. 
+ +The paradox involves a gambling game with the following rules: +\begin{itemize} + \item Participants must pay a fixed entry fee to join the game. + \item The game continues until a coin lands heads up. +Each toss determines the prize, with the first heads + appearing on the $t$-th toss resulting in a prize of $2^t$. +\end{itemize} + + +%\input{pic/FigureParadox} + +The expected return of all possibilities is +\begin{equation} +\begin{split} +\mathbb{E}(x)&=\lim_{n\rightarrow \infty}\sum_{t=1}^n p(x)\times V(x)\\ +&=\lim_{n\rightarrow \infty}\sum_{t=1}^n\frac{1}{2^t} 2^t\\ +&=\infty +\end{split} +\end{equation} + + +Despite the potential for the prize to escalate +significantly, the expected value calculation +in probability theory reveals that the average +participant in this gambling game would end up paying + an infinite fee. This is due to the prize's expected + value being infinite. Even though the probability of + winning is small with each toss, when multiplied, + it leads to an infinitely increasing expected value. + +This paradox challenges individuals' intuitions and +decision-making regarding gambling. Despite the allure +of a potentially substantial prize, the actual expected + value of participating in this gambling game is infinite. + Consequently, in the long run, participants could face + an infinite monetary loss. + +%圣彼得堡悖论期望 +[ +E(X)=\sum_{n}x(n)p(n) = \frac{1}{2}\times 2 + \frac{1}{4}\times 4 + \frac{1}{8}\times 8 + \cdots = \infty +] + +% 圣彼得堡悖论状态转移矩阵 + \[ + P = \begin{pmatrix} + 0 & \frac{1}{2} & 0 & 0 & 0 & ... & ... & \frac{1}{2} \\ + 0 & 0 & \frac{1}{2} & 0 & 0 & ... & ... & \frac{1}{2} \\ + \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ + 1 & 0 & 0 & 0 & 0 & ... & ... & 0 + \end{pmatrix} + \] + +% 圣彼得堡悖论Q矩阵 + +\[ +Q = \begin{pmatrix} +0 & \frac{1}{2} & 0 & 0 & 0 & ... & ... \\ +0 & 0 & \frac{1}{2} & 0 & 0 & ... & ... 
\\ +\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \ddots \\ +\end{pmatrix} +\] + +% N矩阵 +$N=1+Q^1+Q^2……$ + + +% 圣彼得堡悖论的N矩阵 +[ +N = \begin{pmatrix} +1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} & \dots \ +0 & 1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \dots \ +\vdots & \vdots & \vdots & \vdots & \vdots & \ddots +\end{pmatrix} +] + +% 带截断的圣彼得堡悖论 +\begin{table}[ht] + \centering + \begin{tabular}{|c|c|c|} + \hline + 截断长度(期望) & 100000次试验结果平均 & 偏差 \\ + \hline + 5 & 4.99 & -0.01 \\ + 10 & 9.89 & -0.11 \\ + 15 & 14.66 & -0.34 \\ + 20 & 16.83 & -3.17 \\ + 25 & 15.74 & -9.26 \\ + 30 & 186.15 & +156.15 \\ + \hline + \end{tabular} +\end{table} + + + + + + + + + + + + + + + + + + + + + + + diff --git a/material/theorem.tex b/material/theorem.tex new file mode 100644 index 0000000..8e49219 --- /dev/null +++ b/material/theorem.tex @@ -0,0 +1,109 @@ +\section{Ergodicity and nonergodicity of a Markov chain} + +\begin{assumption} +\label{assumption1} +In the sequel $\{X_n\}$ is a Markov chain with state space +$S=\{0,1,2,\ldots\}$, +$\{X_n\}$ +is aperiodic and irreducible, + and stationary transition probabilities +$\forall i,j\in S$, $P_{ij}\geq 0$. +\end{assumption} + + + + +\begin{theorem}(A sufficient condition for ergodicity \cite{pakes1969some,kaplan1979sufficient}) +Assume Assumption \ref{assumption1}, + and there exist constants +$N> 0$, $B> 0$, such that +\begin{equation} +\forall i\geq 0, \sum_{j\in S}(j-i)P_{ij}<\infty, +\end{equation} +\begin{equation} +\forall i\geq N, \sum_{j\in S}(j-i)P_{ij}<-B, +\end{equation} +$\{X_n\}$ is ergodic. 
+\end{theorem} + +请昕闻基于第一个定理完成 sutton 1998年书上 random walk 例子(书中图6.5)的遍历性证明。 + +\begin{theorem}(A sufficient condition for nonergodicity \cite{kaplan1979sufficient}) +Assume Assumption \ref{assumption1}, if for some integer $N\geq 0$ and constants $B\geq 0$, +$c\in[0,1]$ the following two conditions hold, then +$\{X_n\}$ is not ergodic: +\begin{equation} + \forall i\geq N, \sum_{j\in S} (j-i)P_{ij}>0, +\end{equation} +\begin{equation} + \forall i\geq N, \forall z\in[c,1], z^i-\sum_{j\in S}P_{ij}z^j\geq -B(1-z). + \end{equation} +\end{theorem} + +请昕闻基于第二个定理完成 sutton 1998年书上 cliff-walking task 例子(书中图6.13)的非遍历性证明。 +以及圣彼得堡悖论的非遍历性证明。 + + +\textcolor{red}{注意:证明过程应该是把Markov Chain写成N个状态(状态到底是第几个也需要明确定义),状态之间的转移概率是 +一个矩阵,需要把矩阵元素明确定义出来,然后基于两个定理,明确推导出两个公式是否满足} + + +\section{2024年5月4日晚10:49与李昕闻讨论} + +遍历性指的是任意状态两两之间都可以达,即两两之间的若干次转移概率大于0,并且具有稳定的分布。 + +具有吸收态的马尔科夫链是不满足遍历性的,因为它的稳定分布最终吸收态是1,非吸收态是0. + +我们的强化学习例子,包括迷宫、随机游走、2048等等,都包含吸收态,所以都不满足遍历性。 +但是并不影响强化学习,因为我们都有游戏结束后的restart设置。所以它从吸收态又重新开始了。 +因此,满足遍历性。 + +但是,从需求角度出发,我们真正想看到的是 除去吸收态,那些非吸收态相互之间是否能走通, +即去除吸收态,剩下的状态是否具有遍历性。因为,显然迷宫、随机游走是有遍历性的。 +圣彼得堡悖论、2048是没有遍历性的。 + + +根据遍历性定义, +P可以分解为Q R I 0,那么$N=(I-Q)^{-1}$,即描述了非吸收态之间的遍历关系, +但凡有一个是0,就说明这两两之间不可达。只要都大于0,就是可达的。 + +\textcolor{red}{如果这事可行,那么请李昕闻仔细对比概念,是否叫拟遍历性,还是有其它的概念? +一定要分得清!} + +这样的话,就可以计算随机游走、圣彼得堡悖论的N矩阵,看它们是否具有遍历性? +按照设想,随机游走应该每个值都大于0,而圣彼得堡悖论应该是上三角矩阵,甚至对角线都是0. + +基于这样的观察,2048,如何证明具有``非遍历性''? + +是否定义i,j,以及ij的转移概率即可?用构造性证明方法 +最终也是上三角,并且对角线为0? + +这样的话,相当于我们提出了一种满足非遍历性的充分条件吧? +似乎论文可以从这方面下手! 
+ +% 2048游戏局面编码 +\begin{equation} +$p=2^{64} \cdot \sum_{m=0}^{15} I(B_m \neq 0) \cdot 2^{B_m} + \sum_{m=0}^{15} (1 \ll 4m) \cdot B_m$ +\end{equation} + +% 马尔可夫链标准形式 +\begin{equation} + P = \begin{bmatrix} + Q & R \\ + 0 & I +\end{bmatrix} +\end{equation} + + + +% 带策略的马尔可夫链标准形式 +\begin{equation} + P_\pi = \begin{pmatrix} Q_\pi & R_\pi \\ 0 & I \end{pmatrix} +\end{equation} + + + + + + + diff --git a/pic/paradox.tex b/pic/paradox.tex new file mode 100644 index 0000000..4afb944 --- /dev/null +++ b/pic/paradox.tex @@ -0,0 +1,31 @@ +\begin{figure}[!t] +\centering +\scalebox{0.9}{ +\begin{tikzpicture} + \node[draw, rectangle, fill=gray!50] (DEAD1) at (0,1.5) {T}; + \node[draw, rectangle, fill=gray!50] (DEAD2) at (1.5,1.5) {T}; + \node[draw, rectangle, fill=gray!50] (DEAD3) at (3,1.5) {T}; + \node[draw, rectangle, fill=gray!50] (DEAD4) at (4.5,1.5) {T}; + \node[draw, rectangle, fill=gray!50] (DEAD5) at (6,1.5) {T}; + \node[draw, circle] (A) at (0,0) {S$_1$}; + \node[draw, circle] (B) at (1.5,0) {S$_2$}; + \node[draw, circle] (C) at (3,0) {S$_3$}; + \node[draw, circle] (D) at (4.5,0) {S$_4$}; + \node[draw, circle] (E) at (6,0) {S$_5$}; + + \draw[->] (A) -- node {0.5} (DEAD1); + \draw[->] (A) -- node {0.5} (B); + \draw[->] (B) -- node {0.5} (DEAD2); + \draw[->] (B) -- node {0.5} (C); + \draw[->] (C) -- node {0.5} (DEAD3); + \draw[->] (C) -- node {0.5} (D); + \draw[->] (D) -- node {0.5} (DEAD4); + \draw[->] (D) -- node {0.5} (E); + \draw[->] (E) -- node {1.0} (DEAD5); + + \draw[->] ([xshift=-4ex]A.west) -- ([xshift=-5.2ex]A.east); +\end{tikzpicture} +} +\caption{Truncated St. Petersburg.} +\label{TruncatedPetersburg} +\end{figure} -- libgit2 0.26.0
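The two fundamental matrices that this patch adds to main/background.tex and main/nonergodic.tex can be cross-checked with exact rational arithmetic. The following is a minimal sketch, not the authors' code. It assumes the five-state random walk moves left or right with probability 1/2, and that the truncated St. Petersburg chain moves from S_i to S_{i+1} with probability 1/2, as in the Q matrices of the patch. The helper names `band_chain`, `fundamental_matrix`, and `all_positive` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def band_chain(n, forward, backward=Fraction(0)):
    """Build an n x n matrix Q with Q[i][i+1] = forward and Q[i][i-1] = backward."""
    Q = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            Q[i][i + 1] = forward
        if i - 1 >= 0:
            Q[i][i - 1] = backward
    return Q


def fundamental_matrix(Q):
    """Return N = (I - Q)^{-1} using Gauss-Jordan elimination over exact fractions."""
    n = len(Q)
    # Augmented matrix [I - Q | I].
    A = []
    for i in range(n):
        left = [Fraction(int(i == j)) - Q[i][j] for j in range(n)]
        right = [Fraction(int(i == j)) for j in range(n)]
        A.append(left + right)
    for col in range(n):
        # Pick a row with a non-zero pivot (raises StopIteration if I - Q is singular).
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in A]


def all_positive(N):
    """Positivity condition on N used in the patch's definitions: every N_ij > 0."""
    return all(x > 0 for row in N for x in row)


half = Fraction(1, 2)
N_walk = fundamental_matrix(band_chain(5, half, half))   # states A..E
N_trunc = fundamental_matrix(band_chain(5, half))        # states S_1..S_5

print("random walk ergodic between non-absorbing states:", all_positive(N_walk))
print("truncated St. Petersburg ergodic between non-absorbing states:", all_positive(N_trunc))
```

Exact fractions are used instead of floating point so that the zero entries below the diagonal of the truncated chain are exact zeros and can be compared with `== 0` safely.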