diff --git a/.idea/.gitignore b/.idea/.gitignore
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/.idea/.gitignore
diff --git a/.idea/20240414IEEETG.iml b/.idea/20240414IEEETG.iml
new file mode 100644
index 0000000..d0876a7
--- /dev/null
+++ b/.idea/20240414IEEETG.iml
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
new file mode 100644
index 0000000..105ce2d
--- /dev/null
+++ b/.idea/inspectionProfiles/profiles_settings.xml
@@ -0,0 +1,6 @@
+
+
+
+
\ No newline at end of file
diff --git a/.idea/misc.xml b/.idea/misc.xml
new file mode 100644
index 0000000..812ab5a
--- /dev/null
+++ b/.idea/misc.xml
@@ -0,0 +1,7 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/modules.xml b/.idea/modules.xml
new file mode 100644
index 0000000..f3f649d
--- /dev/null
+++ b/.idea/modules.xml
@@ -0,0 +1,8 @@
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
new file mode 100644
index 0000000..35eb1dd
--- /dev/null
+++ b/.idea/vcs.xml
@@ -0,0 +1,6 @@
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.idea/workspace.xml b/.idea/workspace.xml
new file mode 100644
index 0000000..2d53a5f
--- /dev/null
+++ b/.idea/workspace.xml
@@ -0,0 +1,67 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 1715927146068
+
+
+
+
+
+
\ No newline at end of file
diff --git a/.settings/org.eclipse.core.resources.prefs b/.settings/org.eclipse.core.resources.prefs
deleted file mode 100644
index 4824b80..0000000
--- a/.settings/org.eclipse.core.resources.prefs
+++ /dev/null
@@ -1,2 +0,0 @@
-eclipse.preferences.version=1
-encoding/=UTF-8
diff --git a/document.tex b/document.tex
index f2f4237..d23d748 100644
--- a/document.tex
+++ b/document.tex
@@ -70,10 +70,10 @@ wangwenhao11@nudt.edu.cn).
 \end{IEEEkeywords}
 \input{main/background}
-%\input{main/introduction}
-%\input{main/nonergodicity}
-%\input{main/paradox}
-%\input{main/theorem}
+\input{main/introduction}
+\input{main/nonergodicity}
+\input{main/paradox}
+\input{main/theorem}
diff --git a/main/2048pic/0.png b/main/2048pic/0.png
new file mode 100644
index 0000000..f1e7f79
Binary files /dev/null and b/main/2048pic/0.png differ
diff --git a/main/2048pic/2_0.png b/main/2048pic/2_0.png
new file mode 100644
index 0000000..fa40579
Binary files /dev/null and b/main/2048pic/2_0.png differ
diff --git a/main/2048pic/2_1.png b/main/2048pic/2_1.png
new file mode 100644
index 0000000..cc9a6a5
Binary files /dev/null and b/main/2048pic/2_1.png differ
diff --git a/main/2048pic/2_2.png b/main/2048pic/2_2.png
new file mode 100644
index 0000000..5cbdb15
Binary files /dev/null and b/main/2048pic/2_2.png differ
diff --git a/main/2048pic/3_0.png b/main/2048pic/3_0.png
new file mode 100644
index 0000000..69bb6f0
Binary files /dev/null and b/main/2048pic/3_0.png differ
diff --git a/main/2048pic/3_1.png b/main/2048pic/3_1.png
new file mode 100644
index 0000000..8355006
Binary files /dev/null and b/main/2048pic/3_1.png differ
diff --git a/main/2048pic/3_2.png b/main/2048pic/3_2.png
new file mode 100644
index 0000000..b049b93
Binary files /dev/null and b/main/2048pic/3_2.png differ
diff --git a/main/2048prove.tex b/main/2048prove.tex
new file mode 100644
index 0000000..37a6038
--- /dev/null
+++ b/main/2048prove.tex
@@ -0,0 +1,54 @@
+\section{Proof of the Non-Ergodicity of the Game 2048}
+\subsection{Encoding Rule for 2048 Positions}
+To cast the game as a Markov decision process, we first need an ordering of the board positions:
+each position is mapped one-to-one to a comparable value, and positions are sorted by this value.
+The ordering must place positions with a larger tile sum later; positions with equal tile sums are ordered by their board code.
+The 2048 board is $4\times 4$, and each cell holds one of $\{\text{empty},2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768\}$.
+For compact storage, we map these values one-to-one to $\{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}$.
+Since the game has no tile of value $2^0=1$, the code 0 is reserved for an empty cell. The state space of the game is finite,
+with at most $16^{16}=2^{64}$ states, and in every position one of the four actions ``up'', ``down'', ``left'', and ``right'' can be taken.
+After a move, the tiles slide in the chosen direction; if two tiles with the same exponent collide, they merge into one tile whose exponent is larger by one,
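+and a tile of value 2 or 4 is then spawned at random in an empty cell. We represent the board as a $1\times 16$ array $B$,
+where $B$ stores the exponent of each tile, an empty cell is stored as 0, and $m$ is the array index:
+$B_m\in\{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15\}$, $0\le m\le 15$. The encoding rule is as follows:
+
+\begin{equation}
+% Encoding of a 2048 position
+p=2^{64} \cdot \sum_{m=0}^{15} I(B_m \neq 0) \cdot 2^{B_m} + \sum_{m=0}^{15} (1 \ll 4m) \cdot B_m
+\end{equation}
+
+Here $I(B_m\neq 0)$ is the indicator function, which equals 1 when $B_m$ is nonzero, so empty cells are not counted, and $\ll$ denotes a left bit shift, so that $(1\ll 4m)=2^{4m}$. The code maps the board to a single long integer:
+the tile sum is placed above the low 64 bits, in bits 64--84. The main point of the encoding is that the sum of all tiles occupies the high-order bits, so positions with a larger tile sum are sorted later
+and every state transition goes from a smaller index to a larger one. The low 64 bits hold the board code itself, which guarantees uniqueness: each position corresponds to exactly one value.
+
+\input{../pic/2048encode}
+
+The position shown in the figure above has the code $p=(1\ll 64)\cdot 30784+\mathtt{0xEDCB\,5432\,0000\,0020}$.
+Cells are ordered from bottom to top and from right to left: the bottom-right cell occupies the lowest bits and the top-left cell the highest bits.
+This gives an order relation on the states, so any two distinct positions can be compared and every state receives an index in the sorted sequence.
+As a special case, all dead states are taken out of the sequence and placed at the largest indices of the state transition.
+
+\subsection{Non-Ergodicity of the Non-Terminal States of 2048}
+It follows immediately that 2048 is not ergodic: the game has many absorbing states, so by Theorem 3.2 it cannot be ergodic. Here, however, we consider the ergodicity of the transitions among the non-terminal states.
+Corollary: the transition matrix over the non-terminal states of 2048 is non-ergodic.
+Proof: Let $P_\pi$ denote the state transition matrix of the 2048 Markov decision process under policy $\pi$, with the states ordered by the encoding above. Because absorbing states exist, the matrix can be written in canonical form:
+\begin{equation}
+% Canonical form of the Markov chain under a policy
+P_\pi = \begin{pmatrix} Q_\pi & R_\pi \\ 0 & I \end{pmatrix}
+\end{equation}
+
+By the rules of the game, two colliding tiles with the same exponent merge into one tile whose exponent is larger by one,
+and a tile of value 2 or 4 is then spawned at random in an empty cell. We denote this process by $S_i\to S_{i'}\to S_j$.
+\input{../pic/2048example-p}
+
+As Figure 3.5 shows, our rule guarantees that a later state is also sorted later. That is, along $S_i\to S_j$ we have $p_i<p_j$: the move $S_i\to S_{i'}$ leaves the tile sum unchanged, because merging two tiles of value $2^k$ yields one tile of value $2^{k+1}$, while the spawn $S_{i'}\to S_j$ raises the tile sum by 2 or 4, so $p_j>p_{i'}$.
+From this transition relation we conclude that no transition leads back to an earlier state. Hence, as in the St. Petersburg paradox, the $n$-step transition matrix $Q_\pi^{n}$ of 2048 is upper triangular.
+
+Under our encoding, therefore, every state transition in 2048 goes from a smaller index to a larger one.
+In fact, any process that never returns to an earlier state satisfies this condition. With the ordering designed here, the state index only increases: $q_{ij}>0$ is possible only when $j-i>0$,
+and $q_{ij}=0$ at every position with $j\le i$, where $i$ and $j$ are positive integers. By Theorem 3.3, the transitions among the non-terminal states of 2048 are non-ergodic.
\ No newline at end of file
diff --git a/main/nonergodicity.tex b/main/nonergodicity.tex
index a9abafd..892bb1b 100644
--- a/main/nonergodicity.tex
+++ b/main/nonergodicity.tex
@@ -7,6 +7,114 @@ We assume that the state-process is ergodic — i.e. all states are reachable under any policy from the current state after sufficiently many steps.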
 \cite{majeed2018q}
+% Transition matrix of the random walk over the states A-E
+\[
+P = \begin{pmatrix}
+1 & 0 & 0 & 0 & 0 \\
+\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\
+0 & \frac{1}{2} & 0 & \frac{1}{2} & 0\\
+0 & 0 & \frac{1}{2} & 0 & \frac{1}{2}\\
+0 & 0 & 0 & 0 & 1
+\end{pmatrix}
+\]
+
+% Random walk with restart
+\[
+P = \begin{pmatrix}
+0 & 0 & 1 & 0 & 0 \\
+\frac{1}{2} & 0 & \frac{1}{2} & 0 & 0\\
+0 & \frac{1}{2} & 0 & \frac{1}{2} & 0\\
+0 & 0 & \frac{1}{2} & 0 & \frac{1}{2}\\
+0 & 0 & 1 & 0 & 0
+\end{pmatrix}
+\]
+
+% Stationary distribution of the restart walk: balance equations \pi = \pi P
+\[
+\begin{cases}
+\pi_1 = \frac{1}{2}\pi_2 \\
+\pi_2 = \frac{1}{2}\pi_3 \\
+\pi_3 = \pi_1 + \frac{1}{2}\pi_2 + \frac{1}{2}\pi_4 + \pi_5 \\
+\pi_4 = \frac{1}{2}\pi_3 \\
+\pi_5 = \frac{1}{2}\pi_4 \\
+\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 = 1
+\end{cases}
+\]
+
+% Random walk figure
+\begin{tikzpicture}
+    \node[draw, rectangle, fill=gray!50] (DEAD) at (-2,0) {};
+    \node[draw, rectangle, fill=gray!50] (DEAD2) at (10,0) {};
+    \node[draw, circle] (A) at (0,0) {A};
+    \node[draw, circle] (B) at (2,0) {B};
+    \node[draw, circle] (C) at (4,0) {C};
+    \node[draw, circle] (D) at (6,0) {D};
+    \node[draw, circle] (E) at (8,0) {E};
+
+    \draw[->] (A) -- (DEAD);
+    \draw[->] (B) -- (A);
+    \draw[->] (B) to [bend left=30] (C);
+    \draw[->] (C) to [bend left=30] (B);
+    \draw[->] (C) to [bend left=30] (D);
+    \draw[->] (D) to [bend left=30] (C);
+    \draw[->] (D) -- (E);
+    \draw[->] (E) -- (DEAD2);
+
+    \draw[->] ([yshift=4ex]C.north) -- ([yshift=4.5ex]C.south);
+    \end{tikzpicture}
+
+Let $A$ and $B$ be two upper triangular matrices of the following form:
+
+% The product of two upper triangular matrices is upper triangular
+\[ A = \begin{pmatrix}
+a_{11} & a_{12} & \cdots & a_{1n} \\
+0 & a_{22} & \cdots & a_{2n} \\
+\vdots & \vdots & \ddots & \vdots \\
+0 & 0 & \cdots & a_{nn}
+\end{pmatrix},
+\]
+\[
+B = \begin{pmatrix}
+b_{11} & b_{12} & \cdots & b_{1n} \\
+0 & b_{22} & \cdots & b_{2n} \\
+\vdots & \vdots & \ddots & \vdots \\
+0 & 0 & \cdots & b_{nn}
+\end{pmatrix} \]
+Their product $C=AB$ has the entries
+\[
+c_{ij}=\sum_{k=1}^{n} a_{ik}b_{kj}
+\]
+
+When $i>j$ we have $c_{ij}=0$: every index $k$ satisfies $k<i$ or $k>j$, so $a_{ik}=0$ or $b_{kj}=0$, and each term of the sum contains at least one zero factor.
+Hence $C$ is also an upper triangular matrix.
+This proves that the product of two upper triangular matrices is again an upper triangular matrix.
+
+% The matrix N
+$N=I+Q+Q^2+\cdots$
+
+% Random walk with restart (figure)
+\begin{tikzpicture}
+
+    \node[draw, circle] (A) at (0,0) {A};
+    \node[draw, circle] (B) at (2,0) {B};
+    \node[draw, circle] (C) at (4,0) {C};
+    \node[draw, circle] (D) at (6,0) {D};
+    \node[draw, circle] (E) at (8,0) {E};
+
+    \draw[->] (A.north) to [bend left=30] (C.north);
+    \draw[->] (B) -- (A);
+    \draw[->] (B) to [bend left=30] (C);
+    \draw[->] (C) to [bend left=30] (B);
+    \draw[->] (C) to [bend left=30] (D);
+    \draw[->] (D) to [bend left=30] (C);
+    \draw[->] (D) -- (E);
+    \draw[->] (E.south) to [bend left=30] (C.south);
+
+\end{tikzpicture}
+
+
+
+
diff --git a/main/paradox.tex b/main/paradox.tex
index 05e507d..3339ab4 100644
--- a/main/paradox.tex
+++ b/main/paradox.tex
@@ -41,6 +41,61 @@ of a potentially substantial prize, the actual expected
 Consequently, in the long run, participants could face an infinite monetary loss.
+% Expected value in the St. Petersburg paradox
+\[
+E(X)=\sum_{n}x(n)p(n) = \frac{1}{2}\times 2 + \frac{1}{4}\times 4 + \frac{1}{8}\times 8 + \cdots = \infty
+\]
+
+% State transition matrix of the St. Petersburg paradox
+    \[
+    P = \begin{pmatrix}
+        0 & \frac{1}{2} & 0 & 0 & 0 & \cdots & \cdots & \frac{1}{2} \\
+        0 & 0 & \frac{1}{2} & 0 & 0 & \cdots & \cdots & \frac{1}{2} \\
+        \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\
+        1 & 0 & 0 & 0 & 0 & \cdots & \cdots & 0
+    \end{pmatrix}
+    \]
+
+% The matrix Q of the St. Petersburg paradox
+
+\[
+Q = \begin{pmatrix}
+0 & \frac{1}{2} & 0 & 0 & 0 & \cdots & \cdots \\
+0 & 0 & \frac{1}{2} & 0 & 0 & \cdots & \cdots
\\
+\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \ddots
+\end{pmatrix}
+\]
+
+% The matrix N
+$N=I+Q+Q^2+\cdots$
+
+
+% The matrix N of the St. Petersburg paradox
+\[
+N = \begin{pmatrix}
+1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} & \dots \\
+0 & 1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \dots \\
+\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
+\end{pmatrix}
+\]
+
+% St. Petersburg paradox with truncation
+\begin{table}[ht]
+    \centering
+    \begin{tabular}{|c|c|c|}
+        \hline
+        Truncation length (expected value) & Mean over 100000 trials & Deviation \\
+        \hline
+        5 & 4.99 & $-0.01$ \\
+        10 & 9.89 & $-0.11$ \\
+        15 & 14.66 & $-0.34$ \\
+        20 & 16.83 & $-3.17$ \\
+        25 & 15.74 & $-9.26$ \\
+        30 & 186.15 & $+156.15$ \\
+        \hline
+    \end{tabular}
+    \end{table}
+
diff --git a/main/theorem.tex b/main/theorem.tex
index c0ead7d..57822ed 100644
--- a/main/theorem.tex
+++ b/main/theorem.tex
@@ -81,6 +81,20 @@ $P$ can be decomposed into $Q$, $R$, $I$, $0$, so $N=(I-Q)^{-1}$, which describes, among the non-absorbing states
 In that case, we have in effect proposed a sufficient condition for non-ergodicity, haven't we?
 The paper could perhaps start from this angle!
+% Encoding of a 2048 position
+$p=2^{64} \cdot \sum_{m=0}^{15} I(B_m \neq 0) \cdot 2^{B_m} + \sum_{m=0}^{15} (1 \ll 4m) \cdot B_m$
+
+% Canonical form of a Markov chain
+\[
+P = \begin{bmatrix}
+Q & R \\
+0 & I
+\end{bmatrix}
+\]
+
+% Canonical form of the Markov chain under a policy
+\[ P_\pi = \begin{pmatrix} Q_\pi & R_\pi \\ 0 & I \end{pmatrix} \]
+
+
diff --git a/pic/2048encode.tex b/pic/2048encode.tex
new file mode 100644
index 0000000..b542bf1
--- /dev/null
+++ b/pic/2048encode.tex
@@ -0,0 +1,16 @@
+\begin{tikzpicture}
+    \draw (0, 0) grid (4, 4);
+    \node at (0.5, 3.5) {16384};
+    \node at (1.5, 3.5) {8192};
+    \node at (2.5, 3.5) {4096};
+    \node at (3.5, 3.5) {2048};
+    \node at (0.5, 2.5) {32};
+    \node at (1.5, 2.5) {16};
+    \node at (2.5, 2.5) {8};
+    \node at (3.5, 2.5) {4};
+    \node at (2.5, 0.5) {4};
+
+    \draw[<-] (0, -0.5) -- (3.8, -0.5);
+
+    \draw[<-] (4.5, 4) -- (4.5, 0.5);
+\end{tikzpicture}
\ No newline at end of file
diff --git a/pic/2048epsilon-greedy.pdf b/pic/2048epsilon-greedy.pdf
new file mode 100644
index 0000000..2e2f49c
Binary files /dev/null and b/pic/2048epsilon-greedy.pdf differ
diff --git a/pic/2048example-p.tex b/pic/2048example-p.tex
new file
mode 100644
index 0000000..2bfab32
--- /dev/null
+++ b/pic/2048example-p.tex
@@ -0,0 +1,58 @@
+\begin{figure*}
+    \centering
+    \tikzstyle{place}=[circle,draw=blue!50,fill=blue!20,thick,minimum size=10mm]
+    \tikzstyle{transition}=[rectangle,draw=black!50,fill=black!20,thick,minimum size=10mm]
+    \begin{tikzpicture}[scale=0.6]
+        \node[place] (st) at ( 0,29) {$S_i$};
+        \node[transition] (as2) at ( 4,25) {$S_{i'}$};
+        \node[transition] (as1) at ( -4,25) {$S_{i'}$};
+        \foreach \y in {1,2}
+        \foreach \x in {1,2}
+        \node[place] (s\y\x) at (8*\y+4*\x-18,21) {$S_i$};
+
+        \node[circle,draw=black!50,fill=red!60,thick,minimum size=10mm] (sd) at (-6,21) {$S_j$};
+
+        \foreach \y in {1,2}
+        \draw [->] (st.south) -- (as\y.north);
+
+        \foreach \y in {1,2}
+        \foreach \x in {1,2}
+        \draw [->] (as\y.south) -- (s\y\x.north);
+
+
+
+
+        \node (p1) at(-10,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_1.png}};
+
+
+        \node (p2) at(-3,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_2.png}};
+
+        \node (p3) at(3,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/3_1.png}};
+
+        \node (p4) at(10,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/3_2.png}};
+
+        \draw [-] (p1.north) -- (s11.south);
+        \draw [-] (p2.north) -- (s12.south);
+        \draw [-] (p3.north) -- (s21.south);
+        \draw [-] (p4.north) -- (s22.south);
+
+
+        \node (p5) at(-7,30) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/0.png}};
+
+        \node (p6) at(10,25) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_0.png}};
+
+        \node (p7) at(-10,25) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_0.png}};
+
+        \draw [-] (st.west) -- (p5.east);
+        \draw [-] (as1.west) -- (p7.east);
+        \draw [-] (as2.east) -- (p6.west);
+
+        \node at (2, 30) {$p_i=30490\cdot 2^{64}+\mathtt{0x134E346D257C119A}$};
+        \node at (-9, 22) {$p_{i'}=30490\cdot 2^{64}+\mathtt{0x134E346D257C119A}$};
+        \node at (-8.5, 13) {$p_j=30492\cdot 2^{64}+\mathtt{0x134E346D257C119A}$};
+
+
+    \end{tikzpicture}
+    \caption{An example from 2048 with the position codes $p$}
+    \label{sttree}
+
\end{figure*}
\ No newline at end of file
diff --git a/pic/2048example.tex b/pic/2048example.tex
new file mode 100644
index 0000000..fd2a7ab
--- /dev/null
+++ b/pic/2048example.tex
@@ -0,0 +1,53 @@
+\begin{figure*}
+    \centering
+    \tikzstyle{place}=[circle,draw=blue!50,fill=blue!20,thick,minimum size=10mm]
+    \tikzstyle{transition}=[rectangle,draw=black!50,fill=black!20,thick,minimum size=10mm]
+    \begin{tikzpicture}[scale=0.6]
+        \node[place] (st) at ( 0,29) {$S_i$};
+        \node[transition] (as2) at ( 4,25) {$S_{i'}$};
+        \node[transition] (as1) at ( -4,25) {$S_{i'}$};
+        \foreach \y in {1,2}
+        \foreach \x in {1,2}
+        \node[place] (s\y\x) at (8*\y+4*\x-18,21) {$S_i$};
+
+        \node[circle,draw=black!50,fill=red!60,thick,minimum size=10mm] (sd) at (-6,21) {$S_j$};
+
+        \foreach \y in {1,2}
+        \draw [->] (st.south) -- (as\y.north);
+
+        \foreach \y in {1,2}
+        \foreach \x in {1,2}
+        \draw [->] (as\y.south) -- (s\y\x.north);
+
+
+
+        \node (p1) at(-10,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_1.png}};
+
+
+        \node (p2) at(-3,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_2.png}};
+
+        \node (p3) at(3,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/3_1.png}};
+
+        \node (p4) at(10,16) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/3_2.png}};
+
+        \draw [-] (p1.north) -- (s11.south);
+        \draw [-] (p2.north) -- (s12.south);
+        \draw [-] (p3.north) -- (s21.south);
+        \draw [-] (p4.north) -- (s22.south);
+
+
+        \node (p5) at(-7,30) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/0.png}};
+
+        \node (p6) at(10,25) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_0.png}};
+
+        \node (p7) at(-10,25) {\includegraphics[height=3.0cm,width=3cm]{main/2048pic/2_0.png}};
+
+        \draw [-] (st.west) -- (p5.east);
+        \draw [-] (as1.west) -- (p7.east);
+        \draw [-] (as2.east) -- (p6.west);
+
+
+    \end{tikzpicture}
+    \caption{An example from 2048}
+    \label{sttree}
+    \end{figure*}
\ No newline at end of file
diff --git a/pic/migongeps-greedy.pdf
b/pic/migongeps-greedy.pdf
new file mode 100644
index 0000000..7535168
Binary files /dev/null and b/pic/migongeps-greedy.pdf differ