20240414IEEETG, commit ff9efea9, authored May 28, 2024 by Lenovo
The proof is more or less complete. What remains is the discussion of non-ergodicity, and deciding whether improvements to the expectimax search algorithm need to be added.
parent 6111d3dd
Showing 3 changed files with 95 additions and 21 deletions:
main/2048isNonergodic.tex (+65, -1)
main/background.tex (+29, -20)
main/nonergodic.tex (+1, -0)
main/2048isNonergodic.tex (view file @ ff9efea9)
\section{Non-ergodicity of 2048}
The purpose of this section is to prove the non-ergodicity of the 2048 game.
\begin{theorem}
The 2048 game is non-ergodic between non-absorbing states.
\end{theorem}
\begin{IEEEproof}
To apply Theorem \ref{judgmentTheorem}, we need to assign a countable value to each 2048 game board and establish the required property of the state transition probabilities in the 2048 game.
In the 2048 game, each tile has 16 potential values: empty, or $2^k$ with $k\in\{1,2,3,\ldots,15\}$.
Using 4 bits to represent a tile, the game board is a $4\times 4$ matrix $B$. The corresponding tile is then computed as follows:
\begin{equation}
tile_{mn}=
\begin{cases}
0, & \text{if } B_{mn}=0;\\
2^{B_{mn}}, & \text{otherwise,}
\end{cases}
\quad 1\leq m,n\leq 4.
\label{equationTile}
\end{equation}
The sum of all tiles in the game board is
\begin{equation}
sum(B)=\sum_{m=1}^{4}\sum_{n=1}^{4} tile_{mn}.
\end{equation}
A 64-bit long integer can uniquely represent any game board state:
\begin{equation}
long(B)=\sum_{m=1}^{4}\sum_{n=1}^{4} 16^{(m-1)\cdot 4+(n-1)}\cdot B_{mn}.
\end{equation}
We have
\begin{equation}
long(B)<2^{64}.
\label{size}
\end{equation}
The size of the board space $\mathcal{B}$ is $|\mathcal{B}|=2^{64}$.
Define a utility function on boards,
\begin{equation}
u(B)=2^{64}\cdot sum(B)+long(B).
\label{utility}
\end{equation}
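For concreteness, here is a short illustrative sketch (ours, not part of the repository; the helper names are made up) that mirrors $tile$, $sum(B)$, $long(B)$, and $u(B)$ in Python, using 0-based indices for the 4-bit exponent encoding:

# Illustrative sketch of equations above; assumes B is a 4x4 list of
# integers in 0..15 (the 4-bit exponent encoding), indices 0-based.
def tile(B, m, n):
    return 0 if B[m][n] == 0 else 2 ** B[m][n]

def board_sum(B):
    return sum(tile(B, m, n) for m in range(4) for n in range(4))

def board_long(B):
    # Base-16 positional code; each cell occupies 4 bits of a 64-bit integer.
    return sum(16 ** (4 * m + n) * B[m][n] for m in range(4) for n in range(4))

def utility(B):
    # u(B) = 2^64 * sum(B) + long(B): the sum dominates, long breaks ties.
    return (board_sum(B) << 64) + board_long(B)

# Example: a lone 2-tile (exponent 1) at the top-left corner.
B = [[1, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
assert board_sum(B) == 2 and board_long(B) == 1 and utility(B) == (2 << 64) + 1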
It is easy to verify that $\forall B_1,B_2\in\mathcal{B}$, if $B_1\neq B_2$, then $u(B_1)\neq u(B_2)$: by (\ref{size}), $long(B)=u(B)\bmod 2^{64}$, and $long$ is injective since it reads off all 16 entries of $B$.
For every possible board $B\in\mathcal{B}$, calculate the utility value $u(B)$, and sort the boards by $u(B)$ in ascending order.
Let $I(B)$ be the index of board $B$ after sorting; we have
\begin{equation}
\forall B_1,B_2\in\mathcal{B},\quad u(B_1)<u(B_2)\iff I(B_1)<I(B_2).
\label{basis}
\end{equation}
For any transition $\langle B_1,a,B_1',B_2\rangle$ in the 2048 game, we have $sum(B_1)=sum(B_1')$ whether or not any tiles merge, since merging two $2^k$-tiles produces a single $2^{k+1}$-tile and preserves the sum.
Because a new 2-tile or 4-tile is generated in board $B_2$, we have $sum(B_2)>sum(B_1')$, that is, $sum(B_2)>sum(B_1)$.
Based on (\ref{size}) and (\ref{utility}), $u(B_2)-u(B_1)\geq 2^{64}+long(B_2)-long(B_1)>0$, so $u(B_2)>u(B_1)$, and therefore $I(B_2)>I(B_1)$.
The transition probability between non-absorbing states therefore satisfies (\ref{condition}), and the claim follows by applying Theorem \ref{judgmentTheorem}.
\end{IEEEproof}
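As an informal sanity check of the monotonicity argument (our own sketch, reusing board_sum and utility from the sketch above; the row-merge logic is the standard 2048 rule under the exponent encoding):

import random

def merge_row_left(row):
    # Slide nonzero exponents left; equal neighbors merge once per move,
    # and 2^k + 2^k = 2^(k+1) preserves the tile sum.
    xs = [x for x in row if x != 0]
    out = []
    i = 0
    while i < len(xs):
        if i + 1 < len(xs) and xs[i] == xs[i + 1]:
            out.append(xs[i] + 1)
            i += 2
        else:
            out.append(xs[i])
            i += 1
    return out + [0] * (4 - len(out))

B1 = [[1, 1, 0, 0], [2, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
B1p = [merge_row_left(row) for row in B1]        # B_1': sum unchanged
assert board_sum(B1p) == board_sum(B1)

empty = [(m, n) for m in range(4) for n in range(4) if B1p[m][n] == 0]
m, n = random.choice(empty)
B2 = [row[:] for row in B1p]
B2[m][n] = random.choices([1, 2], weights=[0.9, 0.1])[0]  # spawn a 2 or 4
assert utility(B2) > utility(B1)                 # hence I(B_2) > I(B_1)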
%\input{material/2048prove}
...
main/background.tex (view file @ ff9efea9)
\section{Background}
\subsection{MDP and 2048 game}
Consider a Markov decision process (MDP) $\langle\mathcal{S},\mathcal{A},\mathcal{R},\mathcal{T}\rangle$, where $\mathcal{S}=\{1,2,\ldots,n\}$ is a finite state space with $|\mathcal{S}|=n$, $\mathcal{A}$ is an action space, $\mathcal{T}:\mathcal{S}\times\mathcal{A}\times\mathcal{S}\rightarrow[0,1]$ is a transition function, and $\mathcal{R}:\mathcal{S}\times\mathcal{A}\times\mathcal{S}\rightarrow\mathbb{R}$ is a reward function.
A policy $\pi:\mathcal{S}\times\mathcal{A}\rightarrow[0,1]$ selects action $a$ in state $s$ with probability $\pi(a|s)$.
The state value function under policy $\pi$, denoted $V^{\pi}:\mathcal{S}\rightarrow\mathbb{R}$, represents the expected sum of rewards in the MDP under policy $\pi$: $V^{\pi}(s)=\mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} r_t \,\middle|\, s_0=s\right]$.
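As a toy illustration of $V^{\pi}$ (a sketch of ours with a made-up two-state episodic chain, not from the paper), Monte Carlo rollouts estimate the expected sum of rewards until absorption:

import random

# Made-up chain: state 0 is non-absorbing (reward 1 per step, moves to the
# absorbing state 1 with probability 0.5); V(0) should be close to 2.
def rollout(s=0):
    ret = 0.0
    while s == 0:
        ret += 1.0
        s = 1 if random.random() < 0.5 else 0
    return ret

estimate = sum(rollout() for _ in range(10000)) / 10000
print(estimate)   # approximately 2.0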
The 2048 game consists of a $4\times 4$ grid board, totaling 16 squares.
At the beginning of the game, two squares are randomly filled with tiles of either 2 or 4.
...
@@ -14,21 +30,9 @@ The 2048 game consists of a 4$\times$4 grid board, totaling 16 squares.
Each tile can only participate in one merge operation per move.
After each move, a new tile appears on a random empty square.
The new tile is 2 with probability 0.9, and 4 with probability 0.1.
The game ends when all squares are filled and no valid merge operation can be made.
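A minimal sketch (ours) of the spawn rule and the game-over test, again under the 4-bit exponent encoding used in the previous section:

import random

def spawn(B):
    # Place a new tile on a uniformly random empty square:
    # exponent 1 (a 2-tile) w.p. 0.9, exponent 2 (a 4-tile) w.p. 0.1.
    empty = [(m, n) for m in range(4) for n in range(4) if B[m][n] == 0]
    m, n = random.choice(empty)
    B[m][n] = random.choices([1, 2], weights=[0.9, 0.1])[0]

def game_over(B):
    # The game ends when no square is empty and no two equal neighbors
    # (horizontal or vertical) remain to merge.
    for m in range(4):
        for n in range(4):
            if B[m][n] == 0:
                return False
            if m < 3 and B[m][n] == B[m + 1][n]:
                return False
            if n < 3 and B[m][n] == B[m][n + 1]:
                return False
    return True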
\subsection{Ergodicity and Non-ergodicity of Markov Chains}
Given a stationary policy $\pi$, the MDP becomes a Markov chain on state space $\mathcal{S}$ with a matrix
...
@@ -47,9 +51,14 @@ That is $\forall s\in \mathcal{S}$, we have
\begin{equation}
\sum_{s'\in\mathcal{S}} P_{\pi}(s',s)\, d_{\pi}(s') = d_{\pi}(s).
\end{equation}
\begin{definition}[Ergodicity]
Assume $d_{\pi}(s)$ exists for every policy $\pi$ and is independent of the initial state \cite{Sutton2018book}.
The MDP is ergodic if $\forall s\in\mathcal{S}$, $d_{\pi}(s)>0$.
\end{definition}
This means all states are reachable under any policy from the
...
@@ -59,7 +68,7 @@ A sufficient condition for this assumption is that
all other eigenvalues of $P_{\pi}$ are of modulus $<1$.
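To illustrate the spectral test (our own sketch with a made-up ergodic 3-state chain; numpy is assumed):

import numpy as np

# A made-up irreducible, aperiodic chain; rows of P sum to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

eigvals, eigvecs = np.linalg.eig(P.T)
moduli = np.sort(np.abs(eigvals))[::-1]
# Exactly one eigenvalue of modulus 1; all others strictly inside the unit circle.
assert np.isclose(moduli[0], 1.0) and np.all(moduli[1:] < 1.0)

# The stationary distribution d solves d P = d: it is the eigenvector of P^T
# for eigenvalue 1, normalized to sum to 1. Every entry is > 0 here, so the
# chain is ergodic in the sense of the definition above.
d = np.real(eigvecs[:, np.argmax(np.isclose(eigvals, 1.0))])
d = d / d.sum()
print(d)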
\input{pic/randomWalk}
...
main/nonergodic.tex (view file @ ff9efea9)
...
@@ -90,6 +90,7 @@ is non-ergodic between non-absorbing states.
By observing the truncated St. Petersburg paradox, it is easy to provide a sufficient condition for non-ergodicity between non-absorbing states.
\begin{theorem}[A sufficient condition for non-ergodicity between non-absorbing states]
\label{judgmentTheorem}
Given a Markov chain with absorbing states, suppose the number of non-absorbing states satisfies $|S\setminus\{\text{T}\}|\geq 2$.
If the transition matrix $Q$ between non-absorbing states satisfies,
...