XingguoChen / 20240414IEEETG
Commit 59c0881b
authored May 25, 2024 by Lenovo
Asking Xinwen to help compute the inverse of the matrix.
parent 38145d79
Showing 3 changed files with 118 additions and 32 deletions:
main/background.tex (+114, -28)
pic/randomWalk.tex (+2, -2)
pic/randomWalkRestart.tex (+2, -2)
main/background.tex (view file @ 59c0881b)
@@ -58,54 +58,140 @@ of random walk with absorbing states
$P_{\text{absorbing}}$ is defined as follows:
\[
P_{\text{absorbing}} \doteq
\begin{array}{c|cccccc}
 & \text{T} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\\hline
\text{T} & 1 & 0 & 0 & 0 & 0 & 0 \\
\text{A} & \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 & 0 \\
\text{B} & 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 \\
\text{C} & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\
\text{D} & 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\
\text{E} & \frac{1}{2} & 0 & 0 & 0 & \frac{1}{2} & 0
\end{array}
\]
According to (\ref{invariance}),
the distribution is
$d_{\text{absorbing}} = \{1, 0, 0, 0, 0, 0\}$.
Since the probabilities of A, B, C, D, E are all zero,
the random walk with absorbing states is non-ergodic.
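The invariance claim can be checked numerically. Below is a minimal plain-Python sketch (an assumption of ours, not part of the paper) for the six-state version with the terminals merged into a single state T:

```python
# Transition matrix of the random walk with a single absorbing state T,
# states ordered (T, A, B, C, D, E), transcribed from the text above.
P_absorbing = [
    [1,   0,   0,   0,   0,   0  ],  # T is absorbing: it maps to itself
    [0.5, 0,   0.5, 0,   0,   0  ],  # A
    [0,   0.5, 0,   0.5, 0,   0  ],  # B
    [0,   0,   0.5, 0,   0.5, 0  ],  # C
    [0,   0,   0,   0.5, 0,   0.5],  # D
    [0.5, 0,   0,   0,   0.5, 0  ],  # E
]
d = [1, 0, 0, 0, 0, 0]  # the distribution d_absorbing from the text

# d is invariant iff d P = d (row vector times matrix).
dP = [sum(d[i] * P_absorbing[i][j] for i in range(6)) for j in range(6)]
assert all(abs(x - y) < 1e-12 for x, y in zip(dP, d))  # all mass stays on T
```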
\input{pic/randomWalkRestart}
However, in reinforcement learning, we usually make the ergodicity assumption.
When encountering an absorbing state, we immediately reset and
transition to the initial states. Figure \ref{randomwalkRestart}
shows the random walk with restarts.
The transition probability matrix
of random walk with restarts
$P_{\text{restart}}$ is defined as follows:
\[
P_{\text{restart}} \doteq
\begin{array}{c|cccccc}
 & \text{T} & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\\hline
\text{T} & 0 & 0 & 0 & 1 & 0 & 0 \\
\text{A} & \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 & 0 \\
\text{B} & 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 \\
\text{C} & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\
\text{D} & 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\
\text{E} & \frac{1}{2} & 0 & 0 & 0 & \frac{1}{2} & 0
\end{array}
\]
According to (\ref{invariance}),
the distribution is
$d_{\text{restart}} = \{0.1, 0.1, 0.2, 0.3, 0.2, 0.1\}$.
Since the probabilities of T, A, B, C, D, E are all non-zero,
the random walk with restarts is ergodic.
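This stationary distribution is easy to verify numerically; a plain-Python sketch of ours (matrix and distribution transcribed from the text above):

```python
# Random walk with restarts, states ordered (T, A, B, C, D, E).
# On reaching T the walk restarts deterministically at the centre state C.
P_restart = [
    [0,   0,   0,   1,   0,   0  ],  # T -> C (restart)
    [0.5, 0,   0.5, 0,   0,   0  ],  # A
    [0,   0.5, 0,   0.5, 0,   0  ],  # B
    [0,   0,   0.5, 0,   0.5, 0  ],  # C
    [0,   0,   0,   0.5, 0,   0.5],  # D
    [0.5, 0,   0,   0,   0.5, 0  ],  # E
]
d = [0.1, 0.1, 0.2, 0.3, 0.2, 0.1]  # claimed d_restart

dP = [sum(d[i] * P_restart[i][j] for i in range(6)) for j in range(6)]
assert all(abs(x - y) < 1e-12 for x, y in zip(dP, d))  # d P = d holds
```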
% Give the definition of ergodicity for a Markov chain, together with sufficient conditions.
% Use the random-walk example to show that the setting with an absorbing state is not ergodic,
% while the reinforcement-learning training setting with restarts is ergodic.
\subsection{Ergodicity and Non-ergodicity between non-absorbing states}
For Markov chains with absorbing states, we usually decompose
the transition matrix $P$ into the following form:
\[
P =
\begin{bmatrix}
Q & R \\
0 & I
\end{bmatrix},
\]
where $Q$ is the matrix of transition probabilities between
non-absorbing states,
$R$ represents the transition probabilities
from non-absorbing states to absorbing states,
$I$ is the matrix of transition probabilities between absorbing states
(an identity matrix, since an absorbing state never leaves itself),
and $0$ is a zero matrix.
The expected number of visits to non-absorbing states before being absorbed
is
\begin{equation}
N \doteq \sum_{i=0}^{\infty} Q^i = (I_{n-1} - Q)^{-1},
\end{equation}
where $I_{n-1}$ is the $(n-1) \times (n-1)$ identity matrix.
Note that all absorbing states can be combined into one.
It is now easy to define whether the non-absorbing states
are ergodic.
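The identity $\sum_{i=0}^{\infty} Q^i = (I - Q)^{-1}$ is a Neumann series and can be sanity-checked numerically. The sketch below uses a small hypothetical two-state $Q$ of our own (not one of the paper's matrices): each state moves to the other with probability $\frac{1}{2}$ and is absorbed otherwise.

```python
# Hypothetical 2x2 substochastic Q (illustration only, not from the paper):
# each non-absorbing state moves to the other w.p. 1/2, else it is absorbed.
Q = [[0.0, 0.5],
     [0.5, 0.0]]

def matmul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Truncated Neumann series N ~ I + Q + Q^2 + ... + Q^50.
N = [[1.0, 0.0], [0.0, 1.0]]      # running sum, starts at I
power = [[1.0, 0.0], [0.0, 1.0]]  # current power of Q
for _ in range(50):
    power = matmul(power, Q)
    N = [[N[i][j] + power[i][j] for j in range(2)] for i in range(2)]

# Closed form: (I - Q)^{-1} = [[4/3, 2/3], [2/3, 4/3]] for this Q.
expected = [[4 / 3, 2 / 3], [2 / 3, 4 / 3]]
assert all(abs(N[i][j] - expected[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```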
\begin{definition}
[Ergodicity between non-absorbing states]
Assume that $N$ exists for any policy $\pi$
and is independent of the initial states.
If $\forall i,j \in S \setminus \{\text{T}\}$, $N_{ij} > 0$,
the MDP is ergodic between non-absorbing states.
\end{definition}
\begin{definition}
[Non-ergodicity between non-absorbing states]
Assume that $N$ exists for any policy $\pi$
and is independent of the initial states.
If $\exists i,j \in S \setminus \{\text{T}\}$ such that $N_{ij} = 0$,
the MDP is non-ergodic between non-absorbing states.
\end{definition}
For random walk with absorbing states,
\[
P_{\text{absorbing}} =
\begin{bmatrix}
Q_{\text{absorbing}} & R_{\text{absorbing}} \\
0 & I_{\text{absorbing}}
\end{bmatrix},
\]
where
\[
Q_{\text{absorbing}} \doteq
\begin{array}{c|ccccc}
 & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\\hline
\text{A} & 0 & \frac{1}{2} & 0 & 0 & 0 \\
\text{B} & \frac{1}{2} & 0 & \frac{1}{2} & 0 & 0 \\
\text{C} & 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\
\text{D} & 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} \\
\text{E} & 0 & 0 & 0 & \frac{1}{2} & 0
\end{array}
\]
\[
R_{\text{absorbing}} \doteq
\begin{array}{c|c}
 & \text{T} \\\hline
\text{A} & \frac{1}{2} \\
\text{B} & 0 \\
\text{C} & 0 \\
\text{D} & 0 \\
\text{E} & \frac{1}{2}
\end{array}
\]
\[
I_{\text{absorbing}} \doteq
\begin{array}{c|c}
 & \text{T} \\\hline
\text{T} & 1
\end{array}
\]
Then,
\highlight{
\[
N_{\text{absorbing}} \doteq (I_5 - Q_{\text{absorbing}})^{-1} =
\begin{array}{c|ccccc}
 & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\\hline
\text{A} & \frac{5}{3} & \frac{4}{3} & 1 & \frac{2}{3} & \frac{1}{3} \\
\text{B} & \frac{4}{3} & \frac{8}{3} & 2 & \frac{4}{3} & \frac{2}{3} \\
\text{C} & 1 & 2 & 3 & 2 & 1 \\
\text{D} & \frac{2}{3} & \frac{4}{3} & 2 & \frac{8}{3} & \frac{4}{3} \\
\text{E} & \frac{1}{3} & \frac{2}{3} & 1 & \frac{4}{3} & \frac{5}{3}
\end{array}
\]
whose entries are all strictly positive.
}
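The inverse $(I_5 - Q_{\text{absorbing}})^{-1}$ can be computed mechanically; a plain-Python sketch of ours using exact rational arithmetic (Gauss-Jordan elimination, hypothetical helper code, not from the paper):

```python
from fractions import Fraction

# I_5 - Q_absorbing: Q is tridiagonal with 1/2 on the off-diagonals
# (non-absorbing states A..E of the random walk).
half = Fraction(1, 2)
M = [[Fraction(int(i == j)) - (half if abs(i - j) == 1 else 0)
      for j in range(5)] for i in range(5)]

# Gauss-Jordan elimination on the augmented matrix [M | I_5].
aug = [row + [Fraction(int(i == j)) for j in range(5)]
       for i, row in enumerate(M)]
for col in range(5):
    piv = next(r for r in range(col, 5) if aug[r][col] != 0)
    aug[col], aug[piv] = aug[piv], aug[col]       # bring pivot into place
    aug[col] = [x / aug[col][col] for x in aug[col]]  # scale pivot row to 1
    for r in range(5):
        if r != col and aug[r][col] != 0:         # clear the rest of column
            aug[r] = [a - aug[r][col] * b for a, b in zip(aug[r], aug[col])]
N = [row[5:] for row in aug]  # N = (I_5 - Q_absorbing)^{-1}

# Expected visits starting from A (first row): 5/3, 4/3, 1, 2/3, 1/3.
assert N[0] == [Fraction(5, 3), Fraction(4, 3), Fraction(1),
                Fraction(2, 3), Fraction(1, 3)]
assert all(x > 0 for row in N for x in row)  # every entry strictly positive
```

Since every $N_{ij}$ is strictly positive, the random-walk example is, by the definition above, ergodic between its non-absorbing states.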
% This paper focuses on ergodicity between non-absorbing states once the absorbing states are removed.
% Use the St. Petersburg example to show that St. Petersburg does not satisfy ergodicity between non-absorbing states.
% State a theorem and likewise prove that the game 2048 does not satisfy ergodicity between non-absorbing states.
pic/randomWalk.tex (view file @ 59c0881b)
@@ -2,8 +2,8 @@
\centering
\scalebox{0.9}{
\begin{tikzpicture}
\node[draw, rectangle, fill=gray!50] (DEAD) at (0,0) {T};
\node[draw, rectangle, fill=gray!50] (DEAD2) at (9,0) {T};
\node[draw, circle] (A) at (1.5,0) {A};
\node[draw, circle] (B) at (3,0) {B};
\node[draw, circle] (C) at (4.5,0) {C};
pic/randomWalkRestart.tex (view file @ 59c0881b)
@@ -2,8 +2,8 @@
\centering
\scalebox{0.9}{
\begin{tikzpicture}
\node[draw, rectangle, fill=gray!50] (DEAD) at (0,0) {T};
\node[draw, rectangle, fill=gray!50] (DEAD2) at (9,0) {T};
\node[draw, circle] (A) at (1.5,0) {A};
\node[draw, circle] (B) at (3,0) {B};
\node[draw, circle] (C) at (4.5,0) {C};