Commit e0dead58 by Lenovo

昕闻 updated the figures, and the data inside the figures

parent 59c0881b
@@ -178,15 +178,15 @@ I_{\text{absorbing}}\dot{=}\begin{array}{c|c}
 \end{array}
 \]
-Then,\highlight{
+Then,{
 \[
 N_{\text{absorbing}}\dot{=}\begin{array}{c|ccccc}
 & \text{A} & \text{B} & \text{C} & \text{D} & \text{E} \\\hline
-\text{A} & 0 & 0 & 0 & 0 & 0 \\
-\text{B} & 0 & 0 & 0 & 0 & 0 \\
-\text{C} & 0 & 0 & 0 & 0 & 0 \\
-\text{D} & 0 & 0 & 0 & 0 & 0 \\
-\text{E} & 0 & 0 & 0 & 0 & 0
+\text{A} & \frac{5}{3} & \frac{4}{3} & 1 & \frac{2}{3} & \frac{1}{3} \\
+\text{B} & \frac{4}{3} & \frac{8}{3} & 2 & \frac{4}{3} & \frac{2}{3} \\
+\text{C} & 1 & 2 & 3 & 2 & 1 \\
+\text{D} & \frac{2}{3} & \frac{4}{3} & 2 & \frac{8}{3} & \frac{4}{3} \\
+\text{E} & \frac{1}{3} & \frac{2}{3} & 1 & \frac{4}{3} & \frac{5}{3} \\
 \end{array}
 \],
 }
...
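The updated entries of N_absorbing are consistent with the fundamental matrix N = (I - Q)^{-1} of a symmetric random walk on the five transient states A–E with an absorbing barrier beyond each end (an assumption inferred from the values; the hunk itself does not show Q). A minimal sketch with exact fractions, to reproduce the new numbers:

```python
from fractions import Fraction

def mat_inverse(A):
    """Invert a square matrix of Fractions via Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [row[:] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a pivot row, swap it into place, and normalize it.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Q: transitions among the transient states A..E, assuming a symmetric
# random walk (p = 1/2 each way) with absorbing states just outside A
# and E -- an assumption chosen to match the table, not stated in the diff.
half = Fraction(1, 2)
Q = [[half if abs(i - j) == 1 else Fraction(0) for j in range(5)]
     for i in range(5)]
I = [[Fraction(int(i == j)) for j in range(5)] for i in range(5)]
IminusQ = [[I[i][j] - Q[i][j] for j in range(5)] for i in range(5)]
N = mat_inverse(IminusQ)  # fundamental matrix N = (I - Q)^{-1}

# Row A reproduces the updated table row: 5/3, 4/3, 1, 2/3, 1/3.
print([str(x) for x in N[0]])
```

Entry N[i][j] is the expected number of visits to transient state j when starting from transient state i before absorption, which is why the diagonal dominates each row.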
@@ -63,10 +63,10 @@ softmax or $\epsilon$-greedy strategies.
 \begin{figure*}[!t]
 \centering
-\subfloat[2048 Game]{\includegraphics[width=3in]{pic/2048epsilon-greedy}%
+\subfloat[2048 Game]{\includegraphics[width=3in]{pic/2048epsilon-greedy2}%
 \label{fig_second_case}}
 \hfil
-\subfloat[Maze]{\includegraphics[width=3in]{pic/maze-eps-greedy}%
+\subfloat[Maze]{\includegraphics[width=3in]{pic/maze-eps-greedy2}%
 \label{fig_first_case}}
 \caption{Comparison of returns of $\epsilon$-greedy strategies.}
 \label{fig_sim}
@@ -78,12 +78,12 @@ To validate the above point, we designed two sets of experiments,
 combined with an $\epsilon$-greedy exploration strategy,
 testing the average score and standard deviation obtained
 for different values $\epsilon\in$\{0, 0.001, 0.002, 0.004,
-0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512\}.
+0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 0.6, 0.7, 0.8, 0.9, 1.0\}.
 In the 2048 game, the value function is based on an N-tuple network
 trained with optimistic initialization \cite{guei2021optimistic},
-achieving an average score of \highlight{300,000}.
+achieving an average score of {350,000}.
 In the maze game, the optimal value function is used,
-with the optimal policy achieving a score of \highlight{-58} points.
+with the optimal policy achieving a score of {-54} points.
 As shown in Figure \ref{fig_sim},
 the x-axis represents $\epsilon$,
 the y-axis represents the average score per game,
......
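For context on the widened sweep above: under epsilon-greedy, the agent takes a uniformly random action with probability epsilon and a greedy one otherwise, so the newly added endpoint epsilon = 1.0 corresponds to a fully random policy. A minimal sketch of the selection rule (the `q_values` dictionary interface is hypothetical; the paper's agents evaluate actions with an N-tuple network or the optimal value function instead):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one.

    q_values maps each legal action to its estimated value
    (hypothetical interface for illustration only).
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore uniformly
    best = max(q_values.values())
    # Exploit: break ties among maximizing actions at random.
    return rng.choice([a for a in actions if q_values[a] == best])

# epsilon = 0 is purely greedy; epsilon = 1.0 (the sweep's new endpoint)
# ignores the value function entirely.
```

This makes the shape of the curves in the figure plausible: as epsilon grows toward 1.0, the policy degrades smoothly toward the random-play baseline.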
File added
File added