\section{Conclusion and Future Work}
Value-based reinforcement learning typically takes error minimization
as its optimization objective. As an alternative, this study proposes
two new objective functions, VBE and VPBE, and derives an on-policy
algorithm, VMTD, as well as two off-policy algorithms, VMTDC and VMETD.
All three algorithms demonstrated superior performance in policy
evaluation and control experiments.

Future work may include, but is not limited to,
\begin{itemize}
	\item analysis of the convergence rates of VMTDC and VMETD.
	\item extensions of VBE and VPBE to multi-step returns.
	\item extensions to nonlinear approximations, such as neural
	networks (an illustrative sketch follows this list).
\end{itemize}
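As a first step toward the third item, the sketch below shows how a
centered TD update, in the spirit of minimizing the variance of the
Bellman error, might be carried over to a nonlinear value function.
It is a minimal sketch under the assumption that the update subtracts a
running estimate of the mean TD error from the usual TD error; the
network architecture, step sizes, and names are illustrative rather
than the algorithms specified in this paper.
\begin{verbatim}
import numpy as np

# Illustrative sketch: a centered (variance-minimizing) TD update with a
# one-hidden-layer value network.  The scalar omega tracks the running
# mean of the TD error; the update uses (delta - omega) in place of
# delta.  Architecture, step sizes, and names are assumptions made for
# illustration only.

rng = np.random.default_rng(0)

class ValueNet:
    """Tiny value network v(s) = w2 . tanh(W1 s)."""
    def __init__(self, state_dim, hidden=32):
        self.W1 = rng.normal(scale=0.1, size=(hidden, state_dim))
        self.w2 = rng.normal(scale=0.1, size=hidden)

    def value_and_grads(self, s):
        h = np.tanh(self.W1 @ s)
        v = float(self.w2 @ h)
        grad_w2 = h                                    # dv/dw2
        grad_W1 = np.outer(self.w2 * (1.0 - h**2), s)  # dv/dW1
        return v, grad_W1, grad_w2

def centered_td_step(net, omega, s, r, s_next, gamma=0.99,
                     alpha=1e-3, beta=1e-2):
    """One semi-gradient step on the centered TD error (delta - omega)."""
    v, grad_W1, grad_w2 = net.value_and_grads(s)
    v_next, _, _ = net.value_and_grads(s_next)
    delta = r + gamma * v_next - v             # TD error
    net.W1 += alpha * (delta - omega) * grad_W1
    net.w2 += alpha * (delta - omega) * grad_w2
    omega += beta * (delta - omega)            # running estimate of E[delta]
    return omega
\end{verbatim}
In this sketch the centering term is the only change relative to standard
semi-gradient TD with a neural network; any gradient-correction terms
used by the off-policy variants are omitted for brevity.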