\section{Conclusion and Future Work}
Value-based reinforcement learning typically takes the minimization of an error as its optimization objective. As an alternative, this study proposes new objective functions, the VBE and the VPBE, and derives several variance minimization algorithms from them, including VMTD, VMTDC, and VMETD. The proposed algorithms demonstrated superior performance in both policy evaluation and control experiments. Future work may include, but is not limited to, (1) analysis of the convergence rates of VMTDC and VMETD, (2) extensions of the VBE and VPBE to multi-step returns, and (3) extensions to nonlinear function approximation, such as neural networks.
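For concreteness, a minimal sketch of the variance-based form of such an objective is shown below. It assumes that the VBE is the variance of the linear TD error $\delta = r + \gamma \theta^{\top}\phi(s') - \theta^{\top}\phi(s)$ under the sampling distribution; this is intended only to illustrate the general idea, and the precise definitions of the VBE and VPBE are those given in the main text.
% Illustrative sketch only: assumes VBE denotes the variance of the TD error,
% in contrast to mean-squared objectives; see the main text for the exact definitions.
\begin{equation*}
  \mathrm{VBE}(\theta) \;=\; \mathbb{E}\!\left[\delta^{2}\right] \;-\; \bigl(\mathbb{E}\!\left[\delta\right]\bigr)^{2} \;=\; \mathrm{Var}\!\left[\delta\right].
\end{equation*}
Under this form, only fluctuations of the error around its mean are penalized, rather than the magnitude of the error itself as in mean-squared objectives.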