\section{Conclusion and Future Work}
Value-based reinforcement learning typically takes the minimization of an error as its optimization objective. As an alternative, this study proposes new objective functions, the VBE and the VPBE, and derives several variance minimization algorithms from them, including VMTD, VMTDC, and VMETD. The proposed algorithms demonstrated superior performance in both policy evaluation and control experiments. Future work may include, but is not limited to, (1) analysis of the convergence rates of VMTDC and VMETD, (2) extensions of the VBE and VPBE to multi-step returns, and (3) extensions to nonlinear function approximation, such as neural networks.
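For concreteness, a minimal sketch of the variance-based form of such an objective is shown below. It assumes that the VBE is the variance of the linear TD error $\delta = r + \gamma \theta^{\top}\phi(s') - \theta^{\top}\phi(s)$ under the sampling distribution; this is intended only to illustrate the general idea, and the precise definitions of the VBE and VPBE are those given in the main text.
% Illustrative sketch only: assumes VBE denotes the variance of the TD error,
% in contrast to mean-squared objectives; see the main text for the exact definitions.
\begin{equation*}
  \mathrm{VBE}(\theta) \;=\; \mathbb{E}\!\left[\delta^{2}\right] \;-\; \bigl(\mathbb{E}\!\left[\delta\right]\bigr)^{2} \;=\; \mathrm{Var}\!\left[\delta\right].
\end{equation*}
Under this form, only fluctuations of the error around its mean are penalized, rather than the magnitude of the error itself as in mean-squared objectives.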