\documentclass{article}

% if you need to pass options to natbib, use, e.g.:
% \PassOptionsToPackage{numbers, compress}{natbib}
% before loading neurips_2024

% ready for submission
\usepackage{neurips_2024}

% to compile a preprint version, e.g., for submission to arXiv, add the
% [preprint] option:
% \usepackage[preprint]{neurips_2024}

% to compile a camera-ready version, add the [final] option, e.g.:
% \usepackage[final]{neurips_2024}

% to avoid loading the natbib package, add option nonatbib:
% \usepackage[nonatbib]{neurips_2024}

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{booktabs}       % professional-quality tables
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{graphicx}
\usepackage{subfigure}
\usepackage{diagbox}
\usepackage{wrapfig}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{mathtools}
\usepackage{amsthm}
\usepackage{tikz}
\usepackage{bm}
\usepackage{esvect}
\usepackage{multirow}

\theoremstyle{plain}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}

\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{assumption}[theorem]{Assumption}

\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}

\usepackage{algorithm}
\usepackage{algorithmic}

\title{Is Minimizing Errors the Only Option for Value-based Reinforcement Learning?}

% The \author macro works with any number of authors. There are two commands
% used to separate the names and addresses of multiple authors: \And and \AND.
%
% Using \And between authors leaves it to LaTeX to determine where to break the
% lines. Using \AND forces a line break at that point.
% So, if LaTeX puts 3 of 4 authors names on the first line, and the last on the
% second line, try using \AND instead of \And before the third author name.

\author{%
  David S.~Hippocampus\thanks{Use footnote for providing further information
    about author (webpage, alternative address)---\emph{not} for acknowledging
    funding agencies.} \\
  Department of Computer Science\\
  Cranberry-Lemon University\\
  Pittsburgh, PA 15213 \\
  \texttt{hippo@cs.cranberry-lemon.edu} \\
  % examples of more authors
  % \And
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
  % \AND
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
  % \And
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
  % \And
  % Coauthor \\
  % Affiliation \\
  % Address \\
  % \texttt{email} \\
}

\begin{document}

\maketitle

\begin{abstract}
  Existing research on value-based reinforcement learning minimizes the error. However, is error minimization really the only option for value-based reinforcement learning? We observe that the probabilities with which a policy chooses actions typically depend only on the relative values of those actions, not on their absolute values. Based on this observation, we propose the objective of variance minimization instead of error minimization, derive several new variance-minimization algorithms, including variants with a traditional parameter $\omega$, and provide an analysis of their convergence rates together with experiments. The experimental results show that our proposed variance-minimization algorithms converge substantially faster.
\end{abstract}

\input{main/introduction.tex}
\input{main/preliminaries.tex}
\input{main/motivation.tex}
\input{main/theory.tex}
\input{main/experiment.tex}
\input{main/relatedwork.tex}
\input{main/conclusion.tex}

\appendix
\input{main/appendix.tex}

\bibliographystyle{named}
\bibliography{neurips_2024}
% \bibliographystyle{neurips_2024}

\end{document}