Publication
SIAM Conference on Control and Its Applications (CT) 2023
Conference paper
General Markov Decision Process Framework for Directly Learning Optimal Control Policies
Abstract
We consider a new form of decision making under uncertainty that is based on a general Markov decision process (MDP) framework devised to support opportunities to directly learn the optimal control policy. Our MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. We establish convergence and optimality results-both in general and within various control paradigms (e.g., piecewise linear control policies)-for our control-based methods through this general MDP framework, including convergence of Q-learning within the context of our MDP framework.