Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
We consider a new form of decision making under uncertainty based on a general Markov decision process (MDP) framework devised to support opportunities to directly learn the optimal control policy. Our MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. Through this general MDP framework, we establish convergence and optimality results for our control-based methods, both in general and within various control paradigms (e.g., piecewise linear control policies), including convergence of Q-learning within the context of our MDP framework.
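The abstract above refers to convergence of Q-learning within an MDP framework. As background, a minimal sketch of classical tabular Q-learning on a hypothetical two-state, two-action MDP is shown below; the transition table, learning rate, and discount factor are all illustrative assumptions, and the paper's generalized Bellman operator and extended policy classes are not reproduced here.

```python
import numpy as np

# Hypothetical deterministic toy MDP: P[s][a] = (next_state, reward).
# Action 1 always yields reward 1; action 0 yields reward 0.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 1.0)}}

n_states, n_actions = 2, 2
gamma, alpha, epsilon = 0.9, 0.1, 0.1  # assumed hyperparameters
Q = np.zeros((n_states, n_actions))

rng = np.random.default_rng(0)
s = 0
for _ in range(5000):
    # Epsilon-greedy behavior policy.
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(Q[s].argmax())
    s_next, r = P[s][a]
    # Classical Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

# The greedy policy recovered from Q should pick action 1 in both states.
print(Q.argmax(axis=1))
```

Under these assumptions the learned greedy policy selects the rewarding action in every state, illustrating the standard convergence behavior that the paper's framework generalizes.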
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Miao Guo, Yong Tao Pei, et al.
WCITS 2011