Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A
We consider a new form of decision making under uncertainty based on a general Markov decision process (MDP) framework devised to support opportunities to directly learn the optimal control policy. Our MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. Through this general MDP framework, we establish convergence and optimality results for our control-based methods, both in general and within various control paradigms (e.g., piecewise linear control policies), including convergence of Q-learning within the context of our MDP framework.
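The abstract above refers to convergence of Q-learning within an MDP framework. As background, a minimal sketch of classical tabular Q-learning on a hypothetical two-state, two-action MDP is shown below; the transition table, learning rate, and discount factor are all illustrative assumptions, and the paper's generalized Bellman operator and extended policy classes are not reproduced here.

```python
import numpy as np

# Hypothetical deterministic toy MDP: P[s][a] = (next_state, reward).
# Action 1 always yields reward 1; action 0 yields reward 0.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 1.0)}}

n_states, n_actions = 2, 2
gamma, alpha, epsilon = 0.9, 0.1, 0.1  # assumed hyperparameters
Q = np.zeros((n_states, n_actions))

rng = np.random.default_rng(0)
s = 0
for _ in range(5000):
    # Epsilon-greedy behavior policy.
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(Q[s].argmax())
    s_next, r = P[s][a]
    # Classical Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

# The greedy policy recovered from Q should pick action 1 in both states.
print(Q.argmax(axis=1))
```

Under these assumptions the learned greedy policy selects the rewarding action in every state, illustrating the standard convergence behavior that the paper's framework generalizes.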
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Conrad Albrecht, Jannik Schneider, et al.
CVPR 2025
Miao Guo, Yong Tao Pei, et al.
WCITS 2011