Conference paper

Going Topological in Multi-risk Extended Markov Ratio Decision Processes

Abstract

Incorporating risk into decision making is natural, if one is to address safety concerns or operational limitations. In the context of risk-aware Markov Decision Processes (MDPs), one identifies a notion of risk which is uncertainty driven (e.g., CVaR). Risk, however, may also be inherent to the MDP setup itself, e.g., to taking certain actions. In that case, we would consider a decision policy to be better, if it either increases reward or reduces risk (or both). A simple mathematical formulation that expresses such a notion of improvement is the ratio of reward over risk. Though intuitive, this ratio is inherently non-linear, which introduces challenges for optimization. We provide an algorithm that solves this non-linear problem in the context of multiple risk aspects, extending upon single-risk Extended Markov Ratio Decision Processes (EMRDPs). We show that it is strongly polynomial under a monotonicity assumption over actions, satisfied, for example, in financial market applications (e.g., Quasi-Sharpe Ratio). We tackle non-linearity by integrating Walkup-Wets’ topological view of parametric LPs. This topological framework highlights the non-trivial move from a single (EMRDP) to multiple risk aspects, once it is interpreted as moving from triangulations of 1-dimensional to those of m-dimensional polyhedra, with all the topological (and combinatorial) complexities this entails.