Actor-critic algorithms for constrained multi-agent reinforcement learning
Abstract
Multi-agent reinforcement learning has gained lot of popularity primarily owing to the success of deep function approximation architectures. However, many real-life multi-agent applications often impose constraints on the joint action sequence that can be taken by the agents. In this work, we formulate such problems in the framework of constrained cooperative stochastic games. Under this setting, the goal of the agents is to obtain joint action sequence that minimizes a total cost objective criterion subject to total cost penalty/budget functional constraints. To this end, we utilize the Lagrangian formulation and propose actor-critic algorithms. Through experiments on a constrained multi-agent grid world task, we demonstrate that our algorithms converge to near-optimal joint action sequences satisfying the given constraints.