Efficient Distributionally Robust Learning via Unbiased Gradient Estimation
Abstract
Seeking to improve generalization of mathematical learning models, we consider a new statistical learning approach called distributionally robust learning (DRL) that utilizes difficult-to-solve minimax formulations. We devise a new algorithm to solve these formulations that applies stochastic gradient descent (SGD) to the outer minimization problem. Our algorithm efficiently estimates the gradient of the inner maximization problem through multi-level Monte Carlo randomization. We establish theoretical results that shed light on why standard gradient estimators fail, and then leverage these theoretical results to establish the optimal parameterization of the gradient estimators of our approach that balances a fundamental tradeoff between computation time and stochastic error. We rigorously establish convergence to a near-optimal solution under standard regularity assumptions and, for strongly convex losses, match the best known $O(\epsilon^{-1})$ rate of convergence of SGD for standard stochastic optimization. Computational experiments demonstrate that our DRL approach yields significant benefits over previous work.