Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
Abstract
In stochastic convex optimization the goal is to minimize a convex function $F(x) = \mathbf{E}_{f\sim D}[f(x)]$ over a convex set $K \subset \mathbb{R}^d$, where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $K$. The optimization is commonly based on i.i.d. samples $f_1, f_2, \ldots, f_n$ from $D$. A standard approach to such problems is empirical risk minimization (ERM), which optimizes $F_S(x) = \frac{1}{n}\sum_{i\le n} f_i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $K$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $K$ of bounded radius, ERM requires sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on the $\Omega(\log d)$ dependence proved for the $\ell_2/\ell_2$ setting in [18]. In stark contrast, these problems can be solved using a dimension-independent number of samples in the $\ell_2/\ell_2$ setting and with $\log d$ dependence in the $\ell_1/\ell_\infty$ setting using other approaches. We further show that our lower bound applies even if the functions in the support of $D$ are smooth and efficiently computable, and even if an $\ell_1$ regularization term is added. Finally, we demonstrate that for a more general class of bounded-range (but not Lipschitz-bounded) stochastic convex programs, an infinite gap appears already in dimension 2.
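As a minimal sketch of the ERM approach defined above (not a construction from this paper): the snippet below minimizes the empirical average $F_S$ of sampled convex, 1-Lipschitz losses over the unit $\ell_2$ ball via projected subgradient descent. The specific losses $f_i(x) = |\langle v_i, x\rangle - b_i|$, the sample size, and the step schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 50                      # dimension and number of i.i.d. samples (illustrative)

# Sampled losses f_i(x) = |<v_i, x> - b_i| with ||v_i||_2 = 1: convex, 1-Lipschitz w.r.t. l2.
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)
b = rng.normal(size=n)

def empirical_risk(x):
    """F_S(x) = (1/n) * sum_{i<=n} f_i(x)."""
    return np.mean(np.abs(V @ x - b))

def subgradient(x):
    """A subgradient of F_S at x: average of sign(<v_i, x> - b_i) * v_i."""
    return (V * np.sign(V @ x - b)[:, None]).mean(axis=0)

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto K = {x : ||x||_2 <= radius}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

# Projected subgradient descent on the empirical objective F_S over K.
x = np.zeros(d)
best = x.copy()
for t in range(1, 2001):
    x = project_l2_ball(x - subgradient(x) / np.sqrt(t))
    if empirical_risk(x) < empirical_risk(best):
        best = x.copy()

print(f"empirical risk of ERM iterate: {empirical_risk(best):.4f}")
```

The sketch only produces an (approximate) empirical risk minimizer; whether such a minimizer also has small population risk $F$ is exactly the generalization question the abstract addresses.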