Practical Contextual Bandits for Large-Scale Structured Discrete Constrained Optimization Problems

Pavithra Harsha; Chitra Subramanian; Naoki Abe; Shivaram Subramanian; Amadou Ba; Kevin Arturo Fernández Román; Mauricio Longinos Garrido; Chandrasekhar Narayanaswami

doi:10.1145/3711896.3737086

KDD 2025

Conference paper

03 Aug 2025

Practical Contextual Bandits for Large-Scale Structured Discrete Constrained Optimization Problems

View publication

Abstract

We address the contextual bandit problem for high-dimensional combinatorial action spaces involving a class of structured discrete constrained optimization problems. In this setting, key quantities within the optimization formulation, such as performance indicators contributing to the objective or constraints, must be estimated from data during the trial sequence of optimized actions. These problems frequently arise in the many operations management domains including IT resource allocation and retail assortment price optimization. We propose a novel, practical, and transparent approach based on general-purpose regression oracles, leveraging Inverse Gap Weighting (IGW) for seamless integration within an optimization framework. IGW sampling is efficiently managed by: (a) a column generation reformulation of the underlying Mixed Integer Programming (MIP) model which allows for flexible lower-level predictors, causal coherence, and efficient representation of large action spaces; (b) a diverse solution pool generation to balance the exploration-exploitation trade-off in large action spaces. To address non-smoothness in the reward function due to optimization constraints, we incorporate a risk-averse phased learning strategy. We validate our approach on a real-world auto-scaling problem in IT automation, achieving a significant reduction in cumulative regret through skillful exploration, with additional gains from risk-averse methods that effectively manage constraint violations.

Paper