Aditya Malik, Nalini Ratha, et al.
CAI 2024
We study contextual bandit algorithms for high-dimensional combinatorial action spaces involving a class of structured discrete constrained optimization. We propose a novel approach based on general-purpose regression oracles, leveraging Inverse Gap Weighting (IGW) for seamless integration within an optimization framework. IGW sampling is efficiently managed by: (a) Column generation reformulation of the underlying Mixed Integer Programming (MIP) model, enabling flexible lower-level predictors, causal coherence, and efficient representation of large action spaces. (b) Diverse solution pool generation to balance the exploration-exploitation trade-off in large action spaces. To address discontinuities in the reward function’s gradient due to optimization constraints, we incorporate a risk-averse phased learning strategy. Through ablation studies, we demonstrate that each component is critical for maximizing cumulative rewards. Finally, we validate the practical viability of our approach on a real-world Auto-Scaling problem in IT automation.
Aditya Malik, Nalini Ratha, et al.
CAI 2024
Leonid Karlinsky, Joseph Shtok, et al.
CVPR 2019
Erik Altman, Jovan Blanusa, et al.
NeurIPS 2023
Pavel Klavík, A. Cristiano I. Malossi, et al.
Philos. Trans. R. Soc. A