Distributionally robust policy evaluation and learning in offline contextual bandits. Nian Si, Fan Zhang, et al. ICML 2020.
Online EXP3 learning in adversarial bandits with delayed feedback. Ilai Bistritz, Zhengyuan Zhou, et al. NeurIPS 2019.