Publications

Adaptive step-size policy gradients with average reward metric