Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Tetsuro Morimura, Eiji Uchibe, et al. 2010. Neural Computation.
Natural actor-critic with baseline adjustment for variance reduction. Tetsuro Morimura, Eiji Uchibe, et al. 2008. Artificial Life and Robotics.