Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Tetsuro Morimura, Eiji Uchibe, et al. 2010. Neural Computation.