Automatic taxonomy generation: Issues and possibilities
Raghu Krishnapuram, Krishna Kummamuru
IFSA 2003
This paper develops a unified Lyapunov framework for the finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under an operator that is a contraction with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD(λ), off-policy V-trace, and Q-learning. As a byproduct, by analyzing the convergence bounds of n-step TD and TD(λ), we provide theoretical insight into the efficiency of bootstrapping. Moreover, our finite-sample bounds for off-policy V-trace explicitly capture the trade-off between the variance of the stochastic iterates and the bias in the limit.
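For concreteness, here is a minimal sketch of the two central objects the abstract refers to, in notation assumed here rather than drawn from the paper itself. The Markovian SA iterates take the form
\[
x_{k+1} = x_k + \epsilon_k \bigl( F(x_k, Y_k) - x_k + w_k \bigr),
\]
where the expected operator \(\bar{F}(x) = \mathbb{E}[F(x, Y)]\) is assumed to be a contraction with respect to some norm \(\|\cdot\|_c\), \(\{Y_k\}\) is a Markov chain, and \(\{w_k\}\) is additive noise. Given \(f(x) = \tfrac{1}{2}\|x\|_c^2\) and a smooth function \(g\), the generalized Moreau envelope with smoothing parameter \(\mu > 0\) is
\[
M_f^{\mu, g}(x) = \min_{u \in \mathbb{R}^d} \Bigl\{ f(u) + \frac{1}{\mu}\, g(x - u) \Bigr\},
\]
which inherits smoothness from \(g\) while approximating \(f\) as \(\mu \to 0\); these are the two properties the abstract identifies as enabling the one-step Lyapunov drift inequality.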