Hongliang Li, Bo Zhang, et al.
IBM J. Res. Dev.
In this paper, we present an error bound analysis of the Q-function for action-dependent adaptive dynamic programming applied to discounted optimal control problems of unknown discrete-time nonlinear systems. We first establish the convergence of the Q-functions generated by a policy iteration algorithm under ideal (exact) conditions. Accounting for the approximation errors of the Q-function and the control policy in the policy evaluation and policy improvement steps, we then derive error bounds for the approximate Q-function at each iteration. Under the stated boundedness conditions, the approximate Q-function converges to a finite neighborhood of the optimal Q-function. To implement the presented algorithm, two three-layer neural networks are employed to approximate the Q-function and the control policy, respectively. Finally, a simulation example is used to verify the effectiveness of the presented algorithm.
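To make the iteration concrete, below is a minimal sketch of Q-function policy iteration with two small neural networks, a critic approximating Q(x, u) and an actor approximating the control policy. The system dynamics, quadratic cost, network sizes, sampling scheme, and learning rates are illustrative assumptions, not taken from the paper; a data-driven implementation would replace the `dynamics` function with measured transitions of the unknown system.

```python
# Sketch of action-dependent policy iteration with neural approximators.
# Assumed: dynamics, cost weights, and hyperparameters are placeholders.
import torch
import torch.nn as nn

GAMMA = 0.95           # discount factor
X_DIM, U_DIM = 2, 1    # state and control dimensions (assumed)

def dynamics(x, u):
    # Placeholder nonlinear system, used only to generate transition data.
    x1, x2 = x[:, :1], x[:, 1:]
    return torch.cat([0.9 * x1 + 0.1 * x2,
                      -0.2 * torch.sin(x1) + 0.9 * x2 + 0.5 * u], dim=1)

def cost(x, u):
    # Quadratic utility r(x, u) = x'Qx + u'Ru (an assumed choice).
    return (x ** 2).sum(dim=1, keepdim=True) + (u ** 2).sum(dim=1, keepdim=True)

def three_layer(in_dim, out_dim, hidden=32):
    # "Three-layer" network: input layer, one hidden layer, output layer.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))

critic = three_layer(X_DIM + U_DIM, 1)   # approximates Q(x, u)
actor = three_layer(X_DIM, U_DIM)        # approximates the policy mu(x)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

for iteration in range(50):               # outer policy-iteration loop
    x = 4 * torch.rand(256, X_DIM) - 2    # sampled states
    u = 2 * torch.rand(256, U_DIM) - 1    # exploratory controls

    # Policy evaluation: fit the critic to the Bellman equation under the
    # current policy, Q(x, u) = r(x, u) + gamma * Q(x', mu(x')).
    for _ in range(100):
        with torch.no_grad():
            x_next = dynamics(x, u)
            target = cost(x, u) + GAMMA * critic(
                torch.cat([x_next, actor(x_next)], dim=1))
        td_loss = ((critic(torch.cat([x, u], dim=1)) - target) ** 2).mean()
        critic_opt.zero_grad()
        td_loss.backward()
        critic_opt.step()

    # Policy improvement: update the actor to minimize the learned Q.
    for _ in range(100):
        actor_loss = critic(torch.cat([x, actor(x)], dim=1)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
```

In this reading, imperfect critic fits and incomplete actor minimization correspond to the policy-evaluation and policy-improvement errors whose bounds the paper analyzes.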