Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning. Xin Zhang, Zhuqing Liu, et al. NeurIPS 2021.
PILOT: An O(1/K)-Convergent Approach for Policy Evaluation with Nonlinear Function Approximation. Zhuqing Liu, Xin Zhang, et al. ICLR 2024.