RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning. Kaiwen Zha, Zhengqi Gao, et al. NeurIPS 2025.
Thermometer: Towards Universal Calibration for Large Language Models. Maohao Shen, Subhro Das, et al. ICML 2024.