The RealHumanEval: Evaluating Large Language Models’ Abilities to Support Programmers. Hussein Mozannar, Valerie Chen, et al. 2025. TMLR.
Who Should Predict? Exact Algorithms For Learning to Defer to Humans. Hussein Mozannar, Hunter Lang, et al. 2023. AISTATS 2023.