Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search. Yanbo Wang, Zixiang Xu, et al. 2025. NeurIPS 2025.
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks. Anna Sokol, Elizabeth Daly, et al. 2025. NeurIPS 2025.