BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks. Anna Sokol, Elizabeth Daly, et al. 2025. NeurIPS 2025.
Optimal Estimation of the Best Mean in Multi-Armed Bandits. Takayuki Osogami, Junya Honda, et al. 2025. NeurIPS 2025.
Causal LLM Routing: End-to-End Regret Minimization from Observational Data. Asterios Tsiourvas, Wei Sun, et al. 2025. NeurIPS 2025.
Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods. Dennis Wei, Inkit Padhi, et al. 2025. NeurIPS 2025.
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search. Yanbo Wang, Zixiang Xu, et al. 2025. NeurIPS 2025.
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions. Yanna Ding, Songtao Lu, et al. 2025. NeurIPS 2025.
Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models. Aleksandar Terzic, Nicolas Menet, et al. 2025. NeurIPS 2025.
Objective Soups: Multilingual Multi-Task Modeling for Speech Processing. A Saif, Lisha Chen, et al. 2025. NeurIPS 2025.