Challenges and Remedies of Domain-Specific Classifiers as LLM Guardrails: Self-Harm as a Case Study. Bing Zhang, Guang-Jie Ren. 2025. NAACL 2025.
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference. Matthew Riemer, Gopeshh Subbaraj, et al. 2025. ICLR 2025.
Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons. Shahaf Bassan, Ron Eliav, et al. 2025. ICLR 2025.
DELIFT: Data Efficient Language Model Instruction Fine-Tuning. Ishika Agarwal, Krishnateja Killamsetty, et al. 2025. ICLR 2025.
A New Framework for Evaluating Model Out-of-Distribution Generalisation for the Biochemical Domain. Raúl Fernández Díaz, Lam Thanh Hoang, et al. 2025. ICLR 2025.
Precedence-Constrained Winter Value for Effective Graph Data Valuation. Hongling Chi, Wei Jin, et al. 2025. ICLR 2025.