MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems. Yannis Katsis, Sara Rosenthal, et al. 2025. ACL 2025.
Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies. Massimiliano Pronesti, Joao Bettencourt-Silva, et al. 2025. ACL 2025.
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks. Chen Xiong, Xiangyu Qi, et al. 2025. ACL 2025.
ZeroNER: Fueling Zero-Shot Named Entity Recognition via Entity Type Descriptions. Alessio Cocchieri, Marcos Martínez Galindo, et al. 2025. ACL 2025.
Knowledge Base Construction for Knowledge-Augmented Text-to-SQL. Jinheon Baek, Horst Samulowitz, et al. 2025. ACL 2025.
PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play. Wei Fang, Yang Zhang, et al. 2025. ACL 2025.
“You are Beautiful, Body Image Stereotypes are Ugly!” BIStereo: A Benchmark to Measure Body Image Stereotypes in Language Models. Narjis Asad, Nihar Ranjan Sahoo, et al. 2025. ACL 2025.
Stereotype Detection as a Catalyst for Enhanced Bias Detection: A Multi-Task Learning Approach. Aditya Tomar, Rudra Murthy Venkataramana, et al. 2025. ACL 2025.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents. Ivoline Ngong, Swanand Ravindra Kadhe, et al. 2025. ACL 2025.