MILU: A Multi-task Indic Language Understanding Benchmark. Sshubam Verma, Mohammed Safi Ur Rahman, et al. 2025. NAACL 2025.
Exploring Straightforward Methods for Automatic Conversational Red-Teaming. George Kour, Naama Zwerdling, et al. 2025. NAACL 2025.
CodeGenWrangler: Data Wrangling task automation using Code-Generating Models. Akella Ashlesha, Abhijit Manatkar, et al. 2025. NAACL 2025.
Schema and Natural Language Aware In-Context Learning for Improved GraphQL Query Generation. Nitin Gupta, Manish Kesarwani, et al. 2025. NAACL 2025.
Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In. Itay Nakash, George Kour, et al. 2025. NAACL 2025.
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models. Sonam Mishra, Yatin Nandwani, et al. 2025. NAACL 2025.
Challenges and Remedies of Domain-Specific Classifiers as LLM Guardrails: Self-Harm as a Case Study. Bing Zhang, Guang-Jie Ren. 2025. NAACL 2025.
Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5. Arkadeep Acharya, Rudra Murthy Venkataramana, et al. 2025. NAACL 2025.
Enterprise Benchmarks for Large Language Model Evaluation. Bing Zhang, Mikio Takeuchi, et al. 2025. NAACL 2025.