When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails. Manish Nagireddy, Inkit Padhi, et al. (2025). AIES 2025.
Multi-Level Explanations for Generative Language Models. Lucas Monteiro Paes, Dennis Wei, et al. (2025). ACL 2025.
Graph-based Uncertainty Metrics for Long-form Language Model Generations. Mingjian Jiang, Yangjun Ruan, et al. (2024). NeurIPS 2024.
Interventional Causal Discovery in a Mixture of DAGs. Burak Varici, Dmitriy Katz-Rogozhnikov, et al. (2024). NeurIPS 2024.
WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia. Yufang Hou, Alessandra Pascale, et al. (2024). NeurIPS 2024.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. (2024). NeurIPS 2024.
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI. Ambrish Rawat, Stefan Schoepf, et al. (2024). NeurIPS 2024.