SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models. Manish Nagireddy, Lamogha Chiazor, et al. (2024). AAAI 2024.
Value Alignment from Unstructured Text. Inkit Padhi, Karthikeyan Natesan Ramamurthy, et al. (2024). NeurIPS 2024.
SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks. Clara Higuera Cabañes, Ryo Iwaki, et al. (2024). NeurIPS 2024.
Influence Based Approaches to Algorithmic Fairness: A Closer Look. Soumya Ghosh, Prasanna Sattigeri, et al. (2023). NeurIPS 2023.
Simulating Iterative Human-AI Interaction in Programming with LLMs. Hussein Mozannar, Valerie Chen, et al. (2023). NeurIPS 2023.
Prompt Templates: A Methodology for Improving Manual Red Teaming Performance. Brandon Dominique, David Piorkowski, et al. (2024). CHI 2024.
Language Models in Dialogue: Conversational Maxims for Human-AI Interactions. Erik Miehling, Manish Nagireddy, et al. (2024). EMNLP 2024.
DARE to Diversify: DAta Driven and Diverse LLM REd Teaming. Manish Nagireddy, Bernat Guillen Pegueroles, et al. (2024). KDD 2024.