Evaluating the Impact of Skin Tone Representation on Out-of-Distribution Detection Performance in Dermatology. Assala Benmalek, Celia Cintas, et al. 2024. ISBI 2024.
Navigating the Modern Evaluation Landscape: Considerations in Benchmarks and Frameworks for Large Language Models (LLMs). Leshem Choshen, Ariel Gera, et al. 2024. LREC-COLING 2024.
Using Large Language Models to Understand Suicidality in a Social Media–Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts. Brian Bauer, Raquel Norel, et al. 2024. JMIR Mental Health.
RELIC: Investigating Large Language Model Responses using Self-Consistency. Furui Cheng, Vilém Zouhar, et al. 2024. CHI 2024.
The Who in XAI: How AI Background Shapes Perceptions of AI Explanations. Upol Ehsan, Samir Passi, et al. 2024. CHI 2024.
Expedient Assistance and Consequential Misunderstanding: Envisioning an Operationalized Mutual Theory of Mind. Justin Weisz, Michael Muller, et al. 2024. CHI 2024.
Facilitating Human-LLM Collaboration through Factuality Scores and Source Attributions. Hyo Jin Do, Rachel Ostrand, et al. 2024. CHI 2024.
Prompt Templates: A Methodology for Improving Manual Red Teaming Performance. Brandon Dominique, David Piorkowski, et al. 2024. CHI 2024.