FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs. Swanand Ravindra Kadhe, Anisa Halimi, et al. 2023. NeurIPS 2023.
Cost-Aware Counterfactuals for Black Box Explanations. Natalia Martinez Gil, Kanthi Sarpatwar, et al. 2023. NeurIPS 2023.
Influence Based Approaches to Algorithmic Fairness: A Closer Look. Soumya Ghosh, Prasanna Sattigeri, et al. 2023. NeurIPS 2023.
Weakly Supervised Detection of Hallucinations in LLM Activations. Miriam Rateike, Celia Cintas, et al. 2023. NeurIPS 2023.
Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset. Anna Richter, Brooklyn Sheppard, et al. 2023. NeurIPS 2023.
Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level. Mujahid Ali Quidwai, Chunhui Li, et al. 2023. ACL 2023.
Benchmarking the Effect of Poisoning Defenses on the Security and Bias of Deep Learning Models. Nathalie Baracaldo Angel, Farhan Ahmed, et al. 2023. S&P 2023.
Connecting Underrepresented Minorities and Qualified Job Positions Using Online Data. Maysa Malfiza Garcia de Macedo, Marisa Affonso Vasconcelos, et al. 2021. AAAI 2021.
Keeping Up with the Language Models: Robustness-Bias Interplay in NLI Data and Models. Ioana Baldini Soares, Chhavi Yadav, et al. 2023. ACL 2023.