Explainable AI
Explanations go a long way toward building trust in AI systems. We're creating tools to help debug AI, in which systems can explain what they're doing. This includes training highly optimized, directly interpretable models, as well as generating explanations of black-box models and visualizing how information flows through neural networks.
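As a concrete illustration of one common post-hoc approach to explaining a black-box model, the sketch below uses scikit-learn's permutation importance to rank which input features a model actually relies on. The dataset and model here are illustrative stand-ins chosen for the example, not the tooling described above.

```python
# Minimal sketch of a post-hoc explanation: permutation importance
# measures how much held-out accuracy drops when each feature is shuffled.
# Dataset and model are illustrative stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "black box": an ensemble whose individual predictions are hard to read.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# The explanation: shuffle each feature and record the accuracy drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Report the five most influential features.
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[idx]:<25} {result.importances_mean[idx]:.3f}")
```

Techniques like this treat the model as opaque and probe it from the outside; the directly interpretable models mentioned above take the opposite route, building the explanation into the model itself.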
Our work
- Teaching AI models to improve themselves (Research, Peter Hess)
- IBM and RPI researchers demystify in-context learning in large language models (News, Peter Hess)
- The latest AI safety method is a throwback to our maritime past (Research, Kim Martineau)
- Find and fix IT glitches before they crash the system (News, Kim Martineau)
- What is retrieval-augmented generation? (Explainer, Kim Martineau)
- Did an AI write that? If so, which one? Introducing the new field of AI forensics (Explainer, Kim Martineau)
- See more of our work on Explainable AI
Publications
Comprehensive Layer-Wise Analysis of SSL Models for Audio Deepfake Detection
- Yassine Elkheir
- Younes Samih
- et al.
- 2025
- NAACL 2025
Workshop on Neuro-Symbolic Software Engineering
- Christian Medeiros Adriano
- Sona Ghahremani
- et al.
- 2025
- ICSE 2025
New Frontiers of Human-centered Explainable AI (HCXAI): Participatory Civic AI, Benchmarking LLMs and Hallucinations for XAI, and Responsible AI Audits
- Upol Ehsan
- Elizabeth Watkins
- et al.
- 2025
- CHI 2025
Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons
- Shahaf Bassan
- Ron Eliav
- et al.
- 2025
- ICLR 2025
Rationalization Models for Text-to-SQL
- Gaetano Rossiello
- Nhan Pham
- et al.
- 2025
- ICLR 2025
Discovering Group Structures via Unitary Representation Learning
- Ben Huh
- 2025
- ICLR 2025