Localizing Persona Representations in LLMs
- 2025
- AIES 2025
Hi! I work at IBM Research Africa in the Nairobi lab. I am interested in trustworthy ML, especially interpretability of large generative models and regulation.
A tale of adversarial attacks & out-of-distribution detection stories in the activation space