Taku Ito, Luca Cocchi, et al.
ICML 2025
Large language models (LLMs) and generative AI (GenAI) are at the forefront of frontier AI research and technology. As their popularity and availability grow rapidly, concerns about their misuse and safety risks are becoming more prominent than ever. In this talk, we introduce a unified computational framework for evaluating and improving safety across a wide range of challenges in generative AI. Specifically, we present new tools and insights for exploring and mitigating the safety and robustness risks of state-of-the-art LLMs and GenAI models, including (i) safety risks in fine-tuning LLMs, (ii) LLM jailbreak mitigation, (iii) prompt engineering for safety debugging, and (iv) robust detection of AI-generated content.
Abhishek Aich, Akash Gupta, et al.
CVPR 2020
Saiteja Utpala, Alex Gu, et al.
NAACL 2024
Kristjan Greenewald, Yuancheng Yu, et al.
NeurIPS 2024