LLMGuard: Guarding against Unsafe LLM Behavior

Shubh Goyal; Medha Hira; Shubham Mishra; Sukriti Goyal; Arnav Goel; Niharika Dadu; D.B. Kirushikesh; Sameep Mehta; Nishtha Madaan

doi:10.1609/aaai.v38i21.30566

AAAI 2024

Conference paper

25 Mar 2024



Abstract

Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can raise legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.

Workshop paper