A pattern discovery approach to retail fraud detection
Abstract
A major source of revenue shrink in retail stores is the intentional or unintentional failure of proper checking out of items by the cashier. More recently, a few automated surveillance systems have been developed to monitor cashier lanes and detect non-compliant activities such as fake item checkouts or scans done with the intention of deriving monetary benefit. These systems use data from surveillance video cameras and transaction logs (TLog) recorded at the Point-of-Sale (POS). In this paper, we present a pattern discovery based approach to detect fraudulent events at the POS. Our approach is based on mining time-ordered text streams, representing retail transactions, formed from a combination of visually detected checkout related activities called primitives and barcodes from TLog data. Patterns representing single item checkouts, i.e. anchored around a single barcode, are discovered from these text streams using an efficient pattern discovery technique called Teiresias. The discovered patterns are used to build models for true and fake item scans by retaining or discarding the anchoring barcodes in those patterns respectively. A pattern matching and classification scheme is designed to robustly detect non-compliant cashier activities in the presence of noise in either the TLog or the video data. Different weighting schemes for quantifying the relative importance of the discovered patterns are explored: Frequency, Support Vector Machine (SVM) and Frequency+SVM. Using a large scale dataset recorded from retail stores, our approach discovers semantically meaningful cashier scan patterns. Our experiments also suggest that different weighting schemes result in varied false and true positive performances on the task of fake scan detection. Copyright 2011 ACM.