Fan Zhang, Junwei Cao, et al.
IEEE TETC
In this paper, we examine how various grouping techniques can improve the efficiency and reduce the cost of an electronic discovery process. Specifically, we create coherent groups of email documents, each characterized by a syntactic theme, a semantic theme, or an email thread. The documents in a group can be reviewed together, leading to a faster and more consistent review. Syntactic grouping of emails is based on near-duplicate detection, whereas semantic grouping is based on identifying concepts in the email content using information extraction. Email thread detection is achieved using a combination of segmentation and near-duplicate detection. We present experimental results on the Enron corpus that suggest these approaches can significantly reduce review time and show that high precision and recall in identifying the groups can be achieved. We also describe how these techniques are integrated into the IBM eDiscovery Analyzer product offering. © 2011 VLDB Endowment.
William Hinsberg, Joy Cheng, et al.
SPIE Advanced Lithography 2010
Preeti Malakar, Thomas George, et al.
SC 2012
Daniel M. Bikel, Vittorio Castelli
ACL 2008