Publication
TAGA 2004
Conference paper

Document segmentation with application for the book publishing industry

Abstract

We describe a method for segmenting a scanned page into text, image, line-art, and background regions. Each segment undergoes image processing and compression routines specific to its type, and the document is then reassembled to match the layout of the original page. This procedure brings the print quality of the document as close as possible to that of the paper original and eliminates artifacts that would otherwise result from printing a scanned document. Moreover, applying a different compression algorithm to each segment type yields a smaller file, improving performance in printers, servers, and networks.
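
The segment, process-by-type, and reassemble flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier or the codecs, so everything below is an assumption made for illustration: the fixed tile size, the contrast and mid-tone heuristic in `classify_tile`, and the use of `zlib` as a stand-in for real type-specific compressors. The sketch also merges text and line art into a single `bilevel` label, because telling them apart requires layout analysis that is not attempted here.

```python
import zlib

TILE = 8  # hypothetical tile size in pixels (assumption)


def classify_tile(pixels):
    """Label a flat list of 0-255 gray values using a toy heuristic."""
    lo, hi = min(pixels), max(pixels)
    if hi - lo < 16:
        return "background"  # almost no contrast
    span = hi - lo
    # Bilevel content (text, line art) has few mid-tone pixels.
    mid = sum(1 for p in pixels if lo + 0.25 * span < p < lo + 0.75 * span)
    return "image" if mid / len(pixels) > 0.3 else "bilevel"


def encode_tile(label, pixels):
    """Apply a type-specific (stand-in) processing and compression step."""
    if label == "background":
        return bytes([round(sum(pixels) / len(pixels))])  # one gray level
    if label == "bilevel":
        threshold = (min(pixels) + max(pixels)) / 2
        return zlib.compress(bytes(0 if p < threshold else 255 for p in pixels))
    # Image: coarse quantization to 16 gray levels, then generic compression.
    return zlib.compress(bytes(p & 0xF0 for p in pixels))


def decode_tile(label, payload, count):
    """Invert encode_tile, returning a flat list of `count` gray values."""
    if label == "background":
        return [payload[0]] * count
    return list(zlib.decompress(payload))


def segment(page):
    """Split a page (list of rows of gray values) into labelled, encoded tiles."""
    height, width = len(page), len(page[0])
    tiles = []
    for y in range(0, height, TILE):
        for x in range(0, width, TILE):
            rows = [row[x:x + TILE] for row in page[y:y + TILE]]
            pixels = [p for row in rows for p in row]
            label = classify_tile(pixels)
            tiles.append((x, y, len(rows[0]), label, encode_tile(label, pixels)))
    return tiles


def reassemble(tiles, width, height):
    """Decode every tile and place it back at its original position."""
    page = [[0] * width for _ in range(height)]
    for x, y, tile_width, label, payload in tiles:
        tile_height = min(TILE, height - y)
        pixels = decode_tile(label, payload, tile_width * tile_height)
        for i, p in enumerate(pixels):
            page[y + i // tile_width][x + i % tile_width] = p
    return page
```

In this sketch, background tiles collapse to a single byte and bilevel tiles are thresholded before compression, which hints at how per-type handling could both clean up scan noise and reduce file size. A production system would presumably use region-level segmentation and dedicated codecs rather than fixed tiles and `zlib`.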