A new document masking approach for removing confidential information
Abstract
To protect confidential information written in text, such as personal and organizational information, document masking techniques are becoming increasingly important. These methods automatically extract the names of people, places, and organizations and remove them. This makes documents safe to share within an organization and thereby helps to improve productivity. However, existing automatic document masking techniques are not sufficiently reliable, because they can fail to mask out-of-vocabulary proper nouns. In this paper we propose a novel document masking technique, the Unmasking Method, in which all words are initially hidden and a human specifies the non-confidential words to be unmasked. The proposed method offers a high level of safety because it unmasks only words that a human has explicitly judged to be safe. Our experimental results demonstrate its safety and effectiveness. © 2007 IEEE.
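The contrast between extraction-based masking and the Unmasking Method can be illustrated with a minimal sketch. It assumes whitespace tokenization and exact word matching. The names `blacklist_mask`, `unmasking_method`, `MASK`, the name dictionary, and the example sentence are all hypothetical and do not come from the paper. The extraction-style function hides only words it recognizes, so out-of-vocabulary names leak. The unmasking-style function starts with everything hidden and reveals only words a human has approved.

```python
# Minimal illustrative sketch; all names and data here are hypothetical.
MASK = "###"  # hypothetical placeholder for a hidden word


def blacklist_mask(tokens: list[str], known_names: set[str]) -> list[str]:
    """Extraction-style masking: hide only words recognized as confidential names."""
    return [MASK if t in known_names else t for t in tokens]


def unmasking_method(tokens: list[str], approved: set[str]) -> list[str]:
    """Unmasking-style masking: every word starts hidden; only human-approved words are revealed."""
    return [t if t in approved else MASK for t in tokens]


tokens = "Taro Yamada visited Acme Corp in Kyoto".split()
known_names = {"Kyoto"}             # hypothetical dictionary that lacks the out-of-vocabulary names
human_approved = {"visited", "in"}  # words a human has judged safe to reveal

print(" ".join(blacklist_mask(tokens, known_names)))
# Taro Yamada visited Acme Corp in ###
print(" ".join(unmasking_method(tokens, human_approved)))
# ### ### visited ### ### in ###
```

In the first output the unrecognized names "Taro Yamada" and "Acme Corp" leak. In the second, a word the human has not reviewed stays hidden by default, which reflects the safety argument made above.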