The ConceptMapper approach to named entity recognition
Abstract
ConceptMapper is an open source tool we created for classifying mentions in unstructured text documents against concept terminologies, yielding named entities as output. It is implemented as a UIMA (Unstructured Information Management Architecture; IBM, 2004) annotator, and its concepts come from standardised or proprietary terminologies. ConceptMapper is easily configured, for instance to use different search strategies or syntactic concepts. In this paper we describe ConceptMapper, its configuration parameters, and the trade-offs among them in terms of precision and recall when identifying concepts in a collection of clinical reports written in English. ConceptMapper is available from the Apache UIMA Sandbox under the Apache Open Source license.