Publication
eScience 2014
Conference paper

Exploratory Information Extraction from a Historical Dictionary

Abstract

We describe a preliminary project to extract information from an extant dictionary of historical biographies, the 'Dicionário Histórico-Biográfico Brasileiro' (Brazilian Historical and Biographical Dictionary, abbreviated DHBB), a longstanding project at the 'Centro de Pesquisa e Documentação de História Contemporânea do Brasil' (CPDOC) of the Fundação Getulio Vargas (FGV). For information extraction, we rely on Natural Language Processing tools such as FreeLing, as well as on our own resources: NomLex-PT, a lexicon of nominalizations, and OpenWN-PT, a Portuguese version of Princeton's WordNet database. While the project currently highlights the potential of information extraction in a fun, exploratory manner, we also discuss how to engage historians interested in the affordances of digital tools.