Text mining for history: First steps on building a large dataset

Suemi Higuchi; Cláudia Freitas; Bruno Cuconato; Alexandre Rademaker

LREC 2018

Conference paper

07 May 2018

Abstract

This paper presents initial efforts towards the creation of a new corpus in the history domain. Motivated by historians' need to interrogate a vast body of material in a non-linear way, our approach favors deep linguistic analysis of encyclopedic-style data. In this context, the work presented here focuses on the preparation of the corpus, which precedes the mining activity: the morphosyntactic annotation and the definition of semantic types for entities and relations relevant to the history domain. Taking advantage of the semantic nature of appositive constructions, we manually analyzed a sample of eleven hundred sentences in order to assess their potential as additional semantic clues. The results suggest that the approach is promising.
