Multi-lingual concept extraction with linked data and human-in-the-loop

Alfredo Alba; Anni Coden; Anna Lisa Gentile; Daniel Gruhl; Petar Ristoski; Steve Welch

doi:10.1145/3148011.3148021

K-CAP 2017

Conference paper

04 Dec 2017

Abstract

Ontologies are dynamic artifacts that evolve in both structure and content. Keeping them up to date is an expensive and critical operation for any application relying on Semantic Web technologies. In this paper we focus on evolving the content of an ontology by extracting relevant instances of ontological concepts from text. We propose a novel technique that (i) is completely language independent, (ii) combines statistical methods with a human-in-the-loop, and (iii) exploits Linked Data as a bootstrapping source. Our experiments on a publicly available medical corpus and on a Twitter dataset show that the proposed solution achieves comparable performance regardless of language, domain, and style of text. Because the method relies on a human-in-the-loop, its results can be safely fed directly back into Linked Data resources.

Short paper