KG2Tables: Your way to generate an STI benchmark for your domain
Abstract
Tabular data, often found in CSV files, is essential for data analytics workflows. Interpreting such data semantically, a task known as Semantic Table Interpretation (STI), is critical but challenging because of issues such as label ambiguity. Consequently, STI has garnered significant attention in recent years. Evaluating STI systems effectively requires robust benchmarks. Most existing large-scale benchmarks originate from general-domain sources and emphasize ambiguity, whereas domain-specific benchmarks tend to be smaller. This paper presents KG2Tables, a framework for creating large-scale domain-specific benchmarks from a Knowledge Graph (KG). KG2Tables exploits the internal hierarchy of the relevant KG concepts and their properties. As a proof of concept, we have developed extensive datasets in the food, biodiversity, and biomedical domains. One of these datasets was used in the ISWC 2023 SemTab challenge, and the rest have been integrated into SemTab 2024.