Table Retrieval using LLMs and Semantic Table Similarity
Abstract
Finding relevant tables in response to a textual phrase or question is an important requirement for large tabular data repositories such as relational databases and CSV files in data lakes. The problem differs from web document search in that the objects being retrieved are tables rather than documents, while the query is still textual. In this paper, we explore a novel technique for table search over large repositories using natural language queries. The technique is based on a generative methodology that aims to maximize the semantic connection between the query and the retrieved tables. Unlike traditional keyword search approaches, which look only for exact keyword matches, our technique discovers deeper semantic concepts and therefore finds the needed tables more effectively. It also supports natural language queries rather than plain keyword queries. We describe the core ideas, the implementation, and the effectiveness of our method on two different benchmarks with diverse queries.