IBM at VLDB 2025

About

IBM is proud to sponsor the International Conference on Very Large Data Bases 2025 (VLDB 2025).

VLDB is a premier annual international forum for data management and scalable data science, bringing together database researchers, vendors, practitioners, application developers, and users. VLDB 2025 will offer a comprehensive program featuring research talks, keynote and invited talks, panels, tutorials, demonstrations, industrial tracks, and workshops. Topics span all aspects of data management in which systems issues play a significant role, including data management system technology and information management infrastructures, very-large-scale experimentation, novel architectures, and demanding applications, as well as their underpinning theory.

Visit us at our sponsor table to learn more about our work at IBM Research. IBM speaking sessions and presentations are listed in the agenda section below.

Agenda

  • Grounding LLMs for Database Exploration: Intent Scoping and Paraphrasing for Robust NL2SQL

    Catalina Dragusin (ETH Zurich); Katsiaryna Mirylenka (Zalando SE); Christoph Miksovic (IBM Research); Michael Glass (IBM Research); Nahuel Defosse (IBM Research); Paolo Scotton (IBM Research); Thomas Gschwind (IBM Research)

  • Bootstrapping Learned Cost Models with Synthetic SQL Queries

    Michael Nidd (IBM Research); Christoph Miksovic (IBM Research); Thomas Gschwind (IBM Research); Francesco Fusco (IBM Research); Andrea Giovannini (IBM Research); Ioana Giurgiu (IBM Research)

  • Description:

    A knowledge graph (KG) represents a network of entities and the relationships between them. KGs are used in a wide range of applications, including semantic search and discovery, reasoning, decision making, natural language processing, machine learning, and recommendation systems. Automatic KG construction from text is an active research area. Triple (subject-relation-object) extraction from text is the fundamental building block of KG construction and has been widely studied, from early benchmarks such as ACE 2002 to more recent ones such as WebNLG 2020, REBEL, and SynthIE. A number of recent works also exploit LLMs for KG construction. However, handcrafting reasonable task-specific prompts for LLMs is labour-intensive and brittle to changes in the underlying LLM. Recent work on automatic prompt optimisation/engineering across various NLP tasks (e.g. autonomy generation) addresses this challenge by generating optimal or near-optimal task-specific prompts from input-output examples.

    This empirical study explores the application of automatic prompt optimisation to the triple extraction task through experimental benchmarking. We evaluate different settings by varying (a) the prompting strategy, (b) the LLM used for prompt optimisation and task execution, (c) the number of canonical relations in the schema (schema complexity), (d) the length and diversity of the input text, (e) the metric used to drive the prompt optimisation, and (f) the dataset used for training and testing. We evaluate three different automatic prompt optimisers, namely DSPy, APE, and TextGrad, on two triple extraction datasets, SynthIE and REBEL. Our main contribution is to show that automatic prompt optimisation techniques can generate reasonable prompts, similar to human-written ones, for triple extraction and achieve improved results, with significant gains observed as text size and schema complexity increase.
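    To make the subject-relation-object format concrete, here is a toy sketch in Python. The `extract_triples` helper and its pattern list are hypothetical illustrations only; the work described above prompts an LLM for this step rather than using hand-written patterns.

    ```python
    import re

    # A triple is a (subject, relation, object) tuple -- the basic unit
    # from which a knowledge graph is assembled.
    def extract_triples(text: str) -> list[tuple[str, str, str]]:
        # Toy pattern-based extractor; stands in for LLM-based extraction.
        patterns = [
            (r"(\w[\w ]*?) was founded by ([\w ]+)", "founded_by"),
            (r"(\w[\w ]*?) is located in ([\w ]+)", "located_in"),
        ]
        triples = []
        for pattern, relation in patterns:
            for subj, obj in re.findall(pattern, text):
                triples.append((subj.strip(), relation, obj.strip()))
        return triples

    print(extract_triples("IBM was founded by Charles Ranlett Flint."))
    # → [('IBM', 'founded_by', 'Charles Ranlett Flint')]
    ```

    A schema in this setting is the fixed set of canonical relations (here `founded_by`, `located_in`); the abstract's "schema complexity" refers to how many such relations the extractor must distinguish.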

    Authors:
    Nandana Mihindukulasooriya (IBM); Horst Samulowitz (IBM)
