Optimizing queries across diverse data sources
Abstract
Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query optimizer for Garlic [C+95], a middleware system designed to integrate data from a broad range of data sources with very different query capabilities. Garlic's optimizer extends the rule-based approach of [Loh88] to work in a heterogeneous environment by defining generic rules for the middleware and using wrapper-provided rules to encapsulate the capabilities of each data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rules for new sources, and the ability to express the capabilities of a diverse set of sources. We describe the design and implementation of this optimizer, and illustrate its actions through an example.