The implication problem for 'closest node' functional dependencies in complete XML documents
Abstract
With the growing use of XML as a format for the permanent storage of data, the study of functional dependencies in XML (XFDs) is of fundamental importance in a number of areas such as understanding how to effectively design XML databases without redundancy or update problems, and data integration. In this article we investigate a particular type of XFD, called a weak 'closest node' XFD, that has been shown to extend the classical notion of a functional dependency in relational databases. More specifically, we investigate the implication problem for weak 'closest node' XFDs in the context of XML documents with no missing information. The implication problem is the most important one in dependency theory, and is the problem of determining if a set of dependencies logically implies another dependency. Our first, and main, contribution is to provide an axiom system for XFD implication. We prove that our axiom system is both sound and complete, and we then use this result to develop a sound and complete quadratic time closure algorithm for XFD implication. Our second contribution is to investigate the implication problem for XFDs in the presence of a Document Type Definition (DTD). We show that for a class of DTDs called structured DTDs, the implication problem for a set of XFDs and a structured DTD can be converted to the implication problem for a set of XFDs alone, and so is axiomatizable and efficiently solvable by the first contribution. We do this by augmenting the original set of XFDs with additional XFDs generated from the structure of the DTD. © 2012 Elsevier Inc. All rights reserved.