As is known in the art, the eXtensible Markup Language (XML) employs a tree-structured model for representing data. Queries in XML query languages typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. For example, the XQuery path expression:
book[title=‘XML’]//author[.=‘jane’]
matches author elements that (i) have as content the string value “jane”, and (ii) are descendants of book elements that have a child title element whose content is the string value “XML”.
This XQuery path expression can be represented as a node-labeled tree pattern with elements and string values as node labels. Such a complex query tree pattern can be decomposed into a set of basic parent-child and ancestor-descendant relationships between pairs of nodes. For example, the basic structural relationships corresponding to the above query are the ancestor-descendant relationship (book, author) and the parent-child relationships (book, title), (title, XML) and (author, jane). The query pattern can then be matched by (i) matching each of the binary structural relationships against the XML database, and (ii) “stitching” together these basic matches. Finding all occurrences of these basic structural relationships in an XML database is a core operation in XML query processing, both in relational implementations of XML databases, and in native XML databases.
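The decomposition described above can be illustrated with a minimal sketch. The names PatternNode and decompose, and the axis labels "child" and "descendant", are assumptions introduced here for illustration only; they do not come from any particular query engine. The sketch walks a node-labeled tree pattern and emits one binary structural relationship per edge.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PatternNode:
    """Hypothetical node of a query tree pattern (element or string value)."""
    label: str
    # Each entry is (axis, child), where axis is "child" or "descendant".
    children: List[Tuple[str, "PatternNode"]] = field(default_factory=list)


def decompose(root: PatternNode) -> List[Tuple[str, str, str]]:
    """Return one (axis, upper_label, lower_label) triple per pattern edge."""
    relationships = []
    stack = [root]
    while stack:
        node = stack.pop()
        for axis, child in node.children:
            relationships.append((axis, node.label, child.label))
            stack.append(child)
    return relationships


# Tree pattern for the example path expression in the text.
pattern = PatternNode("book", [
    ("child", PatternNode("title", [("child", PatternNode("XML"))])),
    ("descendant", PatternNode("author", [("child", PatternNode("jane"))])),
])

for relationship in decompose(pattern):
    print(relationship)
```

For the example query, this yields the ancestor-descendant relationship (book, author) and the parent-child relationships (book, title), (title, XML) and (author, jane), in an order that depends on the traversal.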
There have been various attempts to find occurrences of such structural relationships (as well as the query tree patterns in which they are embedded) using relational database systems, as well as using native XML query engines. These works typically use some combination of indexes on elements and string values, tree traversal algorithms, and join algorithms on the edge relationships between nodes in the XML data tree.
One known attempt is described in C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman, “On supporting containment queries in relational database management systems,” Proceedings of SIGMOD, 2001 (hereinafter “Zhang”), which is incorporated herein by reference. Zhang proposes a variation of the traditional merge join algorithm, called the multi-predicate merge join (MPMGJN) algorithm, for finding all occurrences of the basic structural relationships (referred to as containment queries). Zhang compared the implementation of containment queries using native support in two commercial database systems with a special-purpose inverted list engine based on the MPMGJN algorithm. The results in Zhang showed that the MPMGJN algorithm could outperform standard Relational Database Management System (RDBMS) join algorithms by more than an order of magnitude on containment queries. The key to the efficiency of the MPMGJN algorithm is the “(DocId, StartPos:EndPos, LevelNum)” representation of positions of XML elements, and the “(DocId, StartPos, LevelNum)” representation of positions of string values, which succinctly capture the structural relationships between elements (and string values) in the XML database. Checking that structural relationships in the XML tree, such as ancestor-descendant and parent-child (corresponding to containment and direct containment relationships, respectively, in the XML document representation), are present between elements amounts to checking that certain inequality conditions hold between the components of the positions of these elements.
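The inequality conditions mentioned above can be sketched as follows. This is an illustrative sketch only: the names Pos, is_ancestor and is_parent are hypothetical, the position numbers are made up for the example, and a string value is assumed to be modeled with EndPos equal to StartPos so that one record type covers both representations.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    """Hypothetical (DocId, StartPos:EndPos, LevelNum) position record."""
    doc_id: int
    start: int
    end: int    # assumption: for a string value, end == start
    level: int


def is_ancestor(a: Pos, d: Pos) -> bool:
    """Containment: d lies strictly inside a's interval, same document."""
    return a.doc_id == d.doc_id and a.start < d.start and d.end < a.end


def is_parent(a: Pos, d: Pos) -> bool:
    """Direct containment: containment plus exactly one level of nesting."""
    return is_ancestor(a, d) and d.level == a.level + 1


# Made-up positions for a small document containing the example elements.
book = Pos(doc_id=1, start=1, end=20, level=1)
title = Pos(doc_id=1, start=2, end=4, level=2)
xml_text = Pos(doc_id=1, start=3, end=3, level=3)
author = Pos(doc_id=1, start=6, end=8, level=3)
jane_text = Pos(doc_id=1, start=7, end=7, level=4)

print(is_parent(book, title))       # direct containment holds
print(is_ancestor(book, author))    # containment holds
print(is_parent(book, author))      # fails: author is two levels below book
```

Under these assumptions, each structural test reduces to a few integer comparisons on the position components, with no traversal of the XML tree.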
While the MPMGJN algorithm outperforms standard RDBMS join algorithms, it performs a significant amount of unnecessary computation and I/O when matching basic structural relationships, especially in the case of parent-child relationships (that is, direct containment queries).
It would, therefore, be desirable to overcome the aforesaid and other disadvantages.