1. Field of the Invention
The present invention relates to index data structures useful in indexing data objects such as XML documents.
2. Background Art
With the growth of the internet, internet languages based on XML have flourished. Structurally, an XML document can be treated as a connected, ordered, acyclic graph that forms a spanning tree. Such documents are not multigraphs and have no self-referencing edges. The vertices of an XML structure are called nodes. XML is used to directly represent sets of relationships that match these criteria. Typically, such sets are hierarchical tree structures.
XPath is a cyclic graph navigational query language that allows single or branching path structure access, with predicate content filtering, over an XML tree, directed by a set of 13 navigational primitives called axes. XPath partitions an XML document into four primary axes and a context node, such that the axes are interpreted relative to each context node. The four primary XPath axes are preceding, following, ancestor and descendant. The remaining secondary axes can be algebraically derived from these four primary axes. Relative to the context node, ‘h’, the primary axes sets are graphically depicted in FIG. 1. In FIG. 1, the primary axes are encapsulated in dotted lines and span the entire graph.
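The partition described above can be illustrated with a short Python sketch. The sample document below is a hypothetical tree chosen for illustration, not the tree of FIG. 1, and the function name primary_axes is likewise an assumption. Only element nodes are considered.

```python
import xml.etree.ElementTree as ET

def primary_axes(root, context):
    """Split the element nodes around `context` into the four primary axes."""
    order = list(root.iter())                        # document (pre-order) sequence
    parent = {child: p for p in order for child in p}

    ancestors = []
    node = context
    while node in parent:                            # walk up to the root
        node = parent[node]
        ancestors.append(node)

    descendants = list(context.iter())[1:]           # iter() yields context first
    pos = order.index(context)
    # Nodes earlier in document order that are not ancestors.
    preceding = [n for n in order[:pos] if n not in ancestors]
    # Nodes later in document order that are not descendants.
    following = [n for n in order[pos + 1:] if n not in descendants]

    return {
        "ancestor": [n.tag for n in ancestors],
        "descendant": [n.tag for n in descendants],
        "preceding": [n.tag for n in preceding],
        "following": [n.tag for n in following],
    }

# Hypothetical document; 'h' plays the role of the context node.
root = ET.fromstring("<a><b><c/><d/></b><e><f/><h><i/><j/></h><k/></e><l/></a>")
context = root.find(".//h")
axes = primary_axes(root, context)
# axes["ancestor"]   -> ['e', 'a']
# axes["descendant"] -> ['i', 'j']
# axes["preceding"]  -> ['b', 'c', 'd', 'f']
# axes["following"]  -> ['k', 'l']
```

In this sketch the four sets plus the context node account for every element exactly once, which is the sense in which the primary axes span the entire graph.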
XPath queries are processed from left to right, location step by location step, with ‘/’ or ‘//’ as separators. Upon execution, each location step returns a set of nodes, called a sequence, in document order with duplicates eliminated, using as input the set of nodes returned by the previous location step. Location steps are composed of an axis, a node test and zero or more predicates: axis::node-test [predicate]*. Node tests match the vertex label, called a qualified name (or qname) in XML. For example, an XPath query may appear as such: //descendant-or-self::g[h/j]
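The step-by-step evaluation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a conforming XPath engine: only the child and descendant-or-self axes are implemented, the helper names axis_nodes and step are hypothetical, the sample document is invented, and the root element stands in for the document node.

```python
import xml.etree.ElementTree as ET

def axis_nodes(axis, node):
    """Return the nodes reachable from `node` along one axis (two axes only)."""
    if axis == "child":
        return list(node)
    if axis == "descendant-or-self":
        return list(node.iter())
    raise ValueError("axis not implemented in this sketch: " + axis)

def step(root, context_nodes, axis, name, predicate=None):
    """Evaluate one location step: axis::name[predicate] over the input node set."""
    rank = {n: i for i, n in enumerate(root.iter())}   # document order
    hits = set()                                       # a set removes duplicates
    for ctx in context_nodes:
        for cand in axis_nodes(axis, ctx):
            if name in ("*", cand.tag) and (predicate is None or predicate(cand)):
                hits.add(cand)
    return sorted(hits, key=rank.__getitem__)

# Hypothetical document used only for illustration.
root = ET.fromstring(
    '<r><g id="1"><h><j/></h></g><g id="2"><h/></g>'
    '<x><g id="3"><h><j/></h><g id="4"/></g></x></r>'
)

# //descendant-or-self::g[h/j]; the leading '//' expands to a
# descendant-or-self step that matches any node.
nodes = step(root, [root], "descendant-or-self", "*")
result = step(root, nodes, "descendant-or-self", "g",
              predicate=lambda g: g.find("h/j") is not None)
ids = [g.get("id") for g in result]
# ids -> ['1', '3']
```

Each call to step takes the output of the previous call as its input node set, so the second step reaches many g elements more than once; the set removes those duplicates before the result is sorted back into document order.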
Recently, the literature has focused heavily on the many problems of, and potential solutions for, implementing XML within RDBMS systems. Many solutions have been proposed that transform the XML space to the relational space, yet several open query problems remain with the mapping, including the XML-to-SQL translation problem and query containment optimization. Alternative solutions are being sought that can avoid expensive SQL join operations, including efforts by commercial database vendor research departments. There has been much work on optimizing ancestor-descendant and parent-child linkages, but less focus has been placed on solving the antagonistic following and preceding XPath axes.
The primary prior art indexing method for relational technology is a B-Tree, designed to be optimal for height balance and O(lg(n)) singleton row level access. Hierarchical XML data structures, and hierarchical data in general, are mapped to relational tables using various techniques, of which recursive edge mapping provides the most universal solution but also the lowest level of performance. Edge mapping requires chopping up the XML tree into small discrete pieces, where the edges are indexed by a B-Tree index. XPath performance is poor under this scheme because, for each query, each of the discrete pieces needs to be identified, retrieved, and then reassembled into the proper subtrees to satisfy the query, a lengthy process.
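The cost of edge mapping can be illustrated with a small Python sketch. It is an assumption-laden simplification: the row layout (node_id, parent_id, ordinal, tag) is one hypothetical edge schema, a Python dictionary stands in for the B-Tree index on parent_id, and the function names shred, build_parent_index and reassemble are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def shred(root):
    """Chop the tree into edge rows: (node_id, parent_id, ordinal, tag)."""
    ids = {node: i for i, node in enumerate(root.iter())}
    rows = [(ids[root], None, 0, root.tag)]
    for parent in root.iter():
        for ordinal, child in enumerate(parent):
            rows.append((ids[child], ids[parent], ordinal, child.tag))
    return rows

def build_parent_index(rows):
    """Stand-in for a B-Tree index keyed on parent_id."""
    index = defaultdict(list)
    for row in rows:
        index[row[1]].append(row)
    return index

def reassemble(index, node_id, tag, probes):
    """Rebuild a subtree from edge rows, counting one index probe per node."""
    probes[0] += 1
    elem = ET.Element(tag)
    children = sorted(index.get(node_id, []), key=lambda r: r[2])
    for child_id, _parent_id, _ordinal, child_tag in children:
        elem.append(reassemble(index, child_id, child_tag, probes))
    return elem

# Hypothetical document used only for illustration.
root = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
rows = shred(root)
index = build_parent_index(rows)
probes = [0]
rebuilt = reassemble(index, 0, "a", probes)
# probes[0] -> 6, one index lookup for each of the six nodes
```

Under these assumptions, rebuilding a subtree of n nodes takes n separate index lookups, each of which would cost O(lg(n)) against a real B-Tree, which is consistent with the reassembly overhead described above.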