1. Field of the Invention
This invention relates to apparatus and methods for querying collections of XML documents, and more particularly to apparatus and methods for optimizing the evaluation of descendant paths in collections of XML documents.
2. Description of the Related Art
XQuery is a query language used to extract and manipulate data from XML documents or other data sources that can be represented as XML. The XQuery specification uses XPath expressions to address specific portions of an XML document. In general, these expressions may be written as a series of steps to travel from a current “context node” to other nodes in an XML document. A query evaluator may process the XPath expression by navigating the XML document tree and returning nodes specified in the expression.
The XQuery/XPath query language enables a user to include a descendant axis step in a query expression to return descendant nodes of a context node. For example, the XPath query ‘/a//b’ may be used to return all ‘b’ descendants of an ‘a’ node. The expression ‘//b’ within the query is the step that uses the descendant axis (‘//’ being the abbreviated form of ‘/descendant-or-self::node()/’).
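By way of illustration only, the descendant step described above may be sketched in Python using the standard-library `xml.etree.ElementTree` module, which supports a limited subset of XPath. The sample document `DOC` is hypothetical and is not taken from any particular embodiment. Because ElementTree evaluates paths relative to an element, the absolute query ‘/a//b’ is approximated here by checking that the root element is ‘a’ and then evaluating ‘.//b’ against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: 'b' elements occur at several depths under 'a'.
DOC = "<a><b>1</b><c><b>2</b><d><b>3</b></d></c><e/></a>"

root = ET.fromstring(DOC)

# Approximation of '/a//b': confirm the root is 'a', then collect
# every 'b' element that is a descendant of that root.
matches = root.findall(".//b") if root.tag == "a" else []

# Text content of each match, in document order.
texts = [m.text for m in matches]
print(texts)
```

Each ‘b’ element is returned regardless of its depth beneath the ‘a’ node, which is the behavior the descendant axis is intended to provide.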
Although the syntax of the ‘/a//b’ query expression is simple, evaluating the query is processing-intensive. To process this query, a query evaluator typically traverses every child at every level of the XML document being queried, starting from each ‘/a’ match, to find any descendant nodes of the ‘/a’ match that match the ‘//b’ step. If ‘b’ descendants appear in only a few places within an XML document (i.e., the paths leading to a ‘b’ node are very selective), then the query evaluator may consume many cycles traversing sub-trees that never lead to a ‘//b’ match. This is true whether the XML document is stored in an in-memory representation or in a native XML storage system. For this reason, it is generally advised to avoid the descendant axis altogether and instead to specify a node path as specifically as possible.
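The cost of such an exhaustive traversal may be illustrated with a minimal sketch. The function `naive_descendants` and the sample document below are hypothetical and are offered only to show the general behavior of a conventional evaluator, not the implementation of any particular one. The function visits every descendant of the context node and counts the visits, so that the work performed can be compared with the number of matches found.

```python
import xml.etree.ElementTree as ET

def naive_descendants(context, name):
    """Visit every descendant of `context`; return (matches, nodes_visited)."""
    matches = []
    visited = 0
    # Explicit stack for a depth-first walk; children are pushed in
    # reverse so that they are popped in document order.
    stack = list(reversed(list(context)))
    while stack:
        node = stack.pop()
        visited += 1
        if node.tag == name:
            matches.append(node)
        stack.extend(reversed(list(node)))
    return matches, visited

# Hypothetical document: one selective path (a/x/b) leads to a 'b' node,
# while fifty 'y' sub-trees contain no 'b' node at all.
DOC = "<a><x><b/></x>" + "<y><z/><z/><z/></y>" * 50 + "</a>"

root = ET.fromstring(DOC)
matches, visited = naive_descendants(root, "b")
print(len(matches), visited)  # 1 match found after visiting 202 nodes
```

In this sketch, 200 of the 202 visited nodes lie in sub-trees that cannot contain a ‘b’ node, which is the wasted effort described above.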
In view of the foregoing, what is needed is an apparatus and method for increasing the efficiency of descendant path evaluation in XPath/XQuery. Ideally, such an apparatus and method would enable a query evaluator to skip over document tree paths that lack descendants specified in an XPath expression.