There are many database systems that allow storage and querying of eXtensible Markup Language data (“XML data”). Though there are many evolving standards for querying XML, all of them include some variation of XPath. XPath allows XML data to be queried based on path expressions. A path expression is any expression that specifies a path through the hierarchical structure of an XML document. The portion of an XML document identified by a path expression is the portion that resides, within the structure of the XML document, at the end of any path that matches the path expression.
A query that uses a path expression to identify one or more specific pieces of XML data is referred to herein as a path-based query. The process of determining which XML data corresponds to the path designated in a path-based query is referred to as “evaluating” the path expression.
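By way of illustration only, the following minimal sketch evaluates a path expression against a small XML document. The document, its element names, and the path are hypothetical examples that do not come from any particular system, and the sketch relies on the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module rather than on a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for illustration.
DOC = """
<PurchaseOrder>
  <Reference>ABEL-1</Reference>
  <Actions>
    <Action><User>SVOLLMAN</User></Action>
    <Action><User>KING</User></Action>
  </Actions>
</PurchaseOrder>
"""

root = ET.fromstring(DOC)

# Evaluate the path expression relative to the root element <PurchaseOrder>.
# The result is every node that sits at the end of a matching path.
users = [node.text for node in root.findall("./Actions/Action/User")]
print(users)  # ['SVOLLMAN', 'KING']
```

Both `User` elements are returned because each resides at the end of a path through the document hierarchy that matches the expression.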
Unfortunately, even database systems that have built-in support for storing XML data are usually not optimized to handle path-based queries, and the query performance of the database systems leaves much to be desired. In specific cases where an XML schema definition may be available, the structure and data types used in XML instance documents may be known. However, in cases where an XML schema definition is not available, and the documents to be searched do not conform to any schema, there are no efficient techniques for querying using path-based queries.
Without XML indexes, path expressions were directly evaluated against the base tables. As a result, the processing of these expressions involved a complete scan of the base tables. Each scanned row was tested to ascertain whether it satisfied the path expression. Moreover, the evaluation of the path expression was typically done in a functional manner by constructing a Document Object Model (DOM), an in-memory data structure, and traversing the DOM tree while evaluating the path.
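The cost of this approach can be seen in the following simplified sketch, which is an assumption-laden illustration rather than the implementation of any actual database system. The base table is modeled as a hypothetical in-memory list of rows, the function name `full_scan` is invented here, and an `ElementTree` tree stands in for the DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical base table: one (row_id, xml_text) pair per stored document.
BASE_TABLE = [
    (1, "<po><ref>A-1</ref><item><qty>5</qty></item></po>"),
    (2, "<po><ref>B-2</ref></po>"),
    (3, "<note><body>not a purchase order</body></note>"),
    (4, "<po><ref>C-3</ref><item><qty>2</qty></item></po>"),
]

def full_scan(table, root_tag, path):
    """Test every row against the path; return matching ids and parse count."""
    matches = []
    parsed = 0
    for row_id, xml_text in table:
        # An in-memory tree is built for every row, matching or not.
        root = ET.fromstring(xml_text)
        parsed += 1
        # Traverse the tree to decide whether the row satisfies the path.
        if root.tag == root_tag and root.find(path) is not None:
            matches.append(row_id)
    return matches, parsed

matches, parsed = full_scan(BASE_TABLE, "po", "./item/qty")
print(matches, parsed)  # [1, 4] 4
```

Only two rows satisfy the path, yet all four documents are parsed and traversed, so the work grows with the size of the table rather than with the size of the result.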
Based on the foregoing, there is a clear need to improve the processing time of path-based queries by providing a way for path-based queries to retrieve data from XML documents without incurring the problems associated with a complete scan of the base tables and construction of expensive memory data structures.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.