Querying and searching information contained in XML documents that are stored within an object-relational database can be especially inefficient for certain queries. XML-aware indices, such as those described in Chandrasekar, are available for providing quicker access to XML data in response to queries. The SQL/XML extension of SQL allows queries using XPath expressions to be evaluated on XML documents stored natively in a relational database system. Apart from XPath, XQuery is another XML query language that was developed for querying XML documents.
An XML index may be composed of a PATH table and a set of secondary indices on the PATH table. The PATH table contains one row per indexed node of an XML document. Each column of the table contains information associated with the indexed nodes, such as the paths of the nodes or the values of the nodes; secondary indices can be built on the columns. An example of a secondary index is a b-tree index on the value column of the PATH table, also referred to as a value index. The XML index may be accessed when a user submits a query referencing one or more XML documents. The query can be decomposed and rewritten with expressions that use the PATH table in the manner described in Chandrasekar.
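The layout described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table name, the column names (doc_id, node_path, node_value), and the sample document are assumptions made for the example and are not taken from Chandrasekar. SQLite is used here only because its ordinary indexes are b-trees; it stands in for the database system.

```python
import sqlite3

# Hypothetical PATH table: table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE path_table (doc_id INTEGER, node_path TEXT, node_value TEXT)"
)

# One row per indexed node of a small, made-up document.
# Only leaf nodes are shown, for brevity.
rows = [
    (1, "/order/customer", "Acme"),
    (1, "/order/item/name", "bolt"),
    (1, "/order/item/qty", "40"),
]
conn.executemany("INSERT INTO path_table VALUES (?, ?, ?)", rows)

# Secondary index on the value column (a "value index").
conn.execute("CREATE INDEX value_idx ON path_table (node_value)")


def paths_with_value(value):
    """Return the paths of all indexed nodes whose value equals `value`."""
    cur = conn.execute(
        "SELECT node_path FROM path_table WHERE node_value = ? ORDER BY node_path",
        (value,),
    )
    return [r[0] for r in cur.fetchall()]
```

For example, paths_with_value("bolt") returns the path of the single node holding that value, and the database is free to answer it from value_idx rather than by scanning path_table.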
An optimization engine may evaluate an expression using a secondary index in lieu of evaluating it directly from the PATH table. A query that includes a value-based search is an example of a type of query that can be optimized by use of a secondary index. Without a secondary index, searching for a particular value within the XML document requires a linear search down the value column of the PATH table, which in the worst case performs as many comparisons as there are rows in the PATH table. Executing a search in this manner requires that each row be read from disk, a costly operation that should be minimized. Building a secondary index, such as a b-tree index, on the value column allows for index-based searching, reducing the number of disk accesses per search from linear to logarithmic in the number of rows.
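The difference between the two search strategies can be illustrated with a small sketch. A sorted list with binary search is used here as a stand-in for a b-tree; it is not a b-tree implementation, but it shows the same linear-versus-logarithmic behavior. The function names and the synthetic value column are assumptions made for the example, and comparisons are counted as a proxy for row reads.

```python
def linear_search(values, target):
    """Scan the value column row by row; return (row_id, comparisons)."""
    comparisons = 0
    for row_id, value in enumerate(values):
        comparisons += 1
        if value == target:
            return row_id, comparisons
    return None, comparisons


def indexed_search(sorted_index, target):
    """Binary search over (value, row_id) pairs sorted by value.

    Returns (row_id, comparisons); row_id is None if the value is absent.
    """
    lo, hi = 0, len(sorted_index)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_index[mid][0] < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_index) and sorted_index[lo][0] == target:
        return sorted_index[lo][1], comparisons
    return None, comparisons


# Synthetic value column with 10,000 rows (made-up data).
values = [f"v{i:05d}" for i in range(10_000)]
sorted_index = sorted((value, row_id) for row_id, value in enumerate(values))

linear_row, linear_cost = linear_search(values, "v09999")
index_row, index_cost = indexed_search(sorted_index, "v09999")
```

Both searches locate the same row, but the linear scan examines every row before reaching the last one, while the indexed search needs only about log2(10,000) comparisons.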
XML indices are especially valuable for accelerating value-based XQuery lookups because determining the string value of a node in XQuery is an expensive operation. Since the value of a node in XQuery is defined as the concatenation of all descendant text nodes of the node, determining the string value of a high-level node requires that the entire section of the tree hierarchy below that node in the XML document be accessed and read from disk.
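To make the cost concrete, the following sketch computes a node's string value by concatenating its descendant text nodes and also counts how many element nodes had to be visited. It uses Python's standard xml.etree.ElementTree on an in-memory tree; the function name string_value and the sample document are assumptions made for the example, and the visit count is only a rough proxy for the disk reads a database would incur.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """Return (string value, number of element nodes visited) for `elem`.

    The string value is the concatenation, in document order, of every
    text node at or below `elem`.
    """
    parts = [elem.text or ""]
    visited = 1
    for child in elem:
        child_text, child_visited = string_value(child)
        parts.append(child_text)
        # Text following a child still belongs to `elem`'s subtree.
        parts.append(child.tail or "")
        visited += child_visited
    return "".join(parts), visited


# Made-up sample document.
doc = ET.fromstring(
    "<order><customer>Acme</customer>"
    "<item><name>bolt</name><qty>40</qty></item></order>"
)

root_value, root_visited = string_value(doc)
leaf_value, leaf_visited = string_value(doc.find("item/name"))
```

The leaf node's value is obtained by visiting one element, whereas the value of the root requires visiting every element in the document.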
While the benefit of using an XML index with a value-based query is clear, prior versions of the XML index are not optimized for value-based queries that use the semantics of XQuery. An XML index was previously defined to store only the values of simple nodes (i.e., leaf nodes with no child nodes) in the value column of the PATH table, in accordance with the semantics of XPath. The value for complex nodes (i.e., nodes with one or more child nodes) in the PATH table was set to NULL. This is incompatible with the semantics of XQuery, which defines the value of a complex node as the concatenation of all descendant text nodes of the node.
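The mismatch can be shown by populating the value column both ways for the same document. This is a hedged sketch, not the index's actual population logic: the function path_rows, its xquery_semantics flag, and the sample document are hypothetical, and Python's None stands in for SQL NULL.

```python
import xml.etree.ElementTree as ET


def path_rows(root, xquery_semantics):
    """Return a list of (path, value) rows, one per element node.

    With xquery_semantics=False, complex nodes get None (NULL), as in the
    prior index definition described above. With xquery_semantics=True,
    complex nodes get the concatenation of their descendant text nodes.
    """
    rows = []

    def walk(elem, path):
        if len(elem) == 0:                      # simple node: no children
            value = elem.text or ""
        elif xquery_semantics:                  # complex node, XQuery value
            value = "".join(elem.itertext())
        else:                                   # complex node, prior behavior
            value = None
        rows.append((path, value))
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return rows


# Made-up sample document.
doc = ET.fromstring("<item><name>bolt</name><qty>40</qty></item>")
prior_rows = dict(path_rows(doc, xquery_semantics=False))
xquery_rows = dict(path_rows(doc, xquery_semantics=True))
```

Under the prior definition, a value-based lookup for the complex node /item finds nothing in the value column, even though XQuery defines its value as the concatenated text of its descendants.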
Based on the foregoing, it would be desirable to extend the PATH table infrastructure, especially the value column, to efficiently accommodate queries using XQuery.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.