Relational database management systems, or “database systems”, typically support a wide range of data types. For example, such a database system allows users to store and query scalar data type values such as integers, numbers, and strings. Some database systems also have the ability to support more complex data types. One particularly useful complex data type supported by some database systems is hierarchical Extensible Markup Language (“XML”) data. Those database systems that include XML support allow users to define tables, or columns in a table, having an XML type.
XML data does not naturally lend itself to physical storage models that are conventional in database systems. A variety of storage techniques have been developed to manage the storage of XML data. For example, models for storing XML type data in a database system include storing the data object-relationally and storing the data in aggregate form.
Storing XML type data object-relationally involves defining a complex database representation for the XML data. In such a representation, various database objects are defined to represent the components of the XML data. For example, each element of an XML document may be represented by a column in a table, and data from a given XML document is stored in a row of a table. XML elements may include text nodes, attributes, other kinds of nodes, and other values included in an XML document.
The underlying structures that comprise a database representation (e.g., tables, columns, etc.) are referred to as base database objects and structures, or simply database objects. When an XML document is submitted to a database system for object-relational storage, the XML document is shredded into element values, which are stored in corresponding components of the base database structures. Thus, for example, to insert an XML-based document into an object relational table, a new row is created in the table for the document. The XML document is shredded into its various elements and each value from the shredded document is placed in the column of the table that corresponds to the value's element.
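By way of illustration only, the shredding step described above may be sketched as follows. The sketch uses Python's standard sqlite3 and xml.etree.ElementTree modules as stand-ins for a database system; the sample document, the `orders` table, and the `shred` helper are hypothetical names chosen for the example and are not taken from any particular database product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical flat XML document: one child element per table column.
DOC = "<order><id>17</id><customer>Acme</customer><total>99.50</total></order>"


def shred(xml_text):
    """Split a flat XML document into a mapping of element name -> text value."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Base database object: one column per XML element (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")

# Inserting the document creates one new row; each shredded value goes
# into the column that corresponds to the value's element.
values = shred(DOC)
conn.execute(
    "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
    (int(values["id"]), values["customer"], float(values["total"])),
)

# Once shredded, the data can be queried with ordinary relational SQL.
row = conn.execute("SELECT customer, total FROM orders WHERE id = 17").fetchone()
print(row)
```

The sketch assumes a document whose elements map one-to-one onto columns; documents with nested or repeating elements would generally require additional tables.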
XML data that is stored object-relationally can be queried more efficiently through traditional query mechanisms. However, shredding XML data into component elements may be time consuming. Furthermore, if a particular set of XML data does not conform to a schema, or has a lot of variation among the component parts of the data, then storing the shredded XML data may require many database structures.
As an alternative, aggregate storage techniques may be used to store XML type data. In aggregate storage, unshredded XML data is stored in large objects (LOBs), which include character-type large objects (CLOBs) and binary-type large objects (BLOBs). Aggregate storage is useful for storing complex data because such storage may be used to store data regardless of data format and/or the availability of a schema for the data. For instance, when adding an XML document to a LOB-based table, the document may be stored in a LOB as one large chunk of data, without performing any parsing or shredding of the data, and a reference to the location of the LOB for the XML document may be included in the table. Thus, tables using LOB-based storage to store complex data typically do not contain individual data values that have been extracted from the complex data.
Performing queries on data that is stored in aggregate form may be far less efficient, and more time consuming and resource intensive, than performing queries on data stored using object-relational techniques. To simplify certain queries on XML data stored in aggregate form, a structured XML index may be defined to selectively store, in object-relational tables, extracted portions of XML data stored in LOBs. A structured XML index is an index that stores XML data in object-relational tables. The XML elements stored in a structured XML index may be tied to the LOB storing the source XML document through a location identifier for the XML table storing the LOB. Such a location identifier may be, for example, a physical row identifier or a logical identifier of the location of the LOB in the base table.
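The relationship between aggregate storage and a structured index may be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any product's implementation: the `docs` base table, the `customer_idx` index table, and the `insert_doc` helper are hypothetical, and an integer primary key stands in for the location identifier.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Base table: each XML document is kept whole, unshredded (aggregate storage).
conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, body TEXT)")
# Hypothetical index table: one extracted value plus the location identifier
# (here, doc_id) that ties the value back to its source document.
conn.execute("CREATE TABLE customer_idx (doc_id INTEGER, customer TEXT)")


def insert_doc(conn, xml_text):
    """Store the whole document, then index only its <customer> element."""
    cur = conn.execute("INSERT INTO docs (body) VALUES (?)", (xml_text,))
    doc_id = cur.lastrowid
    customer = ET.fromstring(xml_text).findtext("customer")
    conn.execute("INSERT INTO customer_idx VALUES (?, ?)", (doc_id, customer))
    return doc_id


insert_doc(conn, "<order><customer>Acme</customer><note>rush</note></order>")
insert_doc(conn, "<order><customer>Zenith</customer></order>")

# The lookup consults the index table and follows the location identifier
# to the stored document, so no stored document is parsed at query time.
hit = conn.execute(
    "SELECT d.body FROM customer_idx AS i "
    "JOIN docs AS d ON d.doc_id = i.doc_id "
    "WHERE i.customer = ?",
    ("Zenith",),
).fetchone()
print(hit[0])
```

Without the index table, answering the same question would require reading and parsing every stored document, which is the cost the paragraph above attributes to querying aggregate storage directly.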
An example of a structured XML index is the XMLIndex developed by Oracle Corporation, described at http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28369/xdb_indexing.htm, accessed Sep. 17, 2009, the contents of which are incorporated by reference as if fully set forth herein. While a structured XML index is described here in connection with XML data stored in aggregate form, a structured XML index may be used with many different kinds of data.
XML elements that are indexed through a structured XML index may be accessed at a much lower cost than accessing XML elements stored in the aggregate storage. Therefore, a query optimizer may rewrite a query on XML data that is stored in aggregate form to leverage those elements of the XML data that are stored in a structured XML index.
Path expressions, such as XPath expressions and XQuery expressions, may be used to identify particular elements of XML data. XPath is a method of identifying XML elements in a hierarchical XML structure. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax, to identify nodes in an XML document. XPath gets its name from its use of a path notation for navigating through the hierarchical structure of an XML document. XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes. The XPath data model is described in detail in Section 5 (“Data Model”) of “XML Path Language (XPath)” (version 1.0), a W3C (World Wide Web Consortium) Recommendation dated 16 Nov. 1999, which is incorporated by reference as if fully set forth herein.
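For illustration, the following sketch evaluates a few path expressions against a small document using Python's xml.etree.ElementTree module, which supports only a limited subset of XPath 1.0. The sample document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, modeled by the parser as a tree of nodes.
DOC = (
    "<order>"
    "<item sku='A1'><name>bolt</name><price>0.25</price></item>"
    "<item sku='B2'><name>nut</name><price>0.10</price></item>"
    "</order>"
)
root = ET.fromstring(DOC)

# Element nodes: every <name> that is a child of an <item> under the root.
names = [e.text for e in root.findall("./item/name")]

# A predicate on an attribute node narrows the path to one <item>.
price_b2 = root.find("./item[@sku='B2']/price").text

# Attribute nodes are read from the element nodes that carry them.
skus = [item.get("sku") for item in root.findall("./item")]

print(names, price_b2, skus)
```

Each path is resolved against the logical tree of element, attribute, and text nodes rather than against the characters of the serialized document.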
XQuery is the W3C language designed for querying XML data. It is similar to SQL in many ways, but just as SQL is designed for querying structured, relational data, XQuery is designed especially for querying semi-structured, XML data from a variety of data sources. The XQuery language is described on the W3C website, visited Sep. 12, 2009, at http://www.w3.org/XML/Query, which is incorporated by reference as if fully set forth herein.
As described in the Matching Application referred to above, multiple techniques may be used for determining whether a structured XML index may be used when executing an XML query. One such technique includes generating one or more index definition path expressions by concatenating a row pattern expression and a column pattern expression of a structured XML index. An index definition path expression (referred to herein as an “index expression”) may be generated for each column pattern expression. For ease of illustration, an index expression is described as derived from the definition of a structured index. A path expression in an XML query (referred to herein as a “query expression”) is then compared to one or more of the index expressions. If the query expression matches an index expression, then the structured XML index may be used to process the XML query.
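The concatenation and comparison described above may be sketched as follows. The sketch treats path expressions as plain strings and compares them textually, which is a simplifying assumption; an actual query optimizer would presumably compare parsed expressions. The function names and the sample patterns are hypothetical.

```python
def index_expressions(row_pattern, column_patterns):
    """Build one index expression per column pattern by concatenating
    the row pattern expression with the column pattern expression."""
    base = row_pattern.rstrip("/")
    return [base + "/" + column.lstrip("/") for column in column_patterns]


def directly_matches(query_expr, index_exprs):
    """Return True if the query expression equals one of the index
    expressions (textual comparison; a simplification)."""
    return query_expr.strip() in index_exprs


# Hypothetical index definition: rows are <item> elements of an <order>.
exprs = index_expressions("/order/item", ["name", "price", "@sku"])
print(exprs)

matched = directly_matches("/order/item/price", exprs)
unmatched = directly_matches("/order//price", exprs)
print(matched, unmatched)
```

In this example the second query expression is written differently from every index expression, so the textual comparison does not report a match even though, for the documents assumed here, it would select the same `price` elements.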
In another technique, a determination is made as to whether an expression associated with a structured XML index is semantically equivalent to an expression of an XML query even though the expressions are not the same. Such expressions may include variable expressions, value expressions, constructor expressions, and/or path expressions.
In another technique, a determination is made as to whether the row pattern expression of a structured XML index “contains” a query expression of an XML query. An example of containment is when a query expression includes a predicate that is not part of the row pattern expression.
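The containment example given above, in which the query expression carries a predicate that the row pattern expression lacks, may be sketched as follows. This is a deliberately narrow illustration: it assumes the row pattern contains no predicates, that predicates are not nested, and that no predicate contains a `]` character inside a string literal. The function names are hypothetical, and the sketch does not cover other forms of containment.

```python
import re


def strip_predicates(path):
    """Remove bracketed predicates such as [price > 1] from a path
    expression (assumes predicates are not nested)."""
    return re.sub(r"\[[^\]]*\]", "", path)


def row_pattern_contains(row_pattern, query_expr):
    """Return True if the query expression is the row pattern with
    zero or more predicates added (assumes a predicate-free row pattern)."""
    return strip_predicates(query_expr) == row_pattern


contained = row_pattern_contains("/order/item", "/order/item[price > 1]")
not_contained = row_pattern_contains("/order/item", "/order/customer[@id='7']")
print(contained, not_contained)
```

Intuitively, every node selected by the predicated query expression is also selected by the row pattern, so the rows of the index are a superset of what the query needs and the predicate may be applied to those rows afterward.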
Index expressions that “match” query expressions, as described above, are referred to herein as “directly matching” the query expressions. Traditionally, a query optimizer only utilizes a structured XML index in a particular query if a query expression from the query directly matches an index expression for the index. Therefore, many queries that involve XML elements that are indexed in a structured XML index, but are not referred to using path expressions that directly match index expressions used to define the index, are not evaluated using the XML index. It would be beneficial to use the structured XML index in queries that do not include query expressions that directly match index expressions, but still refer to XML elements stored in the index.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.