XML data management has attracted great attention over the past decade in both academia and industry. Unlike the relational data model, where the schema (e.g., the number of columns, the data type of each column, and the semantic constraints on those columns) needs to be predefined, XML allows data to be organized in a schema-less fashion as long as the data remains in a hierarchical tree structure. Because of its flexibility in modeling such semi-structured data, XML has been used extensively in many business settings that require data exchange between different organizations. However, because of this same flexibility, query processing on XML data is usually less efficient than query processing on data stored relationally.
XML indexes help speed up XML data access. One such XML index is described in U.S. patent application Ser. No. 10/884,311, which is referenced above. This XML index includes a relational table, referred to herein as a path table, to capture properties of indexed XML elements in one or more XML documents. Each property corresponds to a column in the path table. Secondary indexes, such as B-tree indexes, may be created on the columns of the path table.
When a query is submitted against indexed XML data, the query is, if necessary, first rewritten into SQL on the path table. The query rewrite is automatically performed by a relational database management system (RDBMS) in order to guarantee that the semantics of the query is equivalently expressed using SQL on the columns in the path table. In this way, existing relational techniques (e.g., a query optimizer selecting the “best” of multiple plans) are exploited to speed up XML query processing.
The schema of the path table is defined as follows:
Column Name    Data Type
RID            ROWID
PATHID         RAW(8)
ORDER_KEY      RAW(1000)
LOCATOR        RAW(2000)
VALUE          VARCHAR2(4000)
In one implementation, each row in the path table corresponds to one element in an XML document. Each column in the path table represents some property of the XML element, which element may be a node or attribute.
In this example, the PATHID column represents an identifier of the rooted path of an element, which column may be used to answer an XML path expression. The ORDER_KEY column represents a unique identifier of an element. The ORDER_KEY column may be used to answer structural relationships (e.g., parent-child, ancestor-descendant, precedence in document order) between elements.
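The structural tests that an order key supports can be illustrated with a minimal sketch. It assumes a Dewey-style encoding in which each key is the parent's key followed by one byte for the child's position; the source does not specify the actual ORDER_KEY encoding, so this layout and the function names are hypothetical.

```python
# Hypothetical Dewey-style order keys: one byte per tree level.
# This encoding is an assumption for illustration, not the one
# defined for the ORDER_KEY column in the source.

def is_ancestor(a: bytes, d: bytes) -> bool:
    """True if the element keyed `a` is a proper ancestor of the one keyed `d`."""
    return len(a) < len(d) and d.startswith(a)

def is_parent(p: bytes, c: bytes) -> bool:
    """True if the element keyed `p` is the direct parent of the one keyed `c`."""
    return len(c) == len(p) + 1 and c.startswith(p)

def precedes(a: bytes, b: bytes) -> bool:
    """True if `a` comes before `b` in document order (byte-wise comparison)."""
    return a < b

# Sample keys for /site, /site/people, two person elements, and a name element.
site = bytes([1])
people = bytes([1, 1])
person0 = bytes([1, 1, 1])
name0 = bytes([1, 1, 1, 1])
person1 = bytes([1, 1, 2])
```

Under this assumed encoding, all three relationships reduce to prefix and byte-wise comparisons on a single column, which is the kind of predicate a B-tree index on that column can serve.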
For example, an XML index is created on XML auction documents (referred to collectively as XMARK_CSX). The following SQL/XML query is submitted against XMARK_CSX to identify the name of the person whose ID is “person0”.
SELECT S.NAME
FROM XMARK_CSX T,
     XMLTABLE('/site/people/person' PASSING T.OBJECT_VALUE
       COLUMNS
         NAME VARCHAR2(40) PATH '/person/name/text()',
         ID   VARCHAR2(40) PATH '/person/@id') S
WHERE S.ID = 'person0';
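To make the idea of answering such a query from a path table concrete, the following sketch emulates a small path table in SQLite and looks up a person's name by ID with an ordinary self-join. It is not the rewrite produced by the RDBMS. The integer path identifiers, the one-byte-per-level order keys, the storage of element text in the VALUE column, and the sample names are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical path identifiers; the source does not say how they are assigned.
PATH_IDS = {
    "/site/people/person": 1,
    "/site/people/person/@id": 2,
    "/site/people/person/name": 3,
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE path_table ("
    " rid INTEGER, pathid INTEGER, order_key BLOB, locator BLOB, value TEXT)"
)
# Secondary indexes on path-table columns, as the text describes.
conn.execute("CREATE INDEX pt_path_value ON path_table (pathid, value)")
conn.execute("CREATE INDEX pt_order_key ON path_table (rid, order_key)")

rows = [
    # (rid, pathid, order_key, locator, value); sample data is made up.
    (1, 1, bytes([1, 1, 1]), None, None),
    (1, 2, bytes([1, 1, 1, 1]), None, "person0"),
    (1, 3, bytes([1, 1, 1, 2]), None, "Alice"),
    (1, 1, bytes([1, 1, 2]), None, None),
    (1, 2, bytes([1, 1, 2, 1]), None, "person1"),
    (1, 3, bytes([1, 1, 2, 2]), None, "Bob"),
]
conn.executemany("INSERT INTO path_table VALUES (?, ?, ?, ?, ?)", rows)

def person_name(person_id: str) -> list[str]:
    """Return names of person elements whose @id equals `person_id`."""
    # The @id row and the name row belong to the same person when they are in
    # the same document (rid) and share the same parent order-key prefix.
    sql = """
        SELECT n.value
        FROM path_table i
        JOIN path_table n
          ON n.rid = i.rid
         AND substr(n.order_key, 1, length(n.order_key) - 1)
           = substr(i.order_key, 1, length(i.order_key) - 1)
        WHERE i.pathid = ? AND i.value = ? AND n.pathid = ?
    """
    params = (
        PATH_IDS["/site/people/person/@id"],
        person_id,
        PATH_IDS["/site/people/person/name"],
    )
    return [r[0] for r in conn.execute(sql, params)]
```

Calling `person_name("person0")` on this sample data returns the single name stored under the matching person element. The point of the sketch is that the XML path and structural conditions become plain equality predicates on path-table columns, so a relational optimizer can choose among index-based plans.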