Cyclic Constructs
U.S. patent application Ser. No. 10/428,878 describes techniques for rewriting XPath queries on an XML document that has been decomposed and stored in object-relational constructs, into SQL queries on the XML data stored in the object-relational constructs. However, cyclic (also referred to as recursive) constructs are allowed in XML schemas, which are XML documents that describe the structure of corresponding XML documents. That is, the W3C Recommendations “XML Schema Part 1: Structures” and “XML Schema Part 2: Datatypes” allow for use of XML documents that conform to schemas that contain cyclic constructs.
Generally, an XML document that contains a cyclic construct contains (a) an occurrence of an element type that has as a child (that is, “contains”) another occurrence of the same element type; or (b) an occurrence of an element type (the “first element” for this example) that has a child that is an occurrence of an element type from which the first element descends. In an XML document, an occurrence of a first element type is a “parent” of an occurrence of a second element type, which is a “child” of the occurrence of the first element type, if the occurrence of the second element type is nested directly within the occurrence of the first element type (and is typically shown indented relative to it). That is, the occurrence of the second element type “descends” from the occurrence of the first element type, and the occurrence of the first element type “contains” the occurrence of the second element type. In practice, XML documents conforming to XML schemas that allow cyclic constructs are common.
The following is an example of a cyclic construct, depicted in a hierarchical form.
In this example, element x contains element a; element a contains element b; and element b contains element a and element b. Hence, such a document is cyclic because element b contains element a, from which it descends, and contains an occurrence of itself.
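The containment relationships described above can be checked mechanically. The following is a minimal Python sketch, not part of the referenced application: the document string `DOC` and the helper name `find_cyclic_occurrences` are assumptions made for illustration. It reports every element occurrence that descends from an occurrence of the same element type.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mirroring the example: x contains a; a contains b;
# b contains another a and another b.
DOC = "<x><a><b><a/><b/></b></a></x>"


def find_cyclic_occurrences(root):
    """Return paths of elements that descend from an element of the same type."""
    hits = []

    def walk(elem, ancestors):
        path = ancestors + [elem.tag]
        # The occurrence is cyclic if one of its ancestors has the same tag.
        if elem.tag in ancestors:
            hits.append("/" + "/".join(path))
        for child in elem:
            walk(child, path)

    walk(root, [])
    return hits


cyclic = find_cyclic_occurrences(ET.fromstring(DOC))
print(cyclic)
```

Under these assumptions, the two occurrences reported are the inner a and the inner b, both children of the outer b.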
For another example of a cyclic construct, consider the following schema.
  <schema targetNamespace="myNS" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:law="myNS">
    <element name="Chapter">
      <complexType>
        <sequence>
          <element ref="law:Section"/>
        </sequence>
      </complexType>
    </element>
    <element name="Section">
      <complexType>
        <sequence>
          <element name="ID" type="integer"/>
          <element name="Contents" type="string"/>
          <element ref="law:Section" minOccurs="0"/>
        </sequence>
      </complexType>
    </element>
  </schema>
In this example, an XML document that conforms to this schema may include a “Chapter” element that contains a “Section” element, where a “Section” element may in turn contain another “Section” element.
Storing XML Data in a Relational Database
With one approach to storing XML data in a relational database, XML documents are decomposed (also referred to as “shredded”), with the elements and attributes (generally, “objects”) contained therein stored in object-relational tables. Often, such objects are stored using multiple tables, with references from table to table to complete paths through the XML document hierarchy. This may be done for any of a number of reasons, for example, because the document contains repeating elements (also referred to as “collections”), because of limits on the number of columns that a table may contain, because cyclic constructs are present, or because some of the objects are to be shared among applications. A “main table” typically stores a portion of the objects associated with XML documents that conform to a given XML schema, including information about the root node of the documents, where each record in the main table corresponds to such an XML document. One or more “out-of-line” tables are often used to store some of the objects, possibly for one of the foregoing reasons. For example, contents of collections are often stored in an out-of-line table.
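The main-table and out-of-line-table arrangement can be sketched as follows. This is a minimal illustration only, using Python's sqlite3 module as a stand-in for an object-relational store. The table names, column names, sample document, and the helper `store_section` are all assumptions and do not come from the referenced application. Each “Chapter” row refers to a “Section” row, and a “Section” row may refer to another row of the same table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the Chapter/Section schema above
# (namespaces omitted for brevity).
DOC = """
<Chapter>
  <Section><ID>1</ID><Contents>Intro</Contents>
    <Section><ID>5</ID><Contents>Details</Contents></Section>
  </Section>
</Chapter>
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Main table: one row per document (root node information).
    CREATE TABLE chapter (doc_id INTEGER PRIMARY KEY, section_ref INTEGER);
    -- Out-of-line table: one row per Section occurrence; a row may point
    -- to another row of the same table (the nested Section).
    CREATE TABLE section (
        row_id INTEGER PRIMARY KEY,
        id INTEGER,
        contents TEXT,
        child_section_ref INTEGER
    );
""")


def store_section(elem):
    """Insert a Section (nested Section first) and return its row id."""
    child = elem.find("Section")
    child_ref = store_section(child) if child is not None else None
    cur = conn.execute(
        "INSERT INTO section (id, contents, child_section_ref) VALUES (?, ?, ?)",
        (int(elem.findtext("ID")), elem.findtext("Contents"), child_ref),
    )
    return cur.lastrowid


root = ET.fromstring(DOC)
section_ref = store_section(root.find("Section"))
conn.execute("INSERT INTO chapter (section_ref) VALUES (?)", (section_ref,))

rows = conn.execute(
    "SELECT id, contents, child_section_ref FROM section ORDER BY row_id"
).fetchall()
print(rows)
```

In this sketch the references run from the containing row to the contained row, so a “Section” row carries no indication of which “Chapter” it belongs to.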
Further, executing a SQL query that is generated from an XPath query against tables containing XML data in object-relational form significantly outperforms DOM-based (Document Object Model) execution of the XPath query. DOM-based execution is computationally expensive because it requires building an in-memory, tree-based representation of the XML document. This in-memory representation consumes considerable memory, and the iterative nature of the query process is detrimental to system performance: for each row in the tables being queried, a DOM is created in memory and an operation is evaluated against it.
Previously, it was not possible to rewrite XPath queries on XML documents that use a cyclic construct into SQL queries. Because the structure is cyclic, when an XML document is decomposed into object-relational tables, objects from such documents are stored using references from table to table. Thus, all “Section” objects would be stored in a table (e.g., a “Section” table) as rows that are referenced by rows in one or more other tables (e.g., a “Chapter” table), because the “Chapter” XML element contains the “Section” XML element. However, there was no mechanism for knowing what “Chapter” a given “Section” descends from and, therefore, what rows should be joined in the SQL query. Further, within the “Section” table, one row may point to another row, to account for the cyclic nature of a “Section” XML element referencing another “Section” XML element.
Some XPath queries request every occurrence of an XML element type in one or more documents, regardless of where, in the XML hierarchy, each occurrence resides. The common operator used for such queries is referred to as “slash slash” (//), or “descendant-or-self”. Thus, with the foregoing schema example, an accurately executed XPath query “/Chapter//Section/Contents” would return the contents of all occurrences of “Section”, whether such occurrences are referenced by a “Chapter” or by another “Section.” Further, XPath queries may contain predicates on a particular object node set, such as “/Chapter//Section[ID=5]/Contents” (where “ID=5” is the predicate), which requests the contents of all Sections having an ID=5, regardless of where in the XML hierarchy each occurrence resides.
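For illustration, the two queries can be evaluated against a small in-memory document with Python's xml.etree.ElementTree, which supports a limited subset of XPath. The sample document is an assumption, and the paths are written relative to the “Chapter” root element, so “.//Section/Contents” stands in for “/Chapter//Section/Contents”.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with Sections nested three levels deep.
DOC = """
<Chapter>
  <Section><ID>1</ID><Contents>Intro</Contents>
    <Section><ID>5</ID><Contents>Details</Contents>
      <Section><ID>7</ID><Contents>Notes</Contents></Section>
    </Section>
  </Section>
</Chapter>
"""

chapter = ET.fromstring(DOC)  # the root element is Chapter

# Counterpart of /Chapter//Section/Contents: every Section at any depth.
all_contents = [e.text for e in chapter.findall(".//Section/Contents")]

# Counterpart of /Chapter//Section[ID=5]/Contents: only Sections whose
# ID child has the text "5", again at any depth.
id5_contents = [e.text for e in chapter.findall(".//Section[ID='5']/Contents")]

print(all_contents)
print(id5_contents)
```

The first query returns the contents of all three Sections even though only the outermost one is a direct child of “Chapter”. The second returns only the contents of the Section whose ID is 5.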
As discussed, with cyclic XML documents, it was not possible to rewrite an XPath query so that it accurately queries the corresponding object-relational tables. This is because, for example, it is not known which particular rows in the “Section” table correspond to which particular rows in the “Chapter” table. Knowledge of the corresponding rows would be necessary for joining those rows, in order to completely execute the query without entering an infinite loop. Further, with cyclic XML documents, accurately rewriting an XPath query that includes a “slash slash” operator is not possible for at least the same reason. For example, rewriting the XPath query “/Chapter-1//Section[ID=5]/Contents” is not possible because it is not known at query compilation time how many levels deep the “Section” hierarchy may be, or which rows in the “Section” table would need to be joined with which row in the “Chapter” table.
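The depth problem can be illustrated with a minimal sketch using Python's sqlite3 module. The table layout, column names, sample rows, and the helper `fixed_depth_query` are assumptions made for illustration and are not part of the referenced application. The generated SQL joins a fixed number of “Section” levels, chosen when the query is built, so a “Section” nested more deeply than that is missed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chapter (doc_id INTEGER PRIMARY KEY, section_ref INTEGER);
    CREATE TABLE section (
        row_id INTEGER PRIMARY KEY,
        id INTEGER,
        contents TEXT,
        child_section_ref INTEGER
    );
    -- One chapter whose Sections nest three levels deep: row 1 -> 2 -> 3.
    INSERT INTO section VALUES (3, 5, 'Deepest', NULL);
    INSERT INTO section VALUES (2, 4, 'Middle', 3);
    INSERT INTO section VALUES (1, 1, 'Top', 2);
    INSERT INTO chapter VALUES (1, 1);
""")


def fixed_depth_query(levels):
    """Build SQL that follows at most `levels` nested Section rows from each
    chapter and returns the contents of Sections with ID = 5 found there."""
    joins = ["JOIN section s1 ON s1.row_id = c.section_ref"]
    for n in range(2, levels + 1):
        joins.append(
            f"JOIN section s{n} ON s{n}.row_id = s{n - 1}.child_section_ref"
        )
    # One SELECT per level, each using only the joins needed to reach it.
    selects = [
        f"SELECT s{n}.contents FROM chapter c {' '.join(joins[:n])} "
        f"WHERE s{n}.id = 5"
        for n in range(1, levels + 1)
    ]
    return " UNION ALL ".join(selects)


two_levels = [r[0] for r in conn.execute(fixed_depth_query(2))]
three_levels = [r[0] for r in conn.execute(fixed_depth_query(3))]
print(two_levels)
print(three_levels)
```

With two levels of joins the matching “Section” is not found, while three levels reach it. Because the nesting depth depends on the stored documents, no single fixed number of joins chosen at compilation time is guaranteed to be enough.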
Based on the foregoing, there is a need for an improved technique for storing in a relational database, and querying, XML documents that conform to schemas that contain cyclic constructs.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.