The Extensible Markup Language (XML) is a standard for data and documents that is finding wide acceptance in the computer industry. XML describes and provides structure to a body of data, such as a file or data packet. The XML standard provides for tags that delimit the sections of an XML entity referred to as XML elements. Each XML element may contain one or more name-value pairs referred to as attributes. The following XML Segment A is provided to illustrate XML.
SEGMENT A
    <book>
      <publication publisher="Doubleday" date="January"></publication>
      <Author>Mark Berry</Author>
    </book>
XML elements are delimited by a start tag and a corresponding end tag. For example, Segment A contains the start tag <Author> and the end tag </Author>, which delimit an element. The data between an element's start tag and end tag is referred to as the element's content. In this case, the content of the element is the text value Mark Berry.
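The start-tag/end-tag/content relationship can be checked with Python's standard `xml.etree.ElementTree` parser. This is only an illustrative sketch: it parses Segment A (written with ASCII quotes, as XML requires) and reads the content of element <Author>.

```python
import xml.etree.ElementTree as ET

# Segment A, using straight ASCII quotes around attribute values.
SEGMENT_A = ('<book><publication publisher="Doubleday" date="January"></publication>'
             '<Author>Mark Berry</Author></book>')

root = ET.fromstring(SEGMENT_A)

# find() locates the element delimited by <Author> and </Author>;
# .text is the content between that start tag and end tag.
author = root.find("Author")
print(author.tag, "->", author.text)  # Author -> Mark Berry

# <publication> has nothing between its tags, so it has no text content.
publication = root.find("publication")
print(repr(publication.text))  # None
```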
An element is herein referred to by its start tag. For example, the element delimited by the start and end tags <publication> and </publication> is referred to as element <publication>.
Element content may contain other types of data, including attributes and other elements. Element <book> is an example of an element that contains other elements: specifically, it contains two elements, <publication> and <Author>. An element that is contained by another element is referred to as a descendant of that element. Thus, elements <publication> and <Author> are descendants of element <book>. An element's attributes are also referred to as being contained by the element.
By defining an element that contains attributes and descendant elements, the XML entity defines a hierarchical tree relationship between the element, its descendant elements, and its attributes. A set of elements that have such a hierarchical tree relationship is referred to herein as an XML document.
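The hierarchical tree that Segment A defines can be made visible with a short walk over the parsed document. In this sketch (again using Python's standard `xml.etree.ElementTree`; the helper `outline` is hypothetical), `iter()` enumerates the descendants of <book>, `attrib` exposes the attributes contained by <publication>, and the recursion prints one indented line per level of the tree.

```python
import xml.etree.ElementTree as ET

SEGMENT_A = ('<book><publication publisher="Doubleday" date="January"></publication>'
             '<Author>Mark Berry</Author></book>')
root = ET.fromstring(SEGMENT_A)

# iter() yields the element itself first, then all of its descendants in document order.
descendants = [e.tag for e in root.iter()][1:]
print(descendants)  # ['publication', 'Author']

# Attributes are the name-value pairs held by an element.
print(root.find("publication").attrib)  # {'publisher': 'Doubleday', 'date': 'January'}

def outline(elem, depth=0):
    """Return the element hierarchy as indented lines, one line per element."""
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag
    if attrs:
        line += f" [{attrs}]"
    if text:
        line += f": {text}"
    lines = [line]
    for child in elem:  # immediate descendants, one level deeper
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(root)))
# book
#   publication [publisher="Doubleday" date="January"]
#   Author: Mark Berry
```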
An important feature of XML is that it may be used to define XML documents that conform to industry standards. One such standard is the Document Object Model (DOM), promulgated by the W3C.
The SQL/XML standard (INCITS/ISO/IEC 9075-14:2003, which is incorporated herein by reference) defines an XML data type in an SQL system. An object-relational database system may support XMLType as a native built-in data type representing XML values, just like any other native data type such as VARCHAR, the SQL data type for variable-length character values. An XML value is any value that can be represented by the XQuery Data Model, which is described in XQuery 1.0 and XPath 2.0 Data Model, W3C Working Draft, 29 Oct. 2004, which is incorporated herein by reference. An XML value is referred to herein as an XMLType instance. Object-relational database systems use XMLType to represent XMLType instances that are used or generated in a wide variety of situations. For example, XMLType instances can be XML documents stored natively in XMLType tables or in XMLType columns of tables. XMLType instances can be generated from relational tables and views using SQL/XML publishing functions, such as XMLElement( ) and XMLAgg( ). XMLType instances can be generated from the result of an XQuery embedded in an XMLQuery( ) function or an XMLTable construct, or from the result of an XPath expression embedded in an extract( ) function. An XMLType instance can be the return type of a user-defined or system-defined function, and can be converted from an object type, a collection type, or an arbitrary user-defined opaque type in an object-relational database system. Throughout this document, ‘XMLType’ is used as the name of the data type that represents XML values, and the term ‘SQL expression’ refers to an expression that can be used in an SQL query or in the SQL procedural languages used to write user-defined functions and procedures.
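To illustrate the idea of generating XML values from relational data, the following is a rough Python analogue of what the SQL/XML publishing functions do. It is not how any database implements XMLElement( ) or XMLAgg( ); the `books` table, its rows, and the helper `xml_element` are hypothetical. Each row becomes one element (the per-row role of XMLElement( )), and the per-row elements are aggregated and wrapped in a parent element (what XMLAgg( ) inside an enclosing XMLElement( ) would produce).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational source: a small table of books.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("Book One", "Mark Berry"), ("Book Two", "Ann Lee")])

def xml_element(name, text=None, children=()):
    """Rough analogue of XMLElement(): build one element from a value and/or child elements."""
    elem = ET.Element(name)
    if text is not None:
        elem.text = str(text)
    elem.extend(children)
    return elem

# One <book> element per row ...
rows = conn.execute("SELECT title, author FROM books ORDER BY title")
books = [xml_element("book", children=[xml_element("title", t), xml_element("Author", a)])
         for t, a in rows]
# ... aggregated across rows and wrapped in a single parent element.
catalog = xml_element("catalog", children=books)
xml_value = ET.tostring(catalog, encoding="unicode")
print(xml_value)
```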
Examples of SQL expressions are table or view columns, arithmetic functions, logical functions, SQL case functions, SQL/XML publishing functions, XMLQuery( ) functions, extract( ) functions, PL/SQL variables, etc.
Information about the structure of specific types of XML documents may be specified in documents referred to as “XML schemas”. For example, the XML schema for a particular type of XML document may specify the names of the data items contained in that type of XML document, the hierarchical relationships among those data items, and the types of the values they contain, among other things. A standard for XML schemas is XML Schema Part 0: Primer, Part 1: Structures, and Part 2: Datatypes, W3C Recommendation, 2 May 2001, the contents of which are incorporated herein by reference.
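The kinds of information an XML schema carries (element names, allowed hierarchy, value types) can be sketched with a deliberately tiny stand-in. `TOY_SCHEMA` and `validate` below are hypothetical and are not XML Schema: they ignore namespaces, ordering, and occurrence constraints, and only check allowed children, declared attributes, and attribute value types.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much-simplified stand-in for an XML schema: for each element,
# the allowed child elements and a type converter for each declared attribute.
TOY_SCHEMA = {
    "book": {"children": {"publication", "Author"}, "attributes": {}},
    "publication": {"children": set(),
                    "attributes": {"publisher": str, "date": str, "year": int}},
    "Author": {"children": set(), "attributes": {}},
}

def validate(elem, schema):
    """Return a list of violations of the toy schema (empty if the element conforms)."""
    decl = schema.get(elem.tag)
    if decl is None:
        return [f"undeclared element <{elem.tag}>"]
    errors = []
    for name, value in elem.attrib.items():
        conv = decl["attributes"].get(name)
        if conv is None:
            errors.append(f"undeclared attribute {name} on <{elem.tag}>")
            continue
        try:
            conv(value)  # e.g. int("abc") raises ValueError
        except ValueError:
            errors.append(f"attribute {name} on <{elem.tag}> is not a valid {conv.__name__}")
    for child in elem:
        if child.tag not in decl["children"]:
            errors.append(f"<{child.tag}> not allowed in <{elem.tag}>")
        errors.extend(validate(child, schema))
    return errors

good = ET.fromstring('<book><publication publisher="Doubleday" date="January"></publication>'
                     '<Author>Mark Berry</Author></book>')
bad = ET.fromstring('<book><publication year="abc"/><Title>X</Title></book>')
print(validate(good, TOY_SCHEMA))  # []
print(validate(bad, TOY_SCHEMA))
```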
XML Storage Mechanisms
Various types of storage mechanisms are used to store an XML document. One type of storage mechanism stores an XML document as a text file in a file system.
Another type of storage mechanism uses object-relational database systems that have been enhanced to store and process queries for XMLType instances. For example, an XML schema may be registered with an object-relational database system. During the registration process for a given XML schema, the database system determines (1) a database representation for the XML schema and (2) mapping information mapping the XML schema to components of the database representation. Determining the database representation for a given XML schema may involve, for example, determining the columns, database objects, collection types, constraints, and even the indexes that are to be used by the database system to store data for XML documents that conform to the given XML schema.
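The two outputs of registration, a database representation and mapping information, can be illustrated with a toy example. Everything here is a hypothetical assumption rather than any product's registration algorithm: the schema is reduced to a dictionary of leaf paths and XML Schema simple types, `SQL_TYPES` is an assumed type mapping, and `register_schema` derives one flat table plus a path-to-column map. SQLite is used only to confirm that the derived DDL is executable.

```python
import sqlite3

# Hypothetical toy schema: leaf paths of a purchase-order document and their simple types.
TOY_PO_SCHEMA = {
    "/PurchaseOrder/@id": "xs:integer",
    "/PurchaseOrder/Customer": "xs:string",
    "/PurchaseOrder/Total": "xs:decimal",
}

# Assumed mapping from XML Schema simple types to SQL column types.
SQL_TYPES = {"xs:integer": "INTEGER", "xs:string": "TEXT", "xs:decimal": "NUMERIC"}

def register_schema(table, schema):
    """Derive (1) a CREATE TABLE statement and (2) a path -> column mapping."""
    mapping, columns = {}, []
    for path, xs_type in schema.items():
        # Column name: last step of the path, without any leading '@'.
        column = path.rsplit("/", 1)[1].lstrip("@").lower()
        mapping[path] = column
        columns.append(f"{column} {SQL_TYPES[xs_type]}")
    return f"CREATE TABLE {table} ({', '.join(columns)})", mapping

ddl, mapping = register_schema("purchase_orders", TOY_PO_SCHEMA)
print(ddl)  # CREATE TABLE purchase_orders (id INTEGER, customer TEXT, total NUMERIC)
print(mapping["/PurchaseOrder/Total"])  # total

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the derived representation can be created as-is
```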
For example, the database representation of an entire XML document may be a CLOB (character large object), or one or more tables whose columns store the components of an XML document in one or more rows. A database representation may also be a hierarchy of objects in an object-relational database, where each object is an instance of an object class and stores one or more elements of an XML document. The object class defines, for example, the structure corresponding to an element, and includes references or pointers to objects representing the element's immediate descendants.
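A table-based representation can be sketched by "shredding" Segment A into rows. The generic `xml_nodes` layout below (one row per element or attribute, linked to its parent by `parent_id`) and the helper `shred` are hypothetical choices for illustration; a schema-driven representation would instead use columns derived from the registered XML schema. The sketch also ignores mixed content and text that follows child elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

SEGMENT_A = ('<book><publication publisher="Doubleday" date="January"></publication>'
             '<Author>Mark Berry</Author></book>')

conn = sqlite3.connect(":memory:")
# Hypothetical generic layout: one row per element or attribute, linked to its parent.
conn.execute("""CREATE TABLE xml_nodes (
    id INTEGER PRIMARY KEY, parent_id INTEGER, kind TEXT, name TEXT, value TEXT)""")

def shred(elem, parent_id=None):
    """Store elem, its attributes, and (recursively) its descendants as rows."""
    cur = conn.execute(
        "INSERT INTO xml_nodes (parent_id, kind, name, value) VALUES (?, 'element', ?, ?)",
        (parent_id, elem.tag, (elem.text or "").strip() or None))
    elem_id = cur.lastrowid
    for name, value in elem.attrib.items():
        conn.execute(
            "INSERT INTO xml_nodes (parent_id, kind, name, value) VALUES (?, 'attribute', ?, ?)",
            (elem_id, name, value))
    for child in elem:
        shred(child, elem_id)
    return elem_id

shred(ET.fromstring(SEGMENT_A))

# Components of the document are now reachable with plain SQL.
author = conn.execute(
    "SELECT value FROM xml_nodes WHERE kind = 'element' AND name = 'Author'").fetchone()[0]
print(author)  # Mark Berry
```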
Data Typing Needed to Rewrite XML Query Languages into SQL Queries
It is important for object-relational database systems that store XMLType instances to be able to execute queries written in XML query languages, such as XQuery and XPath, efficiently. XML Query Language (“XQuery”) and XML Path Language (“XPath”) are important standard query languages for XML, and can be used in conjunction with SQL to express a large variety of useful queries. XPath is described in XML Path Language (XPath) Version 1.0, W3C Recommendation 16 Nov. 1999, which is incorporated herein by reference. XPath 2.0 and XQuery 1.0 are described in XQuery 1.0 and XPath 2.0 Full-Text, W3C Working Draft 9 Oct. 2004, which is incorporated herein by reference.
Various approaches have been developed for an object-relational database system to execute XQuery/XPath queries. One approach for executing XQuery/XPath queries is referred to herein as the “rewrite” approach, or as query rewriting. XQuery/XPath queries received by an object-relational database system are dynamically rewritten to directly reference and access the underlying object-relational data. Specific techniques for implementing the rewrite approach are described in the above XQuery and XPath Translation and Rewrite patent Applications.
The process of rewriting XPath and XQuery queries may depend on a procedure referred to herein as data typing. Data typing is the process of determining, at query compile time, the type structure of XMLType instances that come from a variety of XML data sources. The type structure of an XMLType instance can have multiple type representations. The type structure produced by data typing is used to type-check XQuery and XPath expressions at compile time and to determine how to rewrite them correctly and optimally.
For example, the following XQuery XQ may be rewritten into the following SQL query:

    XQ:  for $i in /PurchaseOrder/LineItems
         where $i//@lineno > 45
         return $i

    SQL: SELECT value(v)
         FROM table(xmlsequence(extract(poview, '/PurchaseOrder/LineItems'))) v
         WHERE extractValue(value(v), 'LineItems/@lineno') > 45
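The sketch below makes the intent of such a rewrite concrete by evaluating the XQ query directly over a small XML document and evaluating a relational query over the same data, then comparing the results. The purchase-order document, the `line_items` table, and `eval_xq` are hypothetical; SQLite stands in for the object-relational store. Note that the relational form is only equivalent because each LineItems element is assumed to carry exactly one numeric lineno, which is precisely the kind of fact data typing must establish before the rewrite is allowed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase order; each LineItems element carries one lineno attribute.
PO = """<PurchaseOrder>
  <LineItems lineno="10"><Part>bolt</Part></LineItems>
  <LineItems lineno="46"><Part>nut</Part></LineItems>
  <LineItems lineno="99"><Part>gear</Part></LineItems>
</PurchaseOrder>"""

def eval_xq(doc):
    """for $i in /PurchaseOrder/LineItems where $i//@lineno > 45 return $i"""
    root = ET.fromstring(doc)  # root is the PurchaseOrder element
    result = []
    for i in root.findall("LineItems"):
        # $i//@lineno: lineno attributes on $i or any descendant; the general
        # comparison is true if ANY of them is greater than 45.
        linenos = [int(e.get("lineno")) for e in i.iter() if e.get("lineno") is not None]
        if any(n > 45 for n in linenos):
            result.append(i)
    return result

# Hypothetical relational storage of the same data, as a rewritten query would access it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (lineno INTEGER, part TEXT)")
conn.executemany("INSERT INTO line_items VALUES (?, ?)",
                 [(10, "bolt"), (46, "nut"), (99, "gear")])

xq_parts = [i.findtext("Part") for i in eval_xq(PO)]
sql_parts = [p for (p,) in conn.execute(
    "SELECT part FROM line_items WHERE lineno > 45 ORDER BY lineno")]
print(xq_parts, sql_parts)  # ['nut', 'gear'] ['nut', 'gear']
```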
Rewriting XQ may include checking that the data type of lineno is numeric, so that the comparison in XQ ($i//@lineno > 45) can be rewritten into the equivalent SQL numeric comparison. How XQ is rewritten may also depend on whether lineno is of a scalar or a collection data type: if lineno is a collection type, the WHERE clause of the SQL query requires an EXISTS subquery.
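The two typing decisions described above, numeric or not and scalar or collection, can be sketched as a small function that picks the shape of the SQL predicate. `PathType` and `rewrite_predicate` are hypothetical, and the emitted text for the collection case assumes an Oracle-style `TABLE( )` collection expression with the `COLUMN_VALUE` pseudocolumn and an outer alias `v`; it illustrates the decision logic, not the output of any actual rewrite engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PathType:
    """Hypothetical compile-time type information for a path such as @lineno."""
    sql_type: str         # e.g. "NUMBER" or "VARCHAR2"
    is_collection: bool   # True if the path may yield more than one value

def rewrite_predicate(column, path_type, op, literal):
    """Choose an SQL predicate for `path op literal` from the path's compile-time type."""
    if path_type.sql_type != "NUMBER" or not isinstance(literal, (int, float)):
        # Without a known numeric type, a numeric comparison cannot be rewritten safely.
        raise ValueError("cannot rewrite: operand is not known to be numeric")
    if not path_type.is_collection:
        # Scalar: a direct column comparison suffices.
        return f"{column} {op} {literal}"
    # Collection: XQuery's general comparison is existential, hence an EXISTS subquery.
    return (f"EXISTS (SELECT 1 FROM TABLE(v.{column}) t "
            f"WHERE t.COLUMN_VALUE {op} {literal})")

print(rewrite_predicate("lineno", PathType("NUMBER", False), ">", 45))
# lineno > 45
print(rewrite_predicate("lineno", PathType("NUMBER", True), ">", 45))
# EXISTS (SELECT 1 FROM TABLE(v.lineno) t WHERE t.COLUMN_VALUE > 45)
```

The point of the sketch is that both branches depend on type information known before execution; without it, neither the comparison nor the EXISTS form can be chosen safely.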
The name poview in the SQL query refers to a view defined in the object-relational database system. A view is an object-relational database construct: a stored query that, when a query directed to the view is executed, generates a set of rows with columns from one or more tables. An XML type view is a stored query that, when a query directed to it is executed, generates a stream of XML values from, for example, columns in one or more tables. XML type views are described in greater detail in the Rewrite application.
Object-relational data structures, such as tables, columns, and object types, and abstractions of database data, such as views and their columns, have data types that are defined by the database management system's object-relational metadata and are thus “known” to the object-relational database system. Object-relational metadata is metadata that describes the database objects and data structures managed by the database management system, which database statements processed by the database system can reference as data structures that the database management system recognizes. Database objects and data structures include tables, object tables, columns, object types, and views. In many scenarios in which an object-relational database system processes XMLType instances, particularly those involving the rewrite of XQuery/XPath queries, the XMLType instances are not explicitly defined by the metadata of the object-relational database system. When the type structure of the XMLType instances being processed is not known to the database system at query compile time, many optimizations for querying and modifying those instances, which would be possible if their type structure were known at compile time, cannot be applied.
One possible approach to this problem is to use one or more ad-hoc mechanisms for generating a type representation for the type structure of XMLType instances. For example, in the case of an XMLType instance conforming to an XML schema, the type structure of the XMLType instance can be represented by the XML schema, so that rewrite of an XPath/XQuery query is feasible. For XMLType instances generated by SQL/XML functions, the type representation of the XMLType instances can be an SQL expression tree, so that query rewrite is feasible. For XMLType instances generated by an XQuery embedded in the XMLQuery( ) function, the type representation of an XMLType instance can be the result type of the underlying XQuery expression that generates the result. For an XMLType instance generated from object-relational data via the SYS_XMLGEN( ) function (a function that returns an XMLType instance based on an object type defined by an object-relational database system), the type representation of the XMLType instance can be the object type metadata maintained by the object-relational database system.
However, for such ad-hoc approaches, the use of diverse kinds of type representation, i.e. XML schema, SQL expression operator tree, XQuery expression tree and object type metadata, greatly complicates data typing, modification, and optimization of XQuery/XPath queries, because the data typing procedures have to handle multiple hybrid forms of type representations to describe XMLType instances.
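The complication can be illustrated with a deliberately small sketch. The four classes below are hypothetical stand-ins for the four kinds of type representation listed above, and `is_numeric` is a hypothetical typing question of the kind a rewrite needs answered (for example, "is lineno numeric?"). Even this single question needs a separate branch per representation, and every further question (cardinality, child element names, and so on) would need four more.

```python
from dataclasses import dataclass

@dataclass
class SchemaRep:        # XML schema: element/attribute name -> declared simple type
    declared_types: dict

@dataclass
class SqlExprTreeRep:   # SQL/XML operator tree, e.g. XMLElement("lineno", <column of type T>)
    element_name: str
    column_sql_type: str

@dataclass
class XQueryResultRep:  # static result type of the XQuery embedded in XMLQuery()
    item_name: str
    result_type: str    # a sequence type such as "xs:integer*"

@dataclass
class ObjectTypeRep:    # object type metadata used by SYS_XMLGEN(): attribute -> SQL type
    attributes: dict

NUMERIC_XS = {"xs:integer", "xs:decimal", "xs:double"}

def is_numeric(rep, name):
    """Is the item called `name` numeric? Each representation needs its own branch."""
    if isinstance(rep, SchemaRep):
        return rep.declared_types.get(name) in NUMERIC_XS
    if isinstance(rep, SqlExprTreeRep):
        return rep.element_name == name and rep.column_sql_type == "NUMBER"
    if isinstance(rep, XQueryResultRep):
        # Strip the occurrence indicator ("?", "*", "+") before checking the item type.
        return rep.item_name == name and rep.result_type.rstrip("?*+") in NUMERIC_XS
    if isinstance(rep, ObjectTypeRep):
        return rep.attributes.get(name) == "NUMBER"
    raise TypeError(f"unknown representation: {type(rep).__name__}")
```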
Therefore, there is a need for a mechanism to represent the type structure of an XMLType instance from diverse XML data sources in a uniform way.