The World Wide Web (WWW) involves a network of servers on the Internet, each of which is associated with one or more Hypertext Markup Language (HTML) pages. The HTML pages are transferred between clients that make requests of servers and the servers using the Hypertext Transfer Protocol (HTTP). Resources available from servers on the Internet are located using a Universal Resource Locator (URL). The standards and protocols of the WWW are promulgated by the World Wide Web Consortium (W3C) through its servers at www.w3c.org, and are used on many private networks in addition to their use on the Internet.
The HTML standard is one application of a more general markup language standard called the Standard Generalized Markup Language (SGML). Recently, a subset of SGML that is more powerful and flexible than HTML has been defined and has gained popularity for transferring information over the Internet and other networks. The new standard, developed and promoted by W3C, is called the eXtensible Markup Language (XML). XML provides a common syntax for expressing structure in data. Structured data refers to data that is tagged for its content, meaning, or use. XML provides an expansion of the tagging that is done in HTML, which focuses on format or presentation. XML tags identify XML elements and attributes of XML elements. XML elements can be nested to form hierarchies of elements.
A set of syntax rules for XML elements shared by multiple XML documents is defined by an XML schema, itself an XML document. For example, the syntax rules indicate what elements can be used in a document, in what order they should appear, which elements can appear inside other elements, which elements have attributes, what those attributes are, and any restrictions on the type of data or number of occurrences of an element. XML allows documents to contain elements from several distinct XML schema by the use of namespaces. In particular, elements from other, independently created XML schema can be interleaved in one XML document.
Given the elements defined and used by XML, a document object model (DOM) is a tree structure formed to define how the information in an XML document is arranged. The DOM is navigated using an XPath expression that indicates a particular node of content in the hierarchy of elements and attributes in an XML document. XPath is a standard promulgated by W3C.
Relational databases predate, and developed independently of, the World Wide Web. Relational databases store data in various types of data containers that correspond to logical relationships within the data. As a consequence, relational databases support powerful search and update capabilities. Relational databases typically store data in tables of rows and columns where the values in all the columns of one row are related. For example, the values in one row of an employee table describe attributes of the same employee, such as her name, social security number, address, salary, telephone number and other information. Each attribute is stored in a different column. Some attributes, called collections, can have multiple entries. For example, the employee may be allowed to have multiple telephone numbers. Special structures are defined in some relational databases to store collections.
A relational database management system (DBMS) is a system that stores and retrieves data in a relational database. The relational DBMS processes requests to perform database functions such as creating and deleting tables, adding and deleting data in tables, and retrieving data from the tables in the database. A well-known standard language for expressing the database requests is the Structured Query Language (SQL).
Object-relational databases extend the power of relational databases. Object-relational databases allow the value in a column to be an object, which may include multiple attributes. For example, the value in the address column may be an address object that itself has multiple attributes, such as a street address, a city, a state, a country, and a zip code or equivalent. An object type defines the attributes of an object in an object-relational database. SQL has been extended to allow the definition and use of objects and object types in object-relational databases. As used hereinafter, the term “object-relational database” refers to a subset of relational databases that support object-relational constructs; and an object-relational construct is one example of a relational construct.
Because of the popularity of XML as a data exchange format that supports hierarchical relationships among elements, and because of the power of relational DBMSs to update and retrieve data, there is a demand for generating XML data output from relational databases and storing XML data into relational databases. To support this demand, an industry standard SQL to operate on XML documents has been developed. This standard is called SQL/XML and documents relating to SQL/XML are available at the time of this writing at www.sqlx.org. As used in the following, XML constructs include XML elements, XML attributes, XML documents (which always have a single root element), and document fragments that include multiple elements at the root level; object relational constructs include such constructs as tables, rows, columns, objects and views.
Recent work, described in Murthy-1., has been directed to mechanisms for mapping XML constructs in an XML schema, to constructs in an object-relational DBMS Murthy-1. also describe mechanisms for storing XML content into the appropriated object-relational construct. One or more object-relational constructs that correspond to all or part of an XML construct are associated with an XML type. These mechanisms support various options for storing the XML type, from storing the entire document as a single featureless large object in the database (e.g., a character large object, CLOB, of the ORACLE™ DBMS available from Oracle International Corporation, of Redwood Shores, Calif.) to storing one or more XML elements as one or more simple or object type columns or collections in one or more tables.
An SQL/XML query may include XPath based operations such as EXTRACT, EXISTNODE, and EXTRACTVALUE, which operate on a portion of an XML construct indicated by an XPath expression provided as an argument to the operator. EXISTNODE returns one value (e.g., 0) if there is no XML element at the position in the hierarchy indicated by the XPath expression, and a different value (e.g., 1) otherwise. EXTRACT returns a data stream representing a portion of the XML construct at nodes in the hierarchy of the XML construct that include and descend from the XML element or elements indicated by the XPath expression. EXTRACTVALUE returns a scalar value, if any, from the XML element indicated by the XPath expression.
When a query with an XPath based operation is directed to an XML type, the SQL/XML DBMS may convert the stored data to its XML form and then send the XML form of the data to the process that implements the XPath operation. The XPath operation process parses the data to identify and return the indicated information. This process can be wasteful if only a portion of the XML construct, stored separately in one or more columns of a relational or object-relational database, affects the results. It would be desirable to extract only data from the columns of interest with an SQL query. In addition, the use of an SQL query enables further SQL optimizations that fully exploit the object-relational storage. Such optimization may not be available during parsing by an XPath operation.
Based on the foregoing, there is a clear need for a mechanism to rewrite a query with an XPath operation, directed to an XML type object-relational construct, as an SQL query
Many database owners have large amounts of data already stored in legacy databases that did not define object-relational constructs that correspond closely to XML constructs. For such users to operate on XML data, the object-relational constructs have to be related to XML constructs, before XPath based queries can be evaluated.
One approach to making legacy relational and object-relational databases available for XPath based queries is to migrate the entire database to a different database that supports XML. This approach involves converting the relational or object-relational data into XML format, such as by writing and executing a procedure to do the conversion, and storing the resulting XML constructs in a native XML database, such as the ORACLE™ XMLDatabase. However, such data migration may be expensive; and costs may exceed a database owner's willingness or ability to pay for the migration.
Another approach is to retrieve data from the database using relational or object-relational queries from a mid-tier application separate from the database server; and modify the mid-tier application to form and manipulate XML constructs. This approach can also be expensive, because the application must be modified. Also, this approach may be inefficient in that more data is often exported to the mid tier than is used in the results after the XML constructs are manipulated. This wastes communication bandwidth, wastes the processing resources at the database server to retrieve and send the excess data, and wastes the processing resources at the application to convert the excess data to XML constructs.
A SQL/XML DBMS, for example ORACLE™ XML Database, may allow a view of XML type to be defined based on one or more underlying tables. A view is a relational database construct for a stored query that generates a set of rows with columns from one or more tables, when a query directed to the view is executed. The query stored in the view becomes a sub-query of the query directed to the view. An XML type view is a stored query that generates a data stream of XML elements of the associated type from columns in one or more tables, when a query directed to the view is executed. XML type views can be used to generate XML data from legacy data stored in relational and object-relational constructs.
A SQL/XML DBMS, may allow some queries to include an XPath portion and a second portion, called a “sub-query” that produces an XML type. The XML type sub-query might not be stored as a view, but also generates a data stream of XML elements. As used hereinafter an XML type sub-query refers to a sub-query that generates a stream of XML elements, whether or not the sub-query is stored as a XML type view.
It would be desirable if, in some circumstances, the XML query directed to the XML type sub-query would be rewritten as one or more SQL queries on the underlying object-relational constructs. Rewriting XML queries as SQL queries avoids composing the data into an XML data stream and then parsing the XML stream during an XPath operation. Avoiding such composing and parsing saves computational resources and enhances performance. In addition, rewriting XML queries as SQL queries also allows SQL optimizations of the rewritten SQL query to fully exploit the object-relational storage, such as the use of indexes.
Based on the foregoing, there is a clear need for a mechanism to rewrite a query with an XPath operator directed to an XML type sub-query as an SQL query.
The past approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art to the claims in this application merely due to the presence of these approaches in this background section.