The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Relational database management systems (RDBMSs) store information in tables, where each piece of data is stored at a particular row and column. Information in a given row generally is associated with a particular object, and information in a given column generally relates to a particular category of information. For example, each row of a table may correspond to a particular employee, and the various columns of the table may correspond to employee names, employee social security numbers, and employee salaries.
A user retrieves information from and makes updates to a database by interacting with a database application. The user's actions are converted into a query by the database application. The database application submits the query to a database server. The database server responds to the query by accessing the tables specified in the query to determine which information stored in the tables satisfies the query. The information that satisfies the query is retrieved by the database server and transmitted to the database application. Alternatively, a user may request information directly from the database server by constructing and submitting a query directly to the database server using a command line or graphical interface.
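As a purely illustrative sketch of the request flow described above, the following Python fragment uses the standard `sqlite3` module as a stand-in for a database server. The `employees` table, its sample rows, and the helper name `run_query` are assumptions made for this example and do not appear in the description.

```python
import sqlite3

def run_query(conn, sql, params=()):
    """Stand-in for the database server: run the query, return the rows that satisfy it."""
    return conn.execute(sql, params).fetchall()

# Hypothetical table: one row per employee, one column per category of information.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, ssn TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "000-00-0001", 90000), ("Bob", "000-00-0002", 60000)],
)

# The "database application" turns a user action into a query and submits it.
high_paid = run_query(conn, "SELECT name FROM employees WHERE salary > ?", (70000,))
print(high_paid)
```

Here the application code plays the role of the database application, and the in-memory SQLite engine plays the role of the database server.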
Queries submitted to the database server must conform to the syntactical rules of a particular query language. One popular query language, known as the Structured Query Language (SQL), provides users a variety of ways to specify information to be retrieved. Another query language, which is based on the Extensible Markup Language (XML), is the XML Query Language (XQuery). XQuery may have multiple syntactic representations. For instance, one representation is human-readable and another is an XML representation (XQueryX). XQuery is described in “XQuery 1.0: An XML Query Language,” W3C Working Draft Jul. 23, 2004, at www.w3.org/TR/xquery. XQueryX is described in “XML Syntax for XQuery 1.0 (XQueryX),” W3C Working Draft 19 Dec. 2003, at www.w3.org/TR/xqueryx. A related technology, XPath, is described in “XML Path Language (XPath) 2.0,” W3C Working Draft 12 Nov. 2003, at www.w3.org/TR/xpath20. XQuery and XQueryX may use XPath for path traversal.
One approach to implementing XQuery support in RDBMSs, referred to as the coprocessor approach, is to embed a general-purpose XQuery processor inside an RDBMS engine and have the XQuery processor execute XQuery on behalf of the RDBMS SQL processor. In the coprocessor approach, the SQL processor treats the XQuery coprocessor as a black box. During the execution of a SQL statement, the SQL processor handles the XQuery portion of the query by passing the text of the XQuery portion of the query, and the necessary XML values, as input to the XQuery processor. The XQuery processor then returns the results of processing the XQuery portion of the query to the SQL processor, and the SQL processor performs any other appropriate operations specified by the query.
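The black-box hand-off described above can be sketched roughly as follows. This is a simplified illustration, not an actual RDBMS implementation: the function names `xquery_coprocessor` and `sql_processor` are hypothetical, and the limited path matching of Python's `xml.etree.ElementTree` is used only as a stand-in for a real XQuery processor.

```python
import xml.etree.ElementTree as ET

def xquery_coprocessor(query_text, xml_text):
    """Hypothetical black box: accepts query text and fully materialized XML text,
    and returns each match serialized back to XML text."""
    root = ET.fromstring(xml_text)  # the coprocessor only understands XML text
    return [ET.tostring(e, encoding="unicode") for e in root.findall(query_text)]

def sql_processor(rows, query_text):
    """Stand-in SQL processor: hands the XML-query portion to the coprocessor
    row by row, then applies the remaining (SQL-side) filtering itself."""
    results = []
    for row_id, xml_text in rows:
        matches = xquery_coprocessor(query_text, xml_text)
        if matches:  # SQL-side operation: keep only rows with at least one match
            results.append((row_id, matches))
    return results

# Assumed sample data: (row id, XML value) pairs.
rows = [
    (1, "<order><item>pen</item><item>ink</item></order>"),
    (2, "<order><note>empty</note></order>"),
]
result = sql_processor(rows, "./item")
print(result)
```

Note that the only things crossing the boundary between the two functions are strings: the query text and XML text going in, and XML text coming out.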
The coprocessor approach has numerous problems. First, the XQuery processor is not aware of any of the underlying techniques for storing XML data. Therefore, the XQuery processor needs fully materialized XML as input, and the XML input needed by the XQuery processor must be constructed or materialized by the RDBMS. Often the XML input needed for the XQuery is stored in the database and may be “shredded” into one or more component XML elements, and those XML elements may be stored in one or more relational or object-relational tables. Under these conditions, the process of materializing the XML data consumes significant time and resources, and therefore makes the coprocessor approach inefficient.
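To make the materialization cost concrete, the following sketch assumes a hypothetical shredded layout in which each order's elements are spread across two relational tables (`orders` and `items`). The schema, the data, and the helper `materialize_order` are illustrative assumptions only. Before a black-box XQuery processor could examine an order, the RDBMS would first have to rebuild the full XML document from those rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed shredded storage: order attributes in one table, item elements in another.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("CREATE TABLE items (order_id INTEGER, name TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'Alice')")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "pen"), (1, "ink")])

def materialize_order(conn, order_id):
    """Rebuild the complete XML document for one order from its shredded rows."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    order = ET.Element("order", {"customer": customer})
    item_rows = conn.execute(
        "SELECT name FROM items WHERE order_id = ? ORDER BY rowid", (order_id,)
    )
    for (name,) in item_rows:
        ET.SubElement(order, "item").text = name
    return ET.tostring(order, encoding="unicode")

order_xml = materialize_order(conn, 1)
print(order_xml)
```

Every element is read from the tables and re-serialized, even if the XQuery ultimately inspects only a single item.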
A second problem with the coprocessor approach is that the XQuery portion of an incoming query cannot be optimized together with the SQL portion of the incoming query (and vice versa). Specifically, the XQuery processor is not able to optimize the SQL portion of the query, and the SQL processor is not able to optimize the XQuery portion of the query. Therefore, the SQL and XQuery parts of the query are optimized separately (if at all), which is suboptimal. In addition, the data needed in the XQuery portion of the query may be stored in a form other than XML (such as being shredded into multiple XMLType columns). Since the XQuery processor is not aware of the form in which the underlying data is stored, the XQuery processor is not able to optimize execution of the XQuery operations based on storage information.
A third problem with the coprocessor approach occurs when an XQuery processor is invoked multiple times and the output of a first XQuery becomes the input to a second XQuery in the original query. In that case, the output of the first XQuery must be generated as XML. This dictates that the XQuery processor, after determining the result of the first XQuery, must materialize the result as XML in an XML document and send the XML document to the SQL processor. The SQL processor then passes the XML document back to the XQuery processor along with the second XQuery. The XQuery processor then parses the XML document again and processes the second XQuery against it. This round trip wastes communication steps, computation, and bandwidth.
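The round trip described above can be illustrated with a small sketch that counts how often intermediate results are parsed and serialized. The `coprocessor` function, the `counts` dictionary, and the sample document are assumptions made for this example, and `xml.etree.ElementTree` path matching again stands in for a real XQuery processor.

```python
import xml.etree.ElementTree as ET

counts = {"parse": 0, "serialize": 0}

def coprocessor(path, xml_text):
    """Hypothetical black-box call: parse XML text, evaluate a path,
    and return the matches wrapped in a new serialized XML document."""
    root = ET.fromstring(xml_text)
    counts["parse"] += 1
    wrapper = ET.Element("result")
    wrapper.extend(root.findall(path))
    counts["serialize"] += 1
    return ET.tostring(wrapper, encoding="unicode")

doc = "<orders><order><item>pen</item></order><order><item>ink</item></order></orders>"

# First query: its result must be materialized as an XML document for the caller.
first = coprocessor("./order", doc)
# Second query: the caller passes that document straight back to be parsed again.
second = coprocessor("./order/item", first)

print(second)
print(counts)
```

In this sketch the intermediate result is serialized by the first call only to be parsed again by the second, which is the kind of redundant work the paragraph above describes.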
Therefore, there is a need for techniques that overcome the shortcomings of the coprocessor approach described above.