In conventional databases, an incoming query is typically received and manipulated by a database front end prior to being submitted to a query processor for optimization and execution. Generally, the database front end uses the incoming query to generate a query plan for executing the query at the query processor. The query plan is then used to generate an execution plan, which in turn is used to execute the incoming query.
Some relational databases generate an execution plan as follows. First, the query is parsed to yield an abstract syntax tree. The abstract syntax tree is then transformed into a unified tree structure in which nodes represent abstract operations to be performed on the query. An algebrizer is then employed to convert the unified tree operations into relational algebraic expressions in a logical operator (log-op) tree. The log-op tree represents the resulting query plan. Query optimization is used to optimize the performance of the query plan, which is then ready for execution.
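By way of illustration only, the algebrizing step described above may be sketched in Python as follows. The operator names (`Get`, `Select`, `Project`), the dictionary-shaped syntax tree, and the `algebrize` function are hypothetical and chosen for this example; they are not taken from any particular database product. The sketch omits parsing, the intermediate unified tree, and optimization, and shows only how a syntax tree might be mapped to a tree of logical operators.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical logical operators for a toy log-op tree.
@dataclass(frozen=True)
class Get:
    table: str  # read all rows of a table


@dataclass(frozen=True)
class Select:
    predicate: str  # filter rows (predicate kept as text for brevity)
    child: object


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]  # keep only the named columns
    child: object


def algebrize(ast: dict):
    """Map a toy syntax tree (assumed dict shape) to a log-op tree."""
    plan = Get(ast["from"])
    if ast.get("where") is not None:
        plan = Select(ast["where"], plan)
    return Project(tuple(ast["select"]), plan)


# Assumed syntax tree for: SELECT firstname, lastname FROM persons WHERE age > 30
ast = {"select": ["firstname", "lastname"], "from": "persons", "where": "age > 30"}
plan = algebrize(ast)
```

In this sketch, `plan` is a `Project` over a `Select` over a `Get`, which is the kind of tree an optimizer would then rewrite before execution.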
A recent development with respect to databases is that, in addition to supporting traditional relational data, the databases also support extensible markup language (XML) data. For example, SQL Server™ from Microsoft Corp. of Redmond, Wash. enables data to be defined using an XML data type. Columns with such an XML data type can be created in a data table, and XML variables and parameters can be declared. Such XML data can be searched, retrieved, and updated. Specifically, a query may include an XML expression written in an XML-based query language such as XQuery or XSLT.
While relational database systems can include functionality to store XML data, conventional databases are limited in how they retrieve that data. The plans derived from XML-related queries tend to be complex even for simple tasks, such as retrieving several scalar values from a set of XML documents. For example, consider a set of XML documents containing information regarding a number of persons, with two name elements for each person, “firstname” and “lastname.” A plan derived from a query requesting that the firstname and lastname be retrieved for each person in the XML documents would require each XML document to be accessed twice, once to retrieve the firstname for each person, and once to retrieve the lastname. This redundant access is costly if performed for many scalar values on many XML documents. The work to process an XML query grows in proportion to the number of scalar values to be retrieved and in proportion to the number of XML documents to be accessed by the query.
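The cost pattern described above may be illustrated with a short Python sketch using the standard `xml.etree.ElementTree` module. The sample documents, the function names, and the access counter are assumptions made for this example, and parsing a document is used here as a stand-in for "accessing" it. The second function is included for contrast only and does not describe the approach of any particular database.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents, each listing one or more persons.
DOCS = [
    "<people>"
    "<person><firstname>Ada</firstname><lastname>Lovelace</lastname></person>"
    "<person><firstname>Alan</firstname><lastname>Turing</lastname></person>"
    "</people>",
    "<people>"
    "<person><firstname>Grace</firstname><lastname>Hopper</lastname></person>"
    "</people>",
]


def per_value_retrieval(docs):
    """Access each document once per requested scalar value."""
    accesses = 0
    rows = []
    for doc in docs:
        # First access: retrieve the firstname of every person.
        firstnames = [p.findtext("firstname") for p in ET.fromstring(doc).iter("person")]
        accesses += 1
        # Second access: retrieve the lastname of every person.
        lastnames = [p.findtext("lastname") for p in ET.fromstring(doc).iter("person")]
        accesses += 1
        rows.extend(zip(firstnames, lastnames))
    return rows, accesses


def single_access_retrieval(docs):
    """Access each document once and read both values together."""
    accesses = 0
    rows = []
    for doc in docs:
        root = ET.fromstring(doc)
        accesses += 1
        for person in root.iter("person"):
            rows.append((person.findtext("firstname"), person.findtext("lastname")))
    return rows, accesses


rows_a, accesses_a = per_value_retrieval(DOCS)
rows_b, accesses_b = single_access_retrieval(DOCS)
```

Both functions return the same rows, but with two documents and two requested values the first performs four accesses and the second performs two. In the first function the access count is the number of documents multiplied by the number of scalar values, which matches the growth described above.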
Thus, there is a need in the art for systems and methods for more efficiently processing relational database queries which access XML data.