The Extensible Markup Language (XML) is a flexible tag-based markup language suitably used to store data for posting on the Internet or local intranets, wide area networks, or the like. XML is increasingly being used as a native language for data storage in database management systems. In these and other XML applications, it is advantageous to have a flexible XML query language for creating and modifying XML documents, for efficiently and selectively retrieving data from XML documents or collections of XML documents, for sorting data, for inputting data into XML documents, and for otherwise manipulating XML items and data. Various XML query languages have been developed, including XML-QL, XQL, Quilt, and XQuery.
XML employs markup tags to classify, group, interrelate, or otherwise provide metadata regarding data stored in XML documents. An XML query can be viewed as producing streams of sequences of items. In a tabular notation using one column, each separately processed sequence comprising an XML item or a concatenation of XML items is suitably viewed as a row, while each XML item within a row is suitably viewed as an XML item, fragment, or row element. Such terminology is used herein to provide a convenient tabular visualization of the data; however, the use of such terminology does not imply that the data is stored or processed in tabular format.
In typical query processing, an XML query is constructed by a user, a software module, or the like, and is converted from a textual format to a data flow model. At the data flow model level, query rewrites of identified inefficient data flow structures are performed to optimize the data flow model. A query rewrite is a query transformation that produces a more efficient query without changing the query output. The optimized data flow model is compiled into executable instructions. Optionally, query rewriting is performed at a lower level than the data flow model, such as at the executable instructions level. It will be appreciated that the executable instructions substantially correspond to a data flow model written in a specific executable instruction set. Similarly, the query text substantially corresponds to a data flow model written in a high level text-based language, and so optionally query rewrites are performed at the text query level. Typically, however, the XML query is converted into an intermediate data flow model which formats the XML query in a manner well-suited for efficient XML query rewrite processing.
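The rewrite stage described above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified data flow model, namely an ordered list of (operator, argument) steps, and a single hypothetical rewrite rule, filter_before_sort; neither the representation nor the names are taken from any actual query processor. The sketch only shows the general shape of applying rewrites repeatedly until the model stops changing, where each rewrite preserves the query output.

```python
from typing import Callable, List, Optional, Tuple

# Toy data flow model (an assumption for illustration): an ordered
# pipeline of (operator, argument) steps.
Step = Tuple[str, str]
Plan = List[Step]
Rewrite = Callable[[Plan], Optional[Plan]]


def filter_before_sort(plan: Plan) -> Optional[Plan]:
    """Hypothetical rule: move a filter ahead of the sort directly before it.

    Filtering first yields the same output as sorting first, but the
    sort then handles fewer items.
    """
    for i in range(len(plan) - 1):
        if plan[i][0] == "sort" and plan[i + 1][0] == "filter":
            return plan[:i] + [plan[i + 1], plan[i]] + plan[i + 2:]
    return None  # the rule does not apply to this plan


def optimize(plan: Plan, rules: List[Rewrite]) -> Plan:
    """Apply the rewrite rules until none of them changes the plan."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_plan = rule(plan)
            if new_plan is not None:
                plan, changed = new_plan, True
    return plan


if __name__ == "__main__":
    original = [("scan", "orders"), ("sort", "date"), ("filter", "price > 10")]
    print(optimize(original, [filter_before_sort]))
```

In this sketch the optimized plan would then be handed to a compilation step; that step is omitted here because the text does not describe it in detail.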
Regardless of the processing level at which query rewrites are performed, the query rewrites perform various optimization tasks such as reducing XML item sequence construction, reducing usage of memory for temporary data storage, promoting data flow pipelining, improving index usage, and improving I/O behavior. For XQuery, heuristics may be applied, such as “try to express the whole query with as few FLWOR expressions as possible” or “apply filters and extractions early during data processing” in the form of rewrite rules (or rewrites). However, problems arise when queries are rewritten to evaluate XPath expressions as early as possible so that unwanted items are filtered out more efficiently.
Some of these problems are associated with a query rewrite technique known as XPath extraction pushdown, in which an XPath extraction is pushed into a lower query block (one closer to the base table) without changing the semantics of the query, but possibly changing the result of the lower query block. One example is known as the multiple-consumer problem, which occurs when an extraction pushdown cannot be performed because other parts of the same query also refer to the result of the lower query block that is the target of the pushdown, and those parts would otherwise receive the changed result.
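The multiple-consumer condition can be made concrete with a short Python sketch. The QueryBlock class, the consumers_of helper, and the try_push_down function are hypothetical names introduced only for illustration, and the model is far simpler than a real data flow model: each block has a list of input blocks and at most one XPath extraction applied to its output. Under those assumptions, the sketch pushes an extraction into a lower block only when the upper block is that lower block's sole consumer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryBlock:
    """Toy query block (hypothetical, for illustration only)."""
    name: str
    inputs: List["QueryBlock"] = field(default_factory=list)
    extraction: Optional[str] = None  # XPath applied to this block's output


def consumers_of(target: QueryBlock, all_blocks: List[QueryBlock]) -> List[QueryBlock]:
    """Return every block that reads the output of `target`."""
    return [b for b in all_blocks if any(i is target for i in b.inputs)]


def try_push_down(upper: QueryBlock, lower: QueryBlock,
                  all_blocks: List[QueryBlock]) -> bool:
    """Move upper's XPath extraction into `lower` when that is safe.

    The pushdown changes what `lower` produces, so this sketch allows it
    only when `upper` is the sole consumer of `lower`.
    """
    if upper.extraction is None or lower.extraction is not None:
        return False  # nothing to push, or the lower slot is already used
    if not any(i is lower for i in upper.inputs):
        return False  # `upper` does not read from `lower`
    if len(consumers_of(lower, all_blocks)) != 1:
        return False  # multiple-consumer problem: others need the full result
    lower.extraction, upper.extraction = upper.extraction, None
    return True


if __name__ == "__main__":
    base = QueryBlock("base")
    q1 = QueryBlock("q1", inputs=[base], extraction="/order/item")
    print(try_push_down(q1, base, [base, q1]))  # single consumer

    shared = QueryBlock("shared")
    a = QueryBlock("a", inputs=[shared], extraction="/order/item")
    b = QueryBlock("b", inputs=[shared])
    print(try_push_down(a, shared, [shared, a, b]))  # two consumers
```

In the first case the extraction moves into the lower block; in the second the pushdown is refused, because block b still depends on the unfiltered result of the shared block.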
Accordingly, there is a need for systems and methods for increasing the efficiency of the processing of XML queries. There is also a need for systems and methods for rewriting queries to filter out unwanted items more efficiently while avoiding issues such as the multiple-consumer problem.