1. Field of the Invention
The present invention relates to the art of information processing. It finds particular application in processing queries applied to extensible markup language (XML) documents and to data streams of XML items, and will be described with particular reference thereto. However, the present invention is useful in reducing temporary storage usage, increasing pipelining, and improving rewrite efficiency in queries applied to tag-based markup languages generally.
2. Description of Related Art
The extensible markup language (XML) is a flexible tag-based markup language suitably used to store data for posting on the Internet or local intranets, wide area networks, or the like. XML is increasingly being used as a native language for data storage in database management systems. In these and other XML applications, it is advantageous to have a flexible XML query language for creating and modifying XML documents, for efficiently and selectively retrieving data from XML documents or collections of XML documents, for sorting data, for inputting data into XML documents, and for otherwise manipulating XML items and data. Various XML query languages have been developed, including XML-QL, XQL, Quilt, and XQuery.
XML employs markup tags to classify, group, interrelate, or otherwise provide metadata regarding data stored in XML documents. An XML query can be viewed as producing streams of sequences of items. In a tabular notation using one column, each separately processed sequence comprising an XML item or a concatenation of XML items is suitably viewed as a row, while each XML item within a row is suitably viewed as an XML item, fragment, or row element. Such terminology is used herein to provide a convenient tabular visualization of the data; however, the use of such terminology does not imply that the data is stored or processed in tabular format.
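By way of illustration only, the row and row element terminology can be sketched as follows. The sketch assumes XML items are represented as serialized strings and rows as Python lists; the names `Row`, `single_item_rows`, and `concatenated_row` are hypothetical and imply nothing about how data is actually stored or processed.

```python
from typing import Iterator, List

# Assumed representation: a row is a list of XML items held as strings.
Row = List[str]

def single_item_rows(items: List[str]) -> Iterator[Row]:
    """Yield one row per XML item, so each row holds a single row element."""
    for item in items:
        yield [item]

def concatenated_row(items: List[str]) -> Iterator[Row]:
    """Yield one row holding the concatenation of all XML items."""
    yield list(items)

items = ["<a>1</a>", "<a>2</a>", "<a>3</a>"]
rows_streamed = list(single_item_rows(items))   # three rows, one element each
rows_sequence = list(concatenated_row(items))   # one row, three elements
```

In this visualization, the same three XML items appear either as three single-element rows or as one three-element row.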
In typical query processing, an XML query is constructed by a user, a software module, or the like, and is converted from a textual format to a data flow model. At the data flow model level, query rewrites of identified inefficient data flow structures are performed to optimize the data flow model. A query rewrite is a query transformation that produces a more efficient query without changing the query output. The optimized data flow model is compiled into executable instructions. Optionally, query rewriting is performed at a lower level than the data flow model, such as at the executable instructions level. It will be appreciated that the executable instructions substantially correspond to a data flow model written in a specific executable instruction set. Similarly, the query text substantially corresponds to a data flow model written in a high level text-based language, and so optionally query rewrites are performed at the text query level. Typically, however, the XML query is converted into an intermediate data flow model which formats the XML query in a manner well-suited for efficient XML query rewrite processing.
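The processing stages described above can be sketched, under simplifying assumptions, as a toy example. The pipe-separated query syntax, the operator names, and the functions `to_data_flow`, `rewrite`, and `compile_model` are all hypothetical and are not drawn from any actual XML query processor. The sketch shows only that a rewrite removes inefficient structures while leaving the query output unchanged.

```python
from typing import Callable, Dict, List

# Hypothetical operators acting on a list of values; illustrative only.
OPS: Dict[str, Callable[[List[int]], List[int]]] = {
    "sort": sorted,
    "distinct": lambda xs: list(dict.fromkeys(xs)),
    "identity": lambda xs: list(xs),
}

def to_data_flow(query_text: str) -> List[str]:
    """Convert a toy textual query such as 'sort | distinct' into a data flow model."""
    return [tok.strip() for tok in query_text.split("|") if tok.strip()]

def rewrite(model: List[str]) -> List[str]:
    """Drop no-op operators and collapse adjacent repeats of idempotent operators."""
    optimized: List[str] = []
    for op in model:
        if op == "identity":
            continue  # a no-op never changes the output
        if optimized and optimized[-1] == op and op in ("sort", "distinct"):
            continue  # applying an idempotent operator twice is redundant
        optimized.append(op)
    return optimized

def compile_model(model: List[str]) -> Callable[[List[int]], List[int]]:
    """Compile the data flow model into an executable callable."""
    def run(data: List[int]) -> List[int]:
        result = list(data)
        for op in model:
            result = OPS[op](result)
        return result
    return run

model = to_data_flow("sort | sort | identity | distinct")
optimized = rewrite(model)
data = [3, 1, 3, 2]
original_output = compile_model(model)(data)
optimized_output = compile_model(optimized)(data)
```

Here the optimized model contains two operators rather than four, and both compiled forms produce the same output for the same input.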
Regardless of the processing level at which query rewrites are performed, the query rewrites perform various optimization tasks such as reducing XML item sequence construction, reducing usage of memory for temporary data storage, and promoting data flow pipelining. An XML item sequence is suitably visualized as a row or tuple that contains a plurality of XML fragments or row elements. An XML item sequence involves concatenation of XML items. Sequence construction breaks the data processing pipeline because the sequence is constructed before the data is further processed. Thus, one goal of query rewriting is reduction of sequence construction.
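The effect of sequence construction on pipelining can be illustrated with a minimal sketch that uses Python generators as a stand-in for a pipelined data flow. The functions `source` and `consume` and the event log are hypothetical devices for showing the order of operations; they do not represent any particular query engine.

```python
from typing import Iterable, Iterator, List

def source(n: int, log: List[str]) -> Iterator[str]:
    """Produce XML items one at a time, recording each production."""
    for i in range(n):
        log.append(f"produce {i}")
        yield f"<item>{i}</item>"

def consume(items: Iterable[str], log: List[str]) -> None:
    """Process each XML item, recording each consumption."""
    for item in items:
        log.append(f"consume {item}")

# Pipelined: each item is consumed as soon as it is produced.
pipelined_log: List[str] = []
consume(source(2, pipelined_log), pipelined_log)

# Sequence construction: all items are gathered into temporary storage first.
materialized_log: List[str] = []
sequence = list(source(2, materialized_log))
consume(sequence, materialized_log)
```

In the pipelined case production and consumption interleave, whereas in the sequence construction case every item is produced and held in temporary storage before any item is consumed.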
However, sequence construction may be important in the context of a specific XML query. For example, an XML item sequence associated with an XML element constructor requires the construction of the sequence. Thus, although it is generally preferable to transform a sequence into a pipelined processing data flow of single XML items using one or more query rewrite operations, heretofore query rewrite processors have not generally implemented such rewrites. The query rewrite processor has difficulty identifying under what conditions such a rewrite can be performed without corrupting the XML query, and so does not perform the rewrite.
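The element constructor case can be sketched as follows. The function `construct_element` is a hypothetical simplification that treats XML items as strings; it is intended only to show why the child items are gathered into a sequence before the enclosing element can be completed.

```python
from typing import Iterable

def construct_element(tag: str, children: Iterable[str]) -> str:
    """Wrap all child XML items in a single enclosing element."""
    # Sequence construction: every child item is gathered before the
    # enclosing element is emitted.
    sequence = list(children)
    return f"<{tag}>" + "".join(sequence) + f"</{tag}>"

children = (f"<entry>{i}</entry>" for i in range(3))
result = construct_element("list", children)
```

Splitting this sequence into single-item rows would yield three separate outputs rather than one element, which is the kind of query corruption a rewrite must avoid.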
The present invention contemplates an improved method and apparatus which overcomes these difficulties and others.