1. Technical Field
The present invention relates generally to processing mark-up language data, and more specifically relates to a single-pass system and method for querying streams of XML data.
2. Related Art
As XML (extensible mark-up language) continues to gain popularity as a format for storing, sharing, and manipulating data, new tools and systems are being introduced to increase its flexibility. One important feature of robust XML data processing applications is the ability to query XML data. More specifically, with the growing popularity of streamed applications over networks such as the Internet, facilities for efficiently querying streams of XML data will become increasingly critical.
Relational databases currently have efficient relational operators that can be reused for querying XML streams. Nevertheless, they lack support for XPath expressions, which most XML query mechanisms, such as XQuery and SQL/XML, use to navigate through XML documents. While there are several implementations of XPath/XSLT that can be adapted for path processing in a relational engine, they are inadequate for the task of efficiently querying streamed XML data.
One obstacle in using the current XPath/XSLT technology in conjunction with a database engine is the mismatch between the tuple-oriented model of database engines and the node-set model of XPath processors. Retrieving multiple values from an XML document is a very common need and corresponds to retrieving multiple columns from a relational table. Achieving this goal for XML streams using the available XPath processors requires either materialization of the whole input stream, or significant changes to the query execution pipeline and optimizer to split one incoming stream into several streams of single-valued results.
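By way of illustration only, the mismatch between the two models can be sketched in a few lines of Python using the standard `xml.etree.ElementTree` module. The sample document and its element names (`orders`, `order`, `id`, `price`) are assumptions made for this sketch and do not come from the present disclosure. Evaluating each path separately yields independent node sets, and pairing them positionally misaligns the rows as soon as one value is missing, whereas a tuple-oriented result keeps one row per order. Note that the sketch itself materializes the whole document in memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
DOC = """
<orders>
  <order><id>1</id><price>9.50</price></order>
  <order><id>2</id></order>
  <order><id>3</id><price>4.25</price></order>
</orders>
"""

root = ET.fromstring(DOC)  # materializes the entire document in memory

# Node-set model: each path is evaluated on its own and returns a flat
# list of nodes, with no link between the two lists.
ids = [e.text for e in root.findall("./order/id")]
prices = [e.text for e in root.findall("./order/price")]

# Pairing the node sets by position misaligns rows: order 2 has no price,
# so it wrongly picks up the price of order 3.
naive_rows = list(zip(ids, prices))
print(naive_rows)  # [('1', '9.50'), ('2', '4.25')]

# Tuple model: one row per order, with None where a column is absent.
tuple_rows = [(o.findtext("id"), o.findtext("price"))
              for o in root.findall("./order")]
print(tuple_rows)  # [('1', '9.50'), ('2', None), ('3', '4.25')]
```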
Another issue with state-of-the-art XPath processors is that they are designed to operate over an in-memory Document Object Model (DOM) or similar representation of the input XML document. That is, current implementations assume that the XML document is entirely available at query time, and this processing model requires memory in the range of the input document size. The approach does not work well when documents are streamed, that is, when only certain fragments of a document are available at query time. In contrast, database engines are engineered to serve large numbers of concurrent users using limited main memory. A memory-intensive XPath processor within a database engine can severely limit the number of users the system can support. Accordingly, a need exists for an efficient system and method of querying streams of XML data.
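As a minimal sketch of the contrast with a DOM-style processor, the following Python fragment uses the standard `xml.sax` incremental parser to answer a single fixed query (the text of every `title` element) while the document arrives in fragments. The handler class `TitleHandler`, the sample data, and the choice of query are hypothetical and serve only to illustrate event-driven processing in general; this is not the system described herein. No tree is built, so the state held at any time is limited to the results and the text of the element currently being read.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of <title> elements without building a tree."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # list of text pieces; non-None only inside <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._buf = None


handler = TitleHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Hypothetical stream delivered in two fragments, split in mid-tag.
parser.feed("<library><book><title>First</title></book><book><ti")
titles_after_first_fragment = list(handler.titles)

parser.feed("tle>Second</title></book></library>")
parser.close()
all_titles = list(handler.titles)
```

In this sketch the first result is available before the second fragment has been received, which a processor that requires the complete document cannot offer.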