1. Technical Field
Embodiments of the invention relate generally to the field of Extensible Markup Language (XML) messages and, more particularly, to the evaluation of XPath predicate expressions over an online stream of XML messages.
2. Prior Art
Hypertext Markup Language (HTML) is a markup language designed for creating web pages containing hypertext and other information to be displayed in a web browser. XML, by contrast, is a metalanguage that describes the structure of data; unlike HTML, it does not define a fixed set of elements. Over time, the use of XML as a data exchange format has increased tremendously.
XPath is an expression language used for addressing parts of XML messages. XPath also provides basic facilities for manipulating strings, numbers and Booleans, and it operates on the logical structure of XML messages. At any given time, an XPath processor receives several streams of XML messages. The XPath processor also receives several user profiles or preferences in the form of XPath queries. An XPath query includes one or more location steps, for example, //book[Price>50 and Publisher=“Prentice Hall”]. A location step includes an axis, for example, “//”, a node test, for example, “book”, and zero or more XPath predicate expressions, for example, [Price>50 and Publisher=“Prentice Hall”]. An XPath predicate expression filters the nodes selected by the node test along the axis. The location paths taking part in the above XPath predicate expression are //book/Price and //book/Publisher. A location path describes how a specific part of an XML message may be found, that is, the address of one node relative to another. The XPath predicate expressions need to be evaluated on the XML messages to select the appropriate results for the XPath predicate expressions and the XPath queries. Because both the number of XPath predicate expressions and the number of XML message streams are very large, an efficient technique for evaluating XPath predicate expressions is needed.
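The anatomy of the example query can be illustrated with a short Python sketch. It assumes a deliberately restricted grammar (a single location step whose optional predicate is an "and" of simple child-versus-literal comparisons) and uses hypothetical names (`LocationStep`, `Comparison`, `parse_step`, `participating_paths`); it is not a general XPath parser.

```python
import re
from dataclasses import dataclass

@dataclass
class Comparison:
    path: str     # relative location path, e.g. "Price"
    op: str       # comparison operator, e.g. ">" or "="
    literal: str  # literal operand as written, e.g. "50"

@dataclass
class LocationStep:
    axis: str         # e.g. "//"
    node_test: str    # e.g. "book"
    predicate: list   # comparisons joined by "and"

# Assumed grammar: axis, node test, optional [ ... ] predicate.
STEP_RE = re.compile(r'^(//|/)([\w*]+)(?:\[(.*)\])?$')
# One comparison: child name, operator, literal (string or number).
CMP_RE = re.compile(r'^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$')

def parse_step(text):
    m = STEP_RE.match(text)
    if m is None:
        raise ValueError(f"unsupported location step: {text!r}")
    axis, node_test, pred = m.groups()
    comparisons = []
    if pred:
        # Naive split: assumes no string literal contains " and ".
        for part in pred.split(" and "):
            c = CMP_RE.match(part)
            if c is None:
                raise ValueError(f"unsupported comparison: {part!r}")
            comparisons.append(Comparison(*c.groups()))
    return LocationStep(axis, node_test, comparisons)

def participating_paths(step):
    # Each comparison path is relative to the nodes selected by the step.
    return [f"{step.axis}{step.node_test}/{c.path}" for c in step.predicate]

step = parse_step('//book[Price>50 and Publisher="Prentice Hall"]')
print(step.axis, step.node_test)   # // book
print(participating_paths(step))   # ['//book/Price', '//book/Publisher']
```

The predicate is kept as a flat list because the example uses only "and"; nested "or" or parenthesized predicates would need a tree, as in the out-of-line technique discussed later.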
One currently available technique for evaluating an XPath predicate expression is to create a document object model (DOM) for an XML message and then evaluate the XPath predicate expression on the DOM. However, the entire XML message must be stored to create the DOM, which leads to inefficient utilization of memory. XML messages are often very large, which places a strain on system resources. Further, the output of the XPath predicate expression is delayed, because the XPath predicate expression is evaluated only after the XML message has been completely received and the DOM has been created. Moreover, if the XML message arrives in parts, the delay increases further.
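For contrast, the DOM approach might be sketched as follows with Python's standard `xml.etree.ElementTree` module. The function names `dom_matches` and `as_number`, the sample message, and its `id` attributes are assumptions made for illustration. Note that `ET.fromstring` builds the whole tree in memory before the predicate is tested for any `book`.

```python
import xml.etree.ElementTree as ET

MESSAGE = (
    "<catalog>"
    "<book id='b1'><Price>65</Price><Publisher>Prentice Hall</Publisher></book>"
    "<book id='b2'><Price>40</Price><Publisher>Prentice Hall</Publisher></book>"
    "<book id='b3'><Price>80</Price><Publisher>Addison-Wesley</Publisher></book>"
    "</catalog>"
)

def as_number(text):
    # XPath converts a non-numeric string to NaN, and NaN > 50 is false.
    try:
        return float(text)
    except (TypeError, ValueError):
        return float("nan")

def dom_matches(xml_text):
    # The entire message becomes an in-memory tree before any evaluation.
    root = ET.fromstring(xml_text)
    matches = []
    for book in root.iter("book"):  # //book
        price_ok = any(as_number(p.text) > 50 for p in book.findall("Price"))
        publisher_ok = any(p.text == "Prentice Hall" for p in book.findall("Publisher"))
        if price_ok and publisher_ok:
            matches.append(book.get("id"))
    return matches

print(dom_matches(MESSAGE))  # ['b1']
```

Using `any` over all `Price` (or `Publisher`) children mirrors XPath 1.0 semantics, where comparing a node-set with a value is true if at least one node in the set satisfies the comparison.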
To address the challenges posed by the DOM approach, alternative approaches such as the Simple API for XML (SAX) have evolved. SAX presents the XML message as a stream of events, such as the start and the end of each element. In other words, SAX is event driven and relies on the programmer to supply handlers for particular events. When such an event occurs, the corresponding handler performs the XPath predicate expression processing.
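A minimal SAX counterpart using Python's standard `xml.sax` module is sketched below. The handler class `BookPredicateHandler` and the helper `sax_matches` are hypothetical and hard-code the example predicate. The sketch shows the event-driven style: only the current book's `Price` and `Publisher` outcomes are kept, and each result is decided when the `</book>` end-element event arrives.

```python
import xml.sax

MESSAGE = (
    "<catalog>"
    "<book id='b1'><Price>65</Price><Publisher>Prentice Hall</Publisher></book>"
    "<book id='b2'><Price>40</Price><Publisher>Prentice Hall</Publisher></book>"
    "<book id='b3'><Price>80</Price><Publisher>Addison-Wesley</Publisher></book>"
    "</catalog>"
)

class BookPredicateHandler(xml.sax.ContentHandler):
    """Evaluates [Price>50 and Publisher="Prentice Hall"] for each <book>."""

    def __init__(self):
        super().__init__()
        self.matches = []     # ids of matching books, in document order
        self.in_book = False
        self.book_id = None
        self.field = None     # "Price" or "Publisher" while inside one
        self.text = []
        self.price_ok = False
        self.publisher_ok = False

    def startElement(self, name, attrs):
        if name == "book":
            self.in_book = True
            self.book_id = attrs.get("id")
            self.price_ok = self.publisher_ok = False
        elif self.in_book and name in ("Price", "Publisher"):
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == self.field:
            value = "".join(self.text)
            if name == "Price":
                try:
                    self.price_ok |= float(value) > 50
                except ValueError:
                    pass  # non-numeric: NaN, so the comparison is false
            else:
                self.publisher_ok |= value == "Prentice Hall"
            self.field = None
        elif name == "book":
            # Decided at </book>; later parts of the message are not needed.
            if self.price_ok and self.publisher_ok:
                self.matches.append(self.book_id)
            self.in_book = False

def sax_matches(xml_text):
    handler = BookPredicateHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matches

print(sax_matches(MESSAGE))  # ['b1']
```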
An XML filtering system includes an XPath parser and a filtering engine. The XPath parser receives the XPath queries, parses them and sends the parsed results to the filtering engine, which converts them into an internal representation.
Various filtering approaches are known today, for example, the X-filter and Y-filter algorithms. According to the Y-filter algorithm, before handling any SAX events, the XML filtering system parses the XPath queries and generates a Nondeterministic Finite Automaton (NFA), an intermediate data structure built from the parsed XPath queries. An arriving XML message is then parsed, and the SAX events raised by the XML parser invoke callback handlers that drive the transitions in the NFA.
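The following Python sketch illustrates, in heavily simplified form, the kind of shared NFA described above. It is in the spirit of Y-filter but is not the published algorithm: it handles only predicate-free paths built from `/` and `//` steps, represents `//` as an epsilon move to a state that loops on every element, and keeps a runtime stack of active state sets that SAX start-element and end-element events push and pop. All class and method names are hypothetical.

```python
import re
import xml.sax

class PathNFA:
    """Shared NFA for predicate-free paths made of '/' and '//' steps."""

    def __init__(self):
        self.trans = [{}]         # trans[s][label] -> next state
        self.dslash = [None]      # epsilon move from s to a '//' state
        self.self_loop = [False]  # '//' states loop on every element
        self.accept = [[]]        # query ids accepted in state s

    def _new_state(self, self_loop=False):
        self.trans.append({})
        self.dslash.append(None)
        self.self_loop.append(self_loop)
        self.accept.append([])
        return len(self.trans) - 1

    def add_query(self, qid, path):
        state = 0
        for axis, name in re.findall(r"(//?)([^/]+)", path):
            if axis == "//":
                if self.dslash[state] is None:  # reuse shared '//' state
                    self.dslash[state] = self._new_state(self_loop=True)
                state = self.dslash[state]
            if name not in self.trans[state]:   # reuse shared prefix
                self.trans[state][name] = self._new_state()
            state = self.trans[state][name]
        self.accept[state].append(qid)

    def closure(self, states):
        # Follow epsilon moves into '//' states.
        result, todo = set(states), list(states)
        while todo:
            d = self.dslash[todo.pop()]
            if d is not None and d not in result:
                result.add(d)
                todo.append(d)
        return result

    def step(self, states, tag):
        nxt = set()
        for s in states:
            for label in (tag, "*"):
                if label in self.trans[s]:
                    nxt.add(self.trans[s][label])
            if self.self_loop[s]:
                nxt.add(s)
        return self.closure(nxt)

class FilterHandler(xml.sax.ContentHandler):
    def __init__(self, nfa):
        super().__init__()
        self.nfa = nfa
        self.stack = [nfa.closure({0})]  # runtime stack of active state sets
        self.matches = []                # (element name, query id) pairs

    def startElement(self, name, attrs):
        active = self.nfa.step(self.stack[-1], name)
        for s in sorted(active):
            for qid in self.nfa.accept[s]:
                self.matches.append((name, qid))
        self.stack.append(active)

    def endElement(self, name):
        self.stack.pop()  # back to the states active at the parent

nfa = PathNFA()
nfa.add_query("q1", "/catalog/book")
nfa.add_query("q2", "//Price")
nfa.add_query("q3", "//book/Publisher")
handler = FilterHandler(nfa)
xml.sax.parseString(
    b"<catalog><book><Price>65</Price>"
    b"<Publisher>Prentice Hall</Publisher></book></catalog>", handler)
print(handler.matches)
# [('book', 'q1'), ('Price', 'q2'), ('Publisher', 'q3')]
```

Because `//Price` and `//book/Publisher` both begin with `//`, they share a single self-loop state; sharing common prefixes across queries is what lets one automaton serve many queries at once.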
One technique for evaluating an XPath predicate expression using SAX events is out-of-line predicate evaluation. The XPath predicate expression is modeled as a logical ordered tree that includes a root node and several leaf nodes; the root node is the top node of the tree, and a leaf node is a terminal node. The XML message is received and parsed to raise SAX events, and the XPath predicate expression is then evaluated on these events. The evaluation starts from the root node of the tree of the XPath predicate expression and proceeds to the child nodes of the root node and then to their child nodes. However, the evaluation starts only after parsing of the XML message is complete, which delays the output. Further, the data for the XML nodes participating in the XPath predicate expression evaluation must be stored, which leads to inefficient utilization of memory.
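Both drawbacks of out-of-line evaluation can be seen in the following Python sketch. It models `[Price>50 and Publisher="Prentice Hall"]` as a small tree (an "and" root with two comparison leaves), buffers the text of every element on a participating path while SAX events arrive, and evaluates the tree top-down only after `xml.sax.parseString` returns. The names (`Leaf`, `Node`, `Buffering`, `out_of_line`) are hypothetical, and the sketch assumes one `book` per message and leaf-only participating elements to keep the evaluation context trivial.

```python
import xml.sax
from dataclasses import dataclass

@dataclass
class Leaf:
    path: str      # participating location path, e.g. "/book/Price"
    op: str        # only ">" and "=" are handled in this sketch
    value: object

@dataclass
class Node:
    op: str        # "and" or "or"
    children: list

def leaf_paths(node):
    if isinstance(node, Leaf):
        return [node.path]
    return [p for c in node.children for p in leaf_paths(c)]

class Buffering(xml.sax.ContentHandler):
    """Stores the text of every element on a participating path."""

    def __init__(self, paths):
        super().__init__()
        self.values = {p: [] for p in paths}  # buffered until parsing ends
        self.path, self.text = [], []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        p = "/" + "/".join(self.path)
        if p in self.values:  # assumes participating elements are leaves
            self.values[p].append("".join(self.text))
        self.path.pop()

def compare(text, op, literal):
    if isinstance(literal, (int, float)):
        try:
            left = float(text)
        except ValueError:
            return False
    else:
        left = text
    return left > literal if op == ">" else left == literal

def evaluate(node, values):
    # Top-down: the root is visited first, then its children, and so on.
    if isinstance(node, Leaf):
        return any(compare(v, node.op, node.value) for v in values[node.path])
    results = (evaluate(c, values) for c in node.children)
    return all(results) if node.op == "and" else any(results)

def out_of_line(xml_bytes, tree):
    handler = Buffering(leaf_paths(tree))
    xml.sax.parseString(xml_bytes, handler)  # the whole message is parsed first
    return evaluate(tree, handler.values)    # only then is the tree evaluated

TREE = Node("and", [Leaf("/book/Price", ">", 50),
                    Leaf("/book/Publisher", "=", "Prentice Hall")])
print(out_of_line(
    b"<book><Price>65</Price><Publisher>Prentice Hall</Publisher></book>", TREE))
# True
```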
U.S. patent application publication 20070250471A1 discloses a method for running XPath queries over XML streams with incremental predicate evaluation. The method evaluates one XPath query at a time. However, because the number of XPath queries that need to be evaluated is very large, evaluating the XPath queries one by one is time-inefficient.
In light of the foregoing discussion, there is a need for an efficient method and system for evaluation of XPath predicate expressions.