A common task in many publish-subscribe systems is the evaluation of XPath filters on XML documents. That is, for a given document, the system identifies those XPath queries that return true, i.e., those for which the content of the document satisfies the conditions specified by the subscribers (users). In practice, large XML documents are often fragmented vertically, horizontally, or both, and the fragments are distributed and stored at different sites.
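By way of illustration only, the evaluation of a single XPath filter on a document stored at a single site can be sketched as follows. The function name `xpath_filter`, the sample document and the filter paths are hypothetical and are not taken from any particular system. Python's standard `xml.etree.ElementTree` module is used, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
DOC = """
<portfolio>
  <broker>
    <name>Example Broker</name>
    <stock><code>GOOG</code><sell>373</sell></stock>
  </broker>
</portfolio>
"""

def xpath_filter(xml_text: str, path: str) -> bool:
    """Return True if at least one element of the document matches path.

    The filter is Boolean: only the existence of a match matters,
    not the matching nodes themselves.
    """
    root = ET.fromstring(xml_text)
    return root.find(path) is not None

# A subscriber interested in documents mentioning the stock code GOOG.
matches = xpath_filter(DOC, ".//stock[code='GOOG']")
```

This sketch assumes that the entire document is available at one site; it does not address the fragmented and distributed case.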
A number of techniques have been proposed or suggested for evaluating XPath filters. Such techniques, however, typically work only on XML documents stored at a single site (i.e., when the documents are neither fragmented nor distributed). When applied to fragmented and distributed documents, these algorithms must visit each site in the network an unbounded number of times, must ship data from one site to another, leading to heavy network traffic, and must access fragments stored at different sites sequentially rather than in parallel.
Partial evaluation, or “program specialization,” has been studied in the context of programming languages as a general optimization technique. Intuitively, given a function ƒ(s,d) and part of its input, s, partial evaluation specializes ƒ(s,d) with respect to the known input s. In other words, partial evaluation performs the part of ƒ's computation that depends only on s, and generates a partial answer, referred to as a residual function ƒ′, that depends on the as yet unavailable input d.
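The notion of a residual function can be illustrated by a minimal sketch, given here under stated assumptions and not as a definitive implementation. In the sketch, a Boolean formula plays the role of ƒ, the variables with known values play the role of s, and the remaining variables play the role of d. The tuple encoding of formulas, the function name `partial_eval` and the variable names `s1`, `s2` and `d1` are all hypothetical.

```python
# A formula is a nested tuple (hypothetical encoding for this sketch):
#   ("const", bool) | ("var", name) | ("not", e) | ("and", e1, e2) | ("or", e1, e2)

def partial_eval(expr, known):
    """Specialize expr with respect to the variable values in known.

    Sub-formulas that depend only on known variables are computed now;
    what remains is a residual formula over the unknown variables.
    """
    op = expr[0]
    if op == "const":
        return expr
    if op == "var":
        name = expr[1]
        return ("const", known[name]) if name in known else expr
    if op == "not":
        inner = partial_eval(expr[1], known)
        return ("const", not inner[1]) if inner[0] == "const" else ("not", inner)
    if op not in ("and", "or"):
        raise ValueError(f"unknown operator: {op!r}")
    left = partial_eval(expr[1], known)
    right = partial_eval(expr[2], known)
    absorbing = (op == "or")  # True decides an "or"; False decides an "and"
    for a, b in ((left, right), (right, left)):
        if a[0] == "const":
            # A deciding constant is the answer; a neutral one is dropped.
            return a if a[1] == absorbing else b
    return (op, left, right)

# f(s, d) = (s1 or d1) and (not s2)
f = ("and", ("or", ("var", "s1"), ("var", "d1")), ("not", ("var", "s2")))

# Known input s is available now; d1 is not yet available.
residual = partial_eval(f, {"s1": False, "s2": False})

# Later, once d1 arrives, the residual formula yields the final answer.
answer = partial_eval(residual, {"d1": True})
```

In this example the residual formula reduces to the single variable `d1`, so that only the value of `d1` is needed to complete the computation once it becomes available.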
Partial evaluation has been found to be useful in a variety of areas, including compiler generation, code optimization and dataflow evaluation. See, for example, Neil D. Jones, “An Introduction to Partial Evaluation,” ACM Computing Surveys, 28(3), 1996. See also, P. Buneman et al., “Using Partial Evaluation in Distributed Query Evaluation,” Proc. of the 32nd Int'l Conf. on Very Large Data Bases (2006), incorporated by reference herein. Dataflow evaluation bears sufficient connections with distributed query evaluation that its use in parallel query processing is worth investigating.
A need exists for methods and apparatus for evaluating XPath filters on fragmented and distributed XML documents.