There are several well-known techniques for processing XML node sets and iterating over their nodes in document order.
A first technique is to store the nodes of the node set in an unordered set U. A document-order traversal of the underlying XML document is then performed, and each node of the document is tested for membership in U. If the node is present in U, then it is part of the node set and so it can be processed. This technique is straightforward. However, its execution time is O(N), where N is the number of nodes in the underlying document (or in a typically large subtree of the document). This makes the technique inefficient for small node sets in large documents.
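The first technique can be sketched as follows. This is a minimal illustration only, assuming Python's standard `xml.etree.ElementTree` module as the document representation; the function name `iter_node_set_by_traversal` and the sample document are hypothetical and do not come from the text above.

```python
import xml.etree.ElementTree as ET


def iter_node_set_by_traversal(root, node_set):
    """Yield the members of node_set in document order.

    Every node of the document is visited, so the cost is O(N) in the
    size of the document regardless of how small node_set is.
    """
    for node in root.iter():      # root.iter() walks the tree in document order
        if node in node_set:      # membership test against the unordered set U
            yield node


# Hypothetical six-node document and a three-node node set U.
doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
by_tag = {n.tag: n for n in doc.iter()}
U = {by_tag["f"], by_tag["c"], by_tag["a"]}
result = [n.tag for n in iter_node_set_by_traversal(doc, U)]
```

In this sketch all six nodes of the document are tested even though only three belong to U, which is the inefficiency described above.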
A second known technique also starts by storing the nodes of the node set in an unordered set U. This collection is then sorted using a standard sorting algorithm and a document-order comparison function; for example, the heapsort algorithm can sort n items in time O(n*log(n)). The resulting sequence S is then iterated over. If a constant-time comparison function is available (this is rare, and typically only efficient for static XML documents with specialised parsing tools), then execution time will be O(M*log(M)), where M is the number of nodes in the node set. More typically, the comparison function will be logarithmic in the size of the XML document, and so execution time will be O(M*log(M)*log(N)).
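A sketch of the second technique is given below, again assuming `xml.etree.ElementTree`. The helper `make_document_order_cmp` is hypothetical. Its comparison function compares root-to-node paths of child indices, so its cost depends on node depth and sibling position rather than being constant or strictly logarithmic. Python's built-in `sorted` (Timsort) is used here in place of heapsort; both perform O(M*log(M)) comparisons.

```python
import xml.etree.ElementTree as ET
from functools import cmp_to_key


def make_document_order_cmp(root):
    """Build a document-order comparison function for nodes under root."""
    # ElementTree nodes carry no parent pointer, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def path(node):
        # Child indices from the root down to node, e.g. [0, 1].
        steps = []
        while node in parent:
            p = parent[node]
            steps.append(list(p).index(node))
            node = p
        return steps[::-1]

    def compare(a, b):
        # Lexicographic comparison of paths; an ancestor's path is a prefix
        # of its descendants' paths, so the ancestor sorts first.
        pa, pb = path(a), path(b)
        return (pa > pb) - (pa < pb)

    return compare


# Hypothetical document and node set U.
doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
by_tag = {n.tag: n for n in doc.iter()}
U = {by_tag["f"], by_tag["d"], by_tag["a"], by_tag["c"]}

# Sort U into the sequence S, then iterate over S.
S = sorted(U, key=cmp_to_key(make_document_order_cmp(doc)))
sorted_tags = [n.tag for n in S]
```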
In a third technique, the nodes of the node set are stored in an ordered set S. For example, a binary tree-based set of n items supports search and update in time O(log(n)). The document order of the nodes is maintained from the outset, so iteration is simply a case of sequencing through S. Although this appears attractive, providing iteration performance of O(M), it is not so effective in practice. In many instances, node sets are created by performing a sequence of manipulations on an initial node set that consists of the whole or a large part of an XML document. In such cases, processing time will be linear in the size of the initial, large node set. Furthermore, many of the manipulations cannot be performed efficiently on the ordered set, resulting in yet poorer performance.
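The third technique might be sketched as follows. The class `OrderedNodeSet` is hypothetical, and two simplifying assumptions are made: the document is static, so each node's document-order position can be precomputed, and a sorted Python list with binary search stands in for a binary tree-based set. Search is then O(log(n)), but list insertion and deletion shift elements and are O(n), unlike a balanced tree.

```python
import bisect
import xml.etree.ElementTree as ET


class OrderedNodeSet:
    """A node set that is kept in document order at all times."""

    def __init__(self, root):
        # Precompute document-order positions. This step is itself O(N)
        # and is only valid while the document is not modified.
        self._nodes = list(root.iter())
        self._position = {node: i for i, node in enumerate(self._nodes)}
        self._keys = []  # sorted positions of the current members

    def add(self, node):
        pos = self._position[node]
        i = bisect.bisect_left(self._keys, pos)
        if i == len(self._keys) or self._keys[i] != pos:
            self._keys.insert(i, pos)  # keeps _keys sorted; ignores duplicates

    def discard(self, node):
        pos = self._position[node]
        i = bisect.bisect_left(self._keys, pos)
        if i < len(self._keys) and self._keys[i] == pos:
            del self._keys[i]

    def __iter__(self):
        # Iteration is a simple O(M) pass over the stored order.
        return (self._nodes[pos] for pos in self._keys)


# Hypothetical document; nodes are added out of order, one of them twice.
doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")
by_tag = {n.tag: n for n in doc.iter()}
S = OrderedNodeSet(doc)
for tag in ["f", "c", "a", "c"]:
    S.add(by_tag[tag])
ordered_tags = [n.tag for n in S]
```

Deriving a small node set from a large initial one with this structure means removing nodes one at a time, which illustrates why processing time tends to track the size of the initial node set.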
Consider the XML document depicted in FIG. 2 below, and a node set consisting of just the emboldened nodes. A document-order traversal of the node set using a method that is linear in the size of the XML document will involve 24 node test computations (one for each node in the document), although the node set consists of only 10 nodes. A traversal using one of the sorting techniques may involve up to approximately 150 tests (10*log(10)*log(24), taking logarithms to base 2).
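The two figures quoted for this example can be reproduced with a short calculation. The variable names are illustrative, and base-2 logarithms are assumed because that is the base that yields a value close to 150.

```python
import math

N = 24  # nodes in the example document
M = 10  # nodes in the node set

# Linear technique: one membership test per document node.
linear_tests = N

# Sorting technique with a logarithmic comparison function:
# M * log2(M) * log2(N), which is roughly 152 and is rounded to 150 in the text.
sort_tests = M * math.log2(M) * math.log2(N)
```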