A document (for example, an extensible markup language (XML) document) may be represented by a tree with nodes. Each node may store or represent labels or data elements of the document, and multiple nodes may be connected by edges defining relationships therebetween. A query may be used to search the document by finding nodes in the tree representation that form a predefined pattern. The query may be referred to as a “twig pattern.” The query may search for all occurrences of the twig pattern in the larger document tree. A search result is found when (1) nodes in the twig pattern match nodes in the larger document tree and (2) the relationships between nodes in the twig pattern match the relationships between the corresponding nodes in the larger document tree.
However, as the size of the document increases, the memory used to store and search a tree representation may grow exponentially.
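For illustration only, the two matching conditions described above may be sketched as a naive recursive search over an in-memory tree. The `Node` class and the `matches_at` and `find_occurrences` functions are hypothetical names introduced for this sketch. The sketch assumes that nodes match when their labels are equal and that every edge in the twig pattern is a parent-child edge; the text does not specify either detail.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A labeled tree node, used for both the document and the twig pattern."""
    label: str
    children: List["Node"] = field(default_factory=list)


def matches_at(doc: Node, twig: Node) -> bool:
    """Return True if the twig pattern can be embedded with its root at `doc`."""
    # Condition (1): the nodes themselves must match (assumed: equal labels).
    if doc.label != twig.label:
        return False
    # Condition (2): each twig child must match some child of the document
    # node (assumed: parent-child edges only). Two twig children may map to
    # the same document child in this simplified sketch.
    return all(
        any(matches_at(d_child, t_child) for d_child in doc.children)
        for t_child in twig.children
    )


def find_occurrences(doc: Node, twig: Node) -> List[Node]:
    """Return every document node at which the twig pattern is rooted."""
    found = [doc] if matches_at(doc, twig) else []
    for child in doc.children:
        found.extend(find_occurrences(child, twig))
    return found


# Hypothetical example document and twig pattern.
doc = Node("book", [
    Node("title"),
    Node("author", [Node("name")]),
    Node("chapter", [Node("title")]),
])
twig = Node("author", [Node("name")])

print([n.label for n in find_occurrences(doc, twig)])  # ['author']
```

This approach keeps the entire document tree in memory and revisits subtrees for every candidate root, which is the kind of cost that becomes significant for large documents.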
To simplify the search, reduce memory usage, and increase search speed, the document tree may be simplified into a plurality of linear sequences referred to as “streams.” A processing thread may search through each stream, as a whole, for elements matching the query twig pattern.
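One possible way to linearize a tree into streams may be sketched as follows. The text does not state how streams are formed, so this sketch makes two assumptions: there is one stream per node label, and each stream entry is a `(start, end, level)` position assigned during a depth-first traversal, so that tree relationships can be recovered from the positions alone. The names `build_streams` and `is_parent` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# (start, end, level) position of a node in a depth-first traversal.
Position = Tuple[int, int, int]


def build_streams(root: Node) -> Dict[str, List[Position]]:
    """Flatten a tree into one position list ("stream") per label (assumed layout)."""
    streams: Dict[str, List[Position]] = defaultdict(list)
    counter = 0

    def visit(node: Node, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter  # assigned on entering the node
        for child in node.children:
            visit(child, level + 1)
        counter += 1  # end position, assigned on leaving the node
        streams[node.label].append((start, counter, level))

    visit(root, 0)
    for positions in streams.values():
        positions.sort()  # order each stream by start position (document order)
    return dict(streams)


def is_parent(a: Position, b: Position) -> bool:
    """True if the node at position `a` is the parent of the node at `b`."""
    return a[0] < b[0] and b[1] < a[1] and b[2] == a[2] + 1


# Hypothetical example document.
doc = Node("book", [
    Node("title"),
    Node("author", [Node("name")]),
    Node("chapter", [Node("title")]),
])
streams = build_streams(doc)

print(streams["title"])  # [(2, 3, 1), (9, 10, 2)]
print(is_parent(streams["author"][0], streams["name"][0]))  # True
```

Under these assumptions, a twig pattern with labels "author" and "name" only needs the two corresponding streams, and the parent-child test is a comparison of integers rather than a walk over the tree.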
For simplicity and clarity of illustration, elements shown in the drawings have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the drawings to indicate corresponding or analogous elements throughout the several views. Moreover, some elements depicted in the drawings may be combined into a single element or divided into multiple elements.