XML is the standard language for representing and exchanging semi-structured data in many commercial and scientific applications, and much research has been devoted to flexible indexing and query mechanisms for extracting data from XML documents.
FIG. 1 is a sample XML document that describes a project hierarchy in the form of a tree structure. Because XML is semi-structured, XML documents are often modeled as tree hierarchies. For the same reason, XML query languages are designed to express complex structured queries, and such queries can likewise be modeled as tree patterns. Query answering thus becomes a process of finding embedded subtree structures among a set of data trees.
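The following minimal Python sketch illustrates query answering as subtree embedding. It parses a small hypothetical project document, which is not the FIG. 1 document, and finds every element at which a tree pattern embeds. Several parts are illustrative assumptions: the (label, children) pattern encoding, the `embeds` and `find_embeddings` helpers, and the restriction to parent-child edges. ‘//’ edges are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; it is NOT the FIG. 1 document.
DOC = """
<Projects>
  <Project>
    <Name>Alpha</Name>
    <Leader><Name>Ann</Name></Leader>
  </Project>
  <Project>
    <Name>Beta</Name>
  </Project>
</Projects>
"""

def embeds(pattern, node):
    """Return True if `pattern` embeds at `node` using parent-child edges only.

    A pattern node is a (label, [child patterns]) pair (an assumed encoding);
    the label '*' matches any element tag.
    """
    label, children = pattern
    if label != "*" and label != node.tag:
        return False
    # Each pattern child must embed at some child element of `node`.
    return all(any(embeds(c, kid) for kid in node) for c in children)

def find_embeddings(pattern, root):
    """Return every element of the data tree at which the pattern embeds."""
    return [n for n in root.iter() if embeds(pattern, n)]

root = ET.fromstring(DOC.strip())
# Tree pattern: a Project element with both a Name child and a Leader child.
pattern = ("Project", [("Name", []), ("Leader", [])])
hits = find_embeddings(pattern, root)
print([h.findtext("Name") for h in hits])  # ['Alpha']
```

In this sketch, two pattern children may map to the same data child. A matcher that requires distinct children would need an injective matching step instead.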
Generally, the purpose of XML indexing is to provide efficient support for structured queries, which may contain values, wildcards (‘*’ and ‘//’), tree patterns, and so on. In most indexing solutions, however, the tree pattern is not a first-class citizen, that is, the most basic query unit. As a result, structured queries cannot be handled directly, and the most commonly supported query interface is instead:
$$\text{Simple Path} \rightarrow \mathcal{P}(\text{Node IDs})$$
where $\mathcal{P}(S)$ denotes the power set of $S$.
That is, given a path, the index returns the set of nodes reached by that path. Some index methods extend this interface to support relative paths that begin with ‘*’ or ‘//’, at the cost of building a much larger index.
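The path-to-node-set interface described above can be sketched as a dictionary keyed by root-to-node label paths. In this sketch, node IDs are hypothetical preorder numbers, and `build_path_index`, `lookup`, and `lookup_descendant` are illustrative names. Some methods build a larger index to support relative paths. This sketch does not. Instead, `lookup_descendant` answers a ‘//’-prefixed path by scanning every indexed path for a matching suffix. That keeps the index small, but each such query becomes linear in the number of distinct paths.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document (same shape as a small project hierarchy).
DOC = """<Projects>
  <Project><Name>Alpha</Name><Leader><Name>Ann</Name></Leader></Project>
  <Project><Name>Beta</Name></Project>
</Projects>"""

def build_path_index(root):
    """Map each root-to-node label path to the set of node IDs it reaches.

    Node IDs are hypothetical preorder numbers assigned during the walk.
    """
    index = defaultdict(set)
    next_id = 0

    def walk(node, path):
        nonlocal next_id
        path = path + (node.tag,)
        index[path].add(next_id)
        next_id += 1
        for child in node:
            walk(child, path)

    walk(root, ())
    return index

def lookup(index, path):
    """Simple path -> set of node IDs, e.g. '/Projects/Project/Name'."""
    key = tuple(path.strip("/").split("/"))
    return set(index.get(key, set()))

def lookup_descendant(index, path):
    """Relative path such as '//Leader/Name', answered by a suffix scan."""
    suffix = tuple(path.lstrip("/").split("/"))
    result = set()
    for key, ids in index.items():
        if key[-len(suffix):] == suffix:
            result |= ids
    return result

index = build_path_index(ET.fromstring(DOC))
print(sorted(lookup(index, "/Projects/Project/Name")))  # [2, 6]
print(sorted(lookup_descendant(index, "//Name")))        # [2, 4, 6]
```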
Tree patterns are not the most basic query unit because one normally cannot afford to maintain a separate index entry for every possible tree pattern, especially patterns that contain attribute values or wildcards (‘*’ or ‘//’). Instead, a tree pattern is typically decomposed into a set of simple path queries, and join operations merge their results to answer the original query. To avoid expensive joins for frequently occurring queries, some index methods create special index entries for a limited set of path templates.
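The following sketch illustrates decomposition followed by joins. It answers a simple twig query of the form trunk[branch1][branch2], for example a Project with both a Name child and a Leader child. Each branch is appended to the trunk to form one simple path, and each path is looked up independently. The results are then joined by intersecting the IDs of the shared trunk node. Several choices here are assumptions made for illustration: the index layout (each label path mapped to its full root-to-node ID paths), the function names, and the intersection-based join. This is not a specific published join algorithm.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document.
DOC = """<Projects>
  <Project><Name>Alpha</Name><Leader><Name>Ann</Name></Leader></Project>
  <Project><Name>Beta</Name></Project>
</Projects>"""

def build_id_path_index(root):
    """Map each label path to the root-to-node ID paths that realize it.

    IDs are hypothetical preorder numbers; keeping the whole ID path lets a
    query recover the ID of any ancestor of a matched node.
    """
    index = defaultdict(list)
    next_id = 0

    def walk(node, labels, ids):
        nonlocal next_id
        labels, ids = labels + (node.tag,), ids + (next_id,)
        next_id += 1
        index[labels].append(ids)
        for child in node:
            walk(child, labels, ids)

    walk(root, (), ())
    return index

def answer_twig(index, trunk, branches):
    """Answer trunk[branch1][branch2]... by path decomposition plus joins.

    Returns the IDs of trunk nodes that have every branch below them.
    """
    depth = len(trunk)
    # Candidate trunk nodes come from the trunk path itself.
    result = {ids[-1] for ids in index.get(trunk, [])}
    for branch in branches:
        # One simple-path lookup per branch, projected onto the trunk node ID.
        matched = {ids[depth - 1] for ids in index.get(trunk + branch, [])}
        result &= matched  # join on the shared trunk node
    return result

index = build_id_path_index(ET.fromstring(DOC))
trunk = ("Projects", "Project")
print(answer_twig(index, trunk, [("Name",), ("Leader",)]))  # {1}
```

Each extra branch adds one path lookup and one intersection. This is the per-query join cost that motivates special index entries for frequent path templates.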
In view of the foregoing, conventional approaches to indexing XML tree structures have been found to suffer from shortcomings and inefficiencies that warrant improvement, and a need has accordingly been recognized for such improvement.