1. Field of the Invention
The present invention relates to databases and, more particularly, to an index structure for searching XML documents.
2. Description of the Related Art
XML provides a flexible way to define semi-structured data. For instance, purchase records that contain information about buyers and sellers can be described by the document type definition (hereinafter referred to as “DTD”) schema shown in FIG. 1. DTD is a common schema specification method for XML documents. A sample XML document based on this DTD is shown in FIG. 3.
The ability to express complex structural or graphical queries is one of the major focuses in XML query language design. In FIG. 2, four sample queries in graph form are shown. It is well known in the art that querying XML data is equivalent to finding sub-structures of the data graph that match the query structure.
Many of the current approaches to querying XML data create indexes on paths (e.g., “/P/S/I/M” as in Q1) or nodes in DTD trees. Path indexes can answer simple queries such as Q1 efficiently. However, queries involving branching structures (Q2, for instance) usually have to be disassembled into multiple sub-queries, each sub-query corresponding to a single path in the graph. The results of these sub-queries are then combined by expensive “join” operations to produce final answers. For the same reason, these methods are also inefficient in handling ‘*’ or ‘//’ queries (Q3 and Q4, for instance), which likewise correspond to multiple paths. To avoid expensive join operations, some index methods create special index entries for frequently occurring multiple-path queries (commonly referred to as “refined paths”). The potential disadvantages of this approach include: 1) query patterns must be monitored; 2) the approach is not general, because not every branching query is optimized; and 3) the number of refined paths can have a large impact on the size and the maintenance cost of the index.
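For illustration only, the following Python sketch shows how a conventional path index of the kind described above might answer a single-path query with one lookup, while a branching query must be split into one lookup per path and the results then joined. The sample document, the element labels other than the path “/P/S/I/M”, the function names, and the preorder node numbering are all assumptions made for this sketch; they are not taken from FIG. 1 through FIG. 3, and the set intersection is a simplified stand-in for a full join operation.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample data; not the document of FIG. 3.
DOC = """
<P>
  <S><N>dell</N><I><M>ibm</M></I><L>boston</L></S>
  <S><N>hp</N><I><M>intel</M></I></S>
</P>
"""


def build_path_index(root):
    """Map each root-to-node label path to the id chains of matching nodes.

    Node ids are assigned in preorder, starting at 1. Each chain lists the
    ids of all nodes from the root down to the matching node.
    """
    index = defaultdict(list)
    counter = 0

    def walk(node, path, chain):
        nonlocal counter
        counter += 1
        path = f"{path}/{node.tag}"
        chain = chain + (counter,)
        index[path].append(chain)
        for child in node:
            walk(child, path, chain)

    walk(root, "", ())
    return index


def branching_query(index, branch_point, branches):
    """Answer a branching query by one lookup per path plus a join.

    Returns the ids of the branch-point nodes that have a descendant on
    every path in `branches`.
    """
    depth = branch_point.count("/")  # number of labels in the branch point
    result = None
    for path in branches:
        ids = {chain[depth - 1] for chain in index.get(path, [])}
        result = ids if result is None else result & ids
    return sorted(result or [])


index = build_path_index(ET.fromstring(DOC.strip()))

# Single-path query: one index lookup, no join.
single_path = index["/P/S/I/M"]

# Branching query: S nodes having both an I/M descendant and an L child.
branching = branching_query(index, "/P/S", ["/P/S/I/M", "/P/S/L"])
```

In this sketch the single-path query costs one dictionary lookup, whereas the branching query costs one lookup per branch plus the work of combining the candidate sets, which grows with the number of branches and the size of each intermediate result.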
Moreover, to retrieve semi-structured data (e.g., XML documents) efficiently, it is essential to index on both structure and content of the XML data. Nevertheless, many algorithms index on structure only, or index on structure and content separately, which means that, for instance, the attribute values in Q2, Q3, and Q4 are not used for filtering in the most effective way.
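As a minimal sketch of the difference described above, the following Python code contrasts separate structure and content indexes with a single index keyed on both. The sample document, the labels, the index layouts, and the node numbering are hypothetical and chosen only for this illustration; they do not represent any particular prior-art system or the index structure of the present invention.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample data; "boston" appears both as a name and a location.
DOC = (
    "<P>"
    "<S><N>dell</N><L>boston</L></S>"
    "<S><N>hp</N><L>boston</L></S>"
    "<S><N>boston</N><L>newyork</L></S>"
    "</P>"
)


def build_indexes(root):
    """Build three illustrative indexes over preorder node ids."""
    structure = defaultdict(list)  # label path -> node ids
    content = defaultdict(set)     # text value -> node ids
    combined = defaultdict(list)   # (label path, text value) -> node ids
    next_id = 0

    def walk(node, path):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        path = f"{path}/{node.tag}"
        structure[path].append(node_id)
        text = (node.text or "").strip()
        if text:
            content[text].add(node_id)
            combined[(path, text)].append(node_id)
        for child in node:
            walk(child, path)

    walk(root, "")
    return structure, content, combined


structure, content, combined = build_indexes(ET.fromstring(DOC))

# Separate indexes: fetch every node on the path, fetch every node with the
# value, then intersect the two candidate lists.
by_path = structure["/P/S/L"]
by_value = content["boston"]
separate_answer = [n for n in by_path if n in by_value]

# Combined index: the value filters candidates at lookup time.
combined_answer = combined[("/P/S/L", "boston")]
```

Both approaches return the same nodes here, but the separate indexes each produce a candidate list larger than the answer before the intersection, while the combined key yields the answer directly.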
Another important aspect of XML indexing is whether the index structure supports dynamic data insertion, deletion, and update, and whether the index depends on specialized data structures not well supported by database systems.