XML (eXtensible Markup Language) is becoming the standard format for representing semi-structured data, which is then stored and managed within database systems. All the major database systems are being extended to natively support XML data. Standard query languages such as XPath and XQuery are typically used to query collections of XML documents. Specialized index structures, such as an XML index, have been developed to improve the performance of query operations. An XML index stores entries corresponding to all the nodes of a set of indexed XML documents. Each entry tracks information about the corresponding node, such as the path to the node, the value of the node (if the node is a leaf node), and some information about the hierarchical position of the node (such as a Dewey-style order key). Additional secondary indexes may also be created to improve lookup of entries based on path, value, etc.
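For illustration only, the kind of entry described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the layout of any particular database system: the names `IndexEntry`, `index_document`, and `build_path_index` are hypothetical, only element nodes are indexed (attributes and mixed text are ignored), and the order key is formed by joining each node's 1-based sibling position with dots.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """One hypothetical index entry per element node."""
    doc_id: int
    path: str             # path from the root, e.g. "/order/item/name"
    order_key: str        # Dewey-style key, e.g. "1.2.1"
    value: Optional[str]  # text of the node if it is a leaf, else None


def index_document(doc_id: int, xml_text: str) -> List[IndexEntry]:
    """Walk the document in document order and emit one entry per element."""
    entries: List[IndexEntry] = []

    def walk(elem: ET.Element, parent_path: str, order_key: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        children = list(elem)
        # Only leaf nodes carry a value in this simplified model.
        value = (elem.text or "").strip() if not children else None
        entries.append(IndexEntry(doc_id, path, order_key, value))
        for position, child in enumerate(children, start=1):
            walk(child, path, f"{order_key}.{position}")

    walk(ET.fromstring(xml_text), "", "1")
    return entries


def build_path_index(entries: List[IndexEntry]) -> Dict[str, List[IndexEntry]]:
    """Assumed secondary index: look up entries by path."""
    by_path: Dict[str, List[IndexEntry]] = defaultdict(list)
    for entry in entries:
        by_path[entry.path].append(entry)
    return by_path
```

Under these assumptions, indexing `<order><id>7</id><item><name>pen</name></item></order>` yields four entries, and the entry for `/order/item/name` has order key `1.2.1` and value `pen`.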
However, creating an XML index adds significant overhead to the cost of inserting and updating documents within the system. When a new document is inserted, or an existing document is updated, entries corresponding to all the affected nodes of the document need to be generated and inserted into the index structure. In the case of an update operation, all the entries corresponding to the nodes of the old document need to be deleted and entries corresponding to the nodes of the updated document need to be added. Further, any secondary indexes need to be suitably updated.
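The maintenance cost described above can be made concrete with a toy in-memory model. The following is a sketch under assumptions, not an actual index implementation: `ToyXmlIndex`, `shred`, and the `writes` counter are hypothetical names, only element nodes are indexed, and each change to the primary structure or to the single path-based secondary index is counted as one write.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def shred(xml_text):
    """Return (order_key, path, value) tuples, one per element node."""
    rows = []

    def walk(elem, parent_path, key):
        path = f"{parent_path}/{elem.tag}"
        kids = list(elem)
        value = (elem.text or "").strip() if not kids else None
        rows.append((key, path, value))
        for pos, kid in enumerate(kids, start=1):
            walk(kid, path, f"{key}.{pos}")

    walk(ET.fromstring(xml_text), "", "1")
    return rows


class ToyXmlIndex:
    def __init__(self):
        self.entries = {}                # (doc_id, order_key) -> (path, value)
        self.by_path = defaultdict(set)  # secondary index: path -> entry keys
        self.writes = 0                  # number of index modifications so far

    def insert_document(self, doc_id, xml_text):
        # Every node of the new document produces a primary and a secondary write.
        for key, path, value in shred(xml_text):
            self.entries[(doc_id, key)] = (path, value)
            self.by_path[path].add((doc_id, key))
            self.writes += 2

    def delete_document(self, doc_id):
        # Every entry of the old document is removed from both structures.
        for entry_key in [k for k in self.entries if k[0] == doc_id]:
            path, _ = self.entries.pop(entry_key)
            self.by_path[path].discard(entry_key)
            self.writes += 2

    def update_document(self, doc_id, new_xml_text):
        # Update modeled as described in the text: delete the entries for
        # the old document, then add entries for the updated document.
        self.delete_document(doc_id)
        self.insert_document(doc_id, new_xml_text)
```

In this model, the number of index writes grows with the number of nodes in both the old and the new version of a document, even when the update changes only a single leaf value.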
To ensure that the index contains accurate information, changes made to the index and any secondary indexes are performed in the same atomic transaction as the change to the underlying data. In other words, the index and the underlying data are maintained synchronously. These index maintenance operations can significantly slow down the overall insert or update operation. This problem is especially critical given the high-throughput, low-latency requirements of On-Line Transaction Processing (OLTP) applications.
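Synchronous maintenance of this kind can be sketched with Python's built-in `sqlite3` module. The schema and the names `open_store`, `shred`, and `store_document` are assumptions made for illustration and do not reflect how any particular database system stores its XML index. The point shown is that the document row and its index rows are written inside one transaction, so either all of them become visible or none do.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store():
    """Create an in-memory store with a document table and an index table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE xml_index (
            doc_id INTEGER NOT NULL,
            order_key TEXT NOT NULL,
            path TEXT NOT NULL,
            value TEXT,
            PRIMARY KEY (doc_id, order_key)
        );
        CREATE INDEX xml_index_by_path ON xml_index (path);
    """)
    return conn


def shred(xml_text):
    """Return (order_key, path, value) tuples, one per element node."""
    rows = []

    def walk(elem, parent_path, key):
        path = f"{parent_path}/{elem.tag}"
        kids = list(elem)
        value = (elem.text or "").strip() if not kids else None
        rows.append((key, path, value))
        for pos, kid in enumerate(kids, start=1):
            walk(kid, path, f"{key}.{pos}")

    walk(ET.fromstring(xml_text), "", "1")
    return rows


def store_document(conn, doc_id, xml_text):
    """Insert or update a document and its index entries atomically."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the data change and the index change share one transaction.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, xml_text)
        )
        conn.execute("DELETE FROM xml_index WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO xml_index VALUES (?, ?, ?, ?)",
            [(doc_id, key, path, value) for key, path, value in shred(xml_text)],
        )
```

Because index generation happens inside the transaction in this sketch, a failure while producing the index entries (for example, a malformed document) also rolls back the change to the `docs` table, and the caller waits for all index writes before the operation completes.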
This presents a significant dilemma. On the one hand, although significant overhead is incurred, the index must be updated in order to be useful. On the other hand, simply eliminating the index is not suitable because doing so would severely degrade query performance.
Based on the foregoing, there is a clear need to provide a mechanism for retaining the benefits of an XML index without incurring significant overhead during inserts and updates of documents.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.