In recent years, many database systems have been introduced that allow storage and querying of eXtensible Markup Language data (“XML data”). Though there are many evolving standards for querying XML, all of them include some variation of XPath. However, database systems are usually not optimized to handle XPath queries, and their query performance leaves much to be desired. In specific cases where an XML schema definition is available, the structure and data types used in XML instance documents may be known. However, in cases where an XML schema definition is not available, and the documents to be searched do not conform to any schema, there are no efficient techniques for querying using XPath.
Ad hoc mechanisms, such as a full scan of all documents or text keyword-based indexes, may be used to increase the performance of querying documents when no XML schema definition is available. However, any indexing mechanism used for this purpose has to be kept in sync when changes occur to the original documents. Typically, such maintenance is performed by deleting all of the indexing information corresponding to all documents that are changed in an operation, and adding an entirely new set of indexing information for the newly changed documents. Maintaining indexes in this manner tends to be inefficient and can slow performance.
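The delete-and-re-add maintenance pattern described above can be illustrated with a minimal sketch. It assumes a toy in-memory keyword index; the class name `KeywordIndex`, its methods, the `writes` counter, and the tag-stripping tokenization are all hypothetical and chosen only for illustration, not taken from any particular database system.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index (hypothetical): keyword -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_terms = {}  # doc_id -> set of terms currently indexed
        self.writes = 0      # number of posting removals plus additions

    @staticmethod
    def _terms(xml_text):
        # Assumption: strip tags, then treat each lowercase word as a keyword.
        text = re.sub(r"<[^>]*>", " ", xml_text)
        return set(re.findall(r"\w+", text.lower()))

    def _remove(self, doc_id):
        # Delete ALL index entries for the document, changed or not.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]
            self.writes += 1

    def _add(self, doc_id, xml_text):
        # Add an entirely new set of entries for the document.
        terms = self._terms(xml_text)
        for term in terms:
            self.postings[term].add(doc_id)
            self.writes += 1
        self.doc_terms[doc_id] = terms

    def update(self, doc_id, xml_text):
        # Delete-then-re-add maintenance, as in the typical approach.
        self._remove(doc_id)
        self._add(doc_id, xml_text)

    def search(self, keyword):
        return set(self.postings.get(keyword.lower(), set()))


index = KeywordIndex()
index.update("doc1", "<order><item>widget</item><qty>2</qty></order>")

writes_before = index.writes
# Only the quantity changes, yet every entry for doc1 is rewritten.
index.update("doc1", "<order><item>widget</item><qty>3</qty></order>")
writes_for_small_edit = index.writes - writes_before
```

In this sketch, changing a single value rewrites every index entry for the document, including the unchanged entry for `widget`, which is the source of the inefficiency noted above.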
Based on the foregoing, there is a clear need for a system and method for accessing XML documents efficiently, without incurring the problems associated with ad hoc indexing mechanisms when the XML documents are modified.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.