1. Field of the Invention
The present invention relates to index data structures and index processing methods that are particularly useful in indexing data objects relating to XML documents.
2. Background Art
XQuery is a query language that is designed to query collections of XML data. It is semantically similar to Structured Query Language (SQL). XQuery provides the means to extract and manipulate data from XML documents or any data source that can be viewed as XML, such as relational databases or office documents. XQuery uses XPath expression syntax to address specific parts of an XML document. It supplements this with a SQL-like “FLWOR expression” for performing joins. A FLWOR expression is constructed from the five clauses after which it is named: FOR, LET, WHERE, ORDER BY, RETURN. XPath is a language for selecting nodes from an XML document.
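By way of illustration only, the relationship between XPath node selection and a FLWOR expression can be sketched in Python using the limited XPath subset supported by the standard library module xml.etree.ElementTree. The sample document, the element names, and the function expensive_titles are hypothetical and do not come from any particular system; the FLWOR expression shown in the comment is mirrored by ordinary Python control flow rather than evaluated by an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """
<library>
  <book year="2001"><title>B-Trees</title><price>30</price></book>
  <book year="1999"><title>XML Basics</title><price>20</price></book>
  <book year="2005"><title>Query Plans</title><price>45</price></book>
</library>
"""

root = ET.fromstring(XML)

# XPath-style node selection: every <title> that is a child of a <book>.
titles = [t.text for t in root.findall("./book/title")]


def expensive_titles(root, min_price):
    """Rough Python analogue of the FLWOR expression

        for $b in /library/book
        let $p := xs:decimal($b/price)
        where $p > 25
        order by $b/title
        return $b/title/text()
    """
    selected = []
    for b in root.findall("./book"):        # FOR: iterate over selected nodes
        p = float(b.findtext("price"))      # LET: bind a derived value
        if p > min_price:                   # WHERE: filter
            selected.append(b.findtext("title"))
    return sorted(selected)                 # ORDER BY, then RETURN


print(titles)
print(expensive_titles(root, 25))
```

In this sketch the path expression does the node selection and the surrounding loop supplies the iteration, filtering, and ordering, which is the division of labor between XPath and the FLWOR clauses described above.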
Much research has been done on adapting relational technology for use with XML and XPath query processing. Some research efforts have focused on native XML databases, while others have focused on hybrid approaches. Efficient XPath query processing is important because XPath is the query language used for node selection within XQuery.
Performance improvements in relational technology have been made continuously over many years. Relational technology has largely been focused on exploiting B+Tree indexes, so there is a large foundation to build upon. The hierarchical data structure of XML, coupled with the path orientation of the query languages XPath and XQuery, introduces new challenges into the technology mix. XPath query performance is particularly important, as XPath is a core component of the XQuery language. Optimizing XPath and XQuery query performance has involved extensive research into areas such as adding new indexes, adding statistical decision making around plan optimization, adding rule-based optimization, improving storage layout, and many other features, in particular partitioned indexes.
Relational Database Management Systems (RDBMS) have a long, proven history of success, with a very large installed base and large investment support. The semi-structured data content and hierarchical tree structure of an XML document were generally not thought to fit well into the relational model. Therefore, much effort has been placed on protecting and reusing existing relational technology for XPath and XML. Many approaches provide XPath axes support by fitting the problem into existing RDBMS architectures, either by encoding limited additional data into the B+Tree index structure or by encoding the XML structure into relational tables. The use of structure summaries is widely adopted. Significant progress has been made toward addressing the difficulty relational algebra has with recursive closure, accomplished by encoding tree structures. The advantage of these solutions is that existing relational database management systems can be used without modification.
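As a minimal sketch of what encoding the XML structure into relational tables can look like, the following Python example assigns each element a preorder and postorder rank and stores the result in a SQLite table. This particular numbering scheme is an assumption chosen for illustration; the passage above does not name a specific encoding. The table name node and the functions shred and descendants are likewise hypothetical. Under this encoding the descendant axis becomes a plain range predicate, so no recursive closure is needed.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Load an XML tree into a relational table node(pre, post, tag)."""
    root = ET.fromstring(xml_text)
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(elem):
        # The preorder rank is assigned when the element is first reached.
        pre = counters["pre"]
        counters["pre"] += 1
        for child in elem:
            visit(child)
        # The postorder rank is assigned after all children are finished.
        rows.append((pre, counters["post"], elem.tag))
        counters["post"] += 1

    visit(root)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (pre INTEGER PRIMARY KEY, post INTEGER, tag TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)
    return conn


def descendants(conn, ancestor_tag):
    """Descendant axis as a range join: d follows a in preorder
    and precedes a in postorder exactly when d is inside a's subtree."""
    sql = """
        SELECT d.tag
        FROM node AS a JOIN node AS d
          ON d.pre > a.pre AND d.post < a.post
        WHERE a.tag = ?
        ORDER BY d.pre
    """
    return [row[0] for row in conn.execute(sql, (ancestor_tag,))]


conn = shred("<a><b><c/><d/></b><e/></a>")
print(descendants(conn, "b"))
print(descendants(conn, "a"))
```

Because the join condition is an ordinary inequality on two integer columns, an unmodified relational engine can evaluate it with its existing B+Tree indexes, which is the advantage the approaches above rely on.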