XML is a versatile markup language capable of labeling the information content of diverse data sources, including structured and semi-structured documents, relational databases, and object repositories. As more information is stored, exchanged, and presented using XML, the ability to query XML data sources intelligently becomes increasingly important. One of XML's great strengths is its flexibility in representing many kinds of information from diverse sources. To exploit this flexibility, an XML query language must provide features for retrieving and interpreting information from these sources. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether the data are physically stored in XML or viewed as XML through middleware.
XQuery is a query language designed to be broadly applicable across many types of XML data sources. It was designed to meet the requirements identified by the W3C XML Query Working Group, with the goal that queries be concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents.
XPath is the W3C recommendation for navigating XML documents. XPath is a search and extraction language designed to be embedded in a host language such as XQuery, XSLT, or SQL/XML. XPath expressions often define complicated navigation, which makes query processing expensive, especially over large collections of documents. As a result, indexes are critical for performance and scalability. However, XML indexes can consume substantial storage, and maintaining them is computationally expensive; indexes that are too complex degrade overall system performance. Simple and efficient XML indexes are therefore preferred.
One type of XML index is the XML value index, which is created by specifying an XPath pattern such as /catalog/category/product/description or /catalog//description. The XPath patterns may be limited to XPath path expressions without predicates, i.e., a single-path tree in the XPath tree representation. Each index entry associates a typed key value, taken from an XML node identified by the XPath pattern, with that node's identity (DocID and NodeID) and the record ID (RID) of the node in storage. Unless the index is unique, a single key value can correspond to many nodes. Index entries can be organized in a traditional B+ tree. A search on the index uses only the key values and maps a value to node identities (DocID, NodeID) and RIDs.
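The entry layout described above can be illustrated with a minimal sketch, assuming a sorted Python list of `(key, DocID, NodeID, RID)` tuples as a stand-in for the leaf level of a B+ tree. The class `XmlValueIndex` and its methods are hypothetical names chosen for illustration; keys are assumed to share a single type so that they compare consistently.

```python
import bisect


class XmlValueIndex:
    """Hypothetical XML value index: a sorted list of entries stands in for
    the leaf level of a B+ tree. Each entry is (key, DocID, NodeID, RID)."""

    def __init__(self, pattern):
        self.pattern = pattern  # XPath pattern the index was created on
        self._entries = []      # kept sorted by key, then DocID, NodeID, RID

    def insert(self, key, doc_id, node_id, rid):
        # Keys are assumed to share one type (e.g. all str or all float),
        # mirroring the typed key values of the index.
        bisect.insort(self._entries, (key, doc_id, node_id, rid))

    def lookup(self, key):
        """Return (DocID, NodeID, RID) for every node whose value equals key.
        A non-unique index may return several nodes for one value."""
        # (key,) sorts before every full entry with the same key.
        i = bisect.bisect_left(self._entries, (key,))
        result = []
        while i < len(self._entries) and self._entries[i][0] == key:
            result.append(self._entries[i][1:])
            i += 1
        return result

    def range_lookup(self, low, high):
        """Return (DocID, NodeID, RID) for every key in [low, high]."""
        i = bisect.bisect_left(self._entries, (low,))
        result = []
        while i < len(self._entries) and self._entries[i][0] <= high:
            result.append(self._entries[i][1:])
            i += 1
        return result


if __name__ == "__main__":
    idx = XmlValueIndex("/catalog//description")
    idx.insert("red bike", 1, 7, 100)
    idx.insert("blue car", 1, 12, 101)
    idx.insert("red bike", 2, 7, 205)
    print(idx.lookup("red bike"))  # [(1, 7, 100), (2, 7, 205)]
```

Note that the search touches only the key values; nothing in an entry records which concrete path led to the node, which is what keeps such an index small.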
An XML database usually receives many diverse XML queries. Because XQuery queries can be decomposed into basic XPath queries, it suffices to focus on XPath queries. When an XPath query uses an index, there are two cases: an exact match or an inexact match. In an exact match, the XPath query matches the XPath pattern of the index, and the index provides exactly the result of the query predicate. In an inexact match, the index contains the result of the query predicate and possibly more. For example, the index pattern may contain a descendant axis where the query does not: an index on /catalog//description also covers the query path /catalog/category/product/description. If XPath queries could use only exactly matched XML value indexes, an XML database would need to create too many XML value indexes. To limit the number of indexes, it is important to be able to use XML value indexes that may contain more than the results of a query, i.e., to use a more general index for more specific queries.
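Deciding which case applies amounts to an XPath containment test between the index pattern and the query path. The following is a minimal sketch restricted to the fragment discussed here (linear paths with child `/` and descendant `//` steps, no wildcards or predicates). It uses a homomorphism test, a standard containment check for this fragment; it is offered as an illustration, not as the method of this document, and `parse`, `contains`, and `match_kind` are hypothetical names.

```python
from functools import lru_cache


def parse(path):
    """Parse a linear XPath pattern such as /catalog//description into
    (axis, name) steps. Only child (/) and descendant (//) axes with element
    names are supported; wildcards and predicates are rejected."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: " + path)
    steps, axis = [], "/"
    for part in path.split("/")[1:]:
        if part == "":  # an empty segment means the next step uses //
            axis = "//"
            continue
        if part == "*" or "[" in part:
            raise ValueError("unsupported step: " + part)
        steps.append((axis, part))
        axis = "/"
    return tuple(steps)


def contains(index_pattern, query):
    """True if every node selected by query is also selected by index_pattern,
    tested by looking for a homomorphism from the pattern's steps to the query's."""
    p, q = parse(index_pattern), parse(query)

    @lru_cache(maxsize=None)
    def embed(i, prev):
        # prev is the query position that p[i - 1] was mapped to (-1 = root).
        if i == len(p):
            # The pattern's last step must map to the query's output step.
            return prev == len(q) - 1
        axis, name = p[i]
        if axis == "/":
            # A child step must map to the very next query step, which must
            # itself be a child step with the same name.
            j = prev + 1
            return j < len(q) and q[j] == ("/", name) and embed(i + 1, j)
        # A descendant step may map to any later query step with the same name.
        return any(q[j][1] == name and embed(i + 1, j)
                   for j in range(prev + 1, len(q)))

    return embed(0, -1)


def match_kind(index_pattern, query):
    """Classify how a value index on index_pattern can serve query."""
    if not contains(index_pattern, query):
        return None  # the index may miss query results, so it cannot be used
    return "exact" if contains(query, index_pattern) else "inexact"


if __name__ == "__main__":
    print(match_kind("/catalog//description", "/catalog/category/product/description"))  # inexact
    print(match_kind("/catalog//description", "/catalog//description"))  # exact
    print(match_kind("/catalog/category/product/description", "/catalog//description"))  # None
```

An "inexact" result means the index is usable but returns a superset of the query's nodes, so the extra candidates still have to be eliminated somehow.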
There are various existing approaches to using indexes for inexactly matching queries. One approach is to create more value indexes so that queries match them exactly. This approach, however, is not feasible, because there are too many distinct queries for indexes to cover. Another approach is to create additional index types, such as path indexes and path-value indexes, which include more path information that can be used to distinguish an equivalence relationship (exact match) from a containment relationship (inexact match). As noted above, including more information in indexes uses more storage and increases maintenance cost.
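To make that storage trade-off concrete, a path-value index can be sketched as follows, assuming each entry stores the node's concrete root-to-node element path next to its value, and using a dictionary keyed by value as a stand-in for a B+ tree. Because the path is kept, a lookup can discard nodes that fall outside a more specific query, at the price of storing that path in every entry. `PathValueIndex`, `parse`, and `path_matches` are hypothetical names, and real path-value index formats will differ.

```python
from collections import defaultdict
from functools import lru_cache


def parse(query):
    """Split an absolute linear XPath such as /catalog//description into (axis, name) steps."""
    steps, axis = [], "/"
    for part in query.split("/")[1:]:
        if part == "":
            axis = "//"  # empty segment: the next step is a descendant step
            continue
        steps.append((axis, part))
        axis = "/"
    return tuple(steps)


def path_matches(steps, labels):
    """True if the concrete element path `labels` (root first) is selected by `steps`."""
    @lru_cache(maxsize=None)
    def match(i, j):
        if i == len(steps):
            return j == len(labels)  # the last step must land on the indexed node
        axis, name = steps[i]
        if axis == "/":
            return j < len(labels) and labels[j] == name and match(i + 1, j + 1)
        return any(labels[k] == name and match(i + 1, k + 1)
                   for k in range(j, len(labels)))
    return match(0, 0)


class PathValueIndex:
    """Hypothetical path-value index; a dict keyed by value stands in for a B+ tree."""

    def __init__(self):
        self._by_value = defaultdict(list)

    def insert(self, labels, value, doc_id, node_id, rid):
        # The full element path is stored in every entry. This is the extra
        # path information (and storage) that a plain value index does not keep.
        self._by_value[value].append((tuple(labels), doc_id, node_id, rid))

    def lookup(self, query, value):
        """Return (DocID, NodeID, RID) of nodes with this value whose path matches query."""
        steps = parse(query)
        return [(d, n, r) for labels, d, n, r in self._by_value.get(value, [])
                if path_matches(steps, labels)]


if __name__ == "__main__":
    idx = PathValueIndex()
    idx.insert(["catalog", "category", "product", "description"], "red bike", 1, 7, 100)
    idx.insert(["catalog", "description"], "red bike", 1, 3, 90)
    print(idx.lookup("/catalog/category/product/description", "red bike"))  # [(1, 7, 100)]
    print(idx.lookup("/catalog//description", "red bike"))  # [(1, 7, 100), (1, 3, 90)]
```

Every entry repeats its path, so index size and update cost grow with path length and with the number of distinct paths, which is the overhead the paragraph above points to.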
Accordingly, there is a need for systems and methods that increase the efficiency of processing XPath and XQuery queries. There is also a need for a method to use XML value indexes efficiently for XPath and XQuery queries that do not exactly match the index XPath patterns, without requiring large storage space or computationally expensive maintenance.