XML (eXtensible Markup Language) is increasingly used to represent semi-structured data. In recent years, many database systems have come to support storage and querying of XML data, and large collections of potentially large XML documents are stored in such systems. Such collections are typically queried using languages such as XPath. However, the standard XPath language is being extended with functions and operators from several other domains, where a domain of information can be defined as semantic-based information that is associated with its own specialized operations and that can be contained in an XML document.
Various domain-specific indexing schemes can be and have been developed to support a set of domain operators and functions. Such schemes can be registered with a database management system and implemented as a set of interface functions.
As an example of a domain function, a contains( ) function within XPath can be used to perform full-text search. The semantics of the contains( ) function embedded within XPath are fundamentally tied to the notion of XML nodes. One variant of the contains( ) function, when invoked on a complex XML element, evaluates to true if the “virtual” text document formed by concatenating all the text descendants of the specified element contains the specified keyword. There are other variants of this function, which also depend crucially on an understanding of XML nodes and the hierarchical relationships between nodes.
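The variant of contains( ) described above can be illustrated with a minimal Python sketch. The helper name `contains`, the sample document, and the use of plain substring matching in place of a real full-text engine are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def contains(element, keyword):
    """Return True if the "virtual" text document formed by concatenating
    all text descendants of `element` contains `keyword`.

    Assumption: plain substring matching stands in for full-text search.
    """
    virtual_text = "".join(element.itertext())
    return keyword in virtual_text


# Hypothetical sample document.
doc = ET.fromstring(
    "<book><title>XML Indexing</title>"
    "<chapter><para>Path tables speed up</para>"
    "<para> XPath queries.</para></chapter></book>"
)
chapter = doc.find("chapter")

in_chapter = contains(chapter, "XPath")          # keyword is in a descendant
spans_nodes = contains(chapter, "up XPath")      # phrase spans two text nodes
title_word_in_chapter = contains(chapter, "Indexing")  # only in <title>
title_word_in_book = contains(doc, "Indexing")   # <book> is an ancestor
```

Because the result depends on which element the function is invoked on, the same keyword can match at the book element but not at the chapter element, which is why the function is tied to the node hierarchy.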
To improve performance of standard XPath queries, various indexing strategies have been developed. For example, a Path Table may be populated with certain information about each node in an XML document, and XPath queries rewritten into standard SQL queries against the Path Table. However, these indexing mechanisms accelerate only queries involving XPath with forward axes (child, descendant) and value comparisons. Such mechanisms are not as effective with queries involving functions from other domains, such as text, spatial, life sciences, time series, image, and multimedia domains. Though XML-specific indexes can be used to improve the XPath portion of the queries, the domain-specific portions have to be deferred to a slow post-processing step.
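The Path Table idea can be sketched as follows, using Python's built-in sqlite3 module. The table name `path_table`, its columns (doc_id, node_id, path, value), and the particular SQL rewrites are hypothetical; they are one plausible layout, not the schema of any specific system.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_path_table(conn, doc_id, xml_text):
    """Store one row per element: its document-order id, root-to-node path,
    and direct text value. The schema is an assumption for illustration."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS path_table "
        "(doc_id INTEGER, node_id INTEGER, path TEXT, value TEXT)"
    )
    root = ET.fromstring(xml_text)
    node_id = 0
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        node_id += 1
        conn.execute(
            "INSERT INTO path_table VALUES (?, ?, ?, ?)",
            (doc_id, node_id, path, (elem.text or "").strip()),
        )
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(elem)):
            stack.append((child, path + "/" + child.tag))


conn = sqlite3.connect(":memory:")
build_path_table(
    conn,
    1,
    "<order><item><name>pen</name><price>3</price></item>"
    "<item><name>ink</name><price>12</price></item></order>",
)

# XPath /order/item/price[. > 5] rewritten as SQL (child axis + value test).
expensive = [
    row[0]
    for row in conn.execute(
        "SELECT node_id FROM path_table "
        "WHERE path = '/order/item/price' AND CAST(value AS REAL) > 5 "
        "ORDER BY node_id"
    )
]

# XPath //name rewritten as SQL (descendant axis as a path suffix match).
names = [
    row[0]
    for row in conn.execute(
        "SELECT node_id FROM path_table WHERE path LIKE '%/name' ORDER BY node_id"
    )
]
```

Both rewrites use only path matching and value comparison. A domain function such as full-text contains( ) has no counterpart in this table, so in a layout like this one it would have to be evaluated separately on the candidate nodes.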
Even if a separate text index is created on XML documents, there is no mechanism to combine the results, at a node level, from the XPath-based index and the text index. An XPath query can be used to identify a particular node, but XPath has no mechanism to make use of the semantic context within a domain. Further, a domain-based index may facilitate locating and operating upon information within domains contained in an XML document, but there is no mechanism to relate this information to the underlying XML hierarchy in which the domain is contained. Hence, due to the relatively coarse granularity of results (i.e., results at the XML document level), the value of XML is diminished. For example, a text index may return several instances of a requested keyword from within an XML document, but there is no way of knowing in which particular text nodes the keyword instances were found. In other words, there is no way of determining whether any hits from the XPath-based index (hits at the node level) match any hits from the text index (hits at the XML document level).
In contrast, one approach is to treat every node as a document, from the viewpoint of a domain-based index. However, with such an approach, if multiple elements are requested from within a domain, the domain index would not return a hit unless all the requested elements are within the same node. This is undesirable in an XML context. That is, if one node contains one of the elements and a sibling node contains another of the elements, then it is desirable that a hit is returned for the parent node of those two nodes because the parent node “contains” both elements.
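The limitation of the node-as-document approach can be illustrated with a short Python sketch. The function names, the sample document, and the use of substring matching in place of a real domain index are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def node_as_document_hits(root, keywords):
    """Treat each element's own direct text as a separate "document" and
    return the tags of elements whose document contains every keyword."""
    hits = []
    for elem in root.iter():
        own_text = elem.text or ""
        if all(k in own_text for k in keywords):
            hits.append(elem.tag)
    return hits


def subtree_hits(root, keywords):
    """Desired XML semantics: an element is a hit if the concatenated text
    of all its descendants contains every keyword."""
    hits = []
    for elem in root.iter():
        subtree_text = "".join(elem.itertext())
        if all(k in subtree_text for k in keywords):
            hits.append(elem.tag)
    return hits


# Hypothetical document: the two keywords sit in sibling nodes.
doc = ET.fromstring(
    "<article><section>"
    "<title>spatial index</title><body>text search</body>"
    "</section></article>"
)
query = ["spatial", "search"]

per_node = node_as_document_hits(doc, query)   # no single node has both
per_subtree = subtree_hits(doc, query)         # ancestors contain both
```

With each node indexed in isolation, the query returns no hits, whereas under subtree semantics the section element (and its ancestor) qualifies because it “contains” both keywords through its children.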
With all prior approaches to XPath queries that include domain-based operators, only the following can occur: (1) hits are returned where the XPath is satisfied OR the domain-based operator is satisfied, but no hits indicate that both are satisfied; or (2) hits are returned at the document level, indicating that a particular XML document satisfies the XPath AND satisfies the domain-based operator. Significantly, neither of these results indicates that a particular node or nodes satisfy both the XPath and the domain-based operator.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.