1. Field of the Invention
This invention relates to XML pivot joins and more particularly relates to handling a LET binding used in a WHERE clause of an XQuery FLWOR expression during an XML pivot join procedure.
2. Description of the Related Art
XPath and XQuery are two common languages used to query an XML document. XPath is a path expression language for selecting data within XML documents. XQuery is a language for querying, transforming, and constructing XML data. An expression is a string of Unicode characters which may be constructed from keywords, symbols, and operands. XPath allows expressions to be nested. XQuery uses XPath expression syntax to address specific parts of an XML document and is semantically similar to structured query language (SQL). The SQL-like XQuery syntax uses “For, Let, Where, Order by, and Return” clauses in a “FLWOR” expression.
XPath analyzes an XML document as an XML tree by representing each element of the XML document as a node in the XML tree. The XML tree may include parent-child nodes that directly correspond to the nested elements in the XML document. For more information regarding XPath and XQuery please visit their standards web pages which currently reside at http://www.w3.org/TR/xpath20/ and http://www.w3.org/TR/xquery/ respectively.
An XML pivot join procedure provides efficient filtering of XML documents that satisfy an XPath or XQuery expression. The XML pivot join procedure uses an XML index scan to filter an index for each leg in an expression. For example, given an XPath expression, “/a/b[(c=5) AND (d=6)],” and a collection of XML documents, an index will be filtered by performing an XML index scan on the index relating to “/a/b/c=5” (the first leg) as well as on the index relating to “/a/b/d=6” (the second leg). Each index will contain information from a qualifying XML document, which information includes the qualifying path, the XML document where the path is located, the node identification of the path, and the value received from the XML document. The XML pivot join procedure will “AND” the index scans thereby advancing the scan of one index based on the information of another.
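The “AND” of two index scans described above can be sketched as a skip-ahead intersection. This is a minimal illustration only, not the procedure's actual implementation: it assumes each scan is reduced to a sorted list of unique document identifiers (node identifiers are ignored), and the name pivot_and is hypothetical.

```python
from bisect import bisect_left

def pivot_and(scan_a, scan_b):
    """Intersect two sorted lists of document ids (assumed unique per list).

    Whenever the scans disagree, the scan that is behind is advanced
    directly to the position of the other scan instead of stepping
    one entry at a time.
    """
    i = j = 0
    matches = []
    while i < len(scan_a) and j < len(scan_b):
        if scan_a[i] == scan_b[j]:
            matches.append(scan_a[i])
            i += 1
            j += 1
        elif scan_a[i] < scan_b[j]:
            # scan_a is behind: jump to the first entry >= scan_b[j]
            i = bisect_left(scan_a, scan_b[j], i)
        else:
            # scan_b is behind: jump to the first entry >= scan_a[i]
            j = bisect_left(scan_b, scan_a[i], j)
    return matches

# Hypothetical document ids for the "/a/b/c=5" and "/a/b/d=6" legs
print(pivot_and([1, 2, 5, 9], [2, 3, 9, 10]))
```

The point of the sketch is that neither list is scanned in full: each leg's position bounds how far the other leg may skip.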
Several structures are created during the XML pivot join procedure. From the query, a query tree is generated that describes the query in tree representation. Also, a paths table is created during the XML pivot join procedure to describe every unique path in the collection of XML documents. By nature, the paths table includes paths that are both relevant and non-relevant to the query. So, to summarize the relevant paths, a paths tree is created. Entries from the paths table are matched against the query tree and qualifying paths are combined to form the paths tree. A match graph is constructed by finding paths in the paths tree that match steps in the query tree. These structures are used at strategic points throughout the XML pivot join algorithm to identify qualifying documents.
XML pivot joining from an XPath expression produces expected results. However, XML pivot joining from an XQuery FLWOR expression may not produce expected results. That is, when pivot joining from an XQuery FLWOR expression, the information to be propagated may not be computed properly. For example, given the XQuery FLWOR expression “FOR $a in doc( )//a LET $b in $a//b WHERE $b/c=5 and $b/d=6 RETURN $a,” the LET binding $b describes the sequence of one or more “b” nodes underneath “a” nodes of a document. So, the XQuery FLWOR expression searches for “c” nodes and “d” nodes under any of the “b” nodes that are under “a.” In other words, the “c=5” match and the “d=6” match do not need to be under the same “b” node.
Detrimentally, certain XML documents that, in theory, should satisfy the XQuery FLWOR expression do not. When the XML pivot join algorithm applies the “AND” operation on the indexes, evaluation of the subsequent leg of the expression relies on the information obtained from the evaluation of the prior leg. So, if the information obtained from evaluating the prior leg results in “Document 1” satisfying the “/a/b/c=5” path, then the subsequent expression would rely on that information to find the “/a/b/d=6” path by starting at the node location for the path “/a/b.” This necessitates having the “c=5” match and the “d=6” match under the same “b” node. But, when evaluating an XQuery FLWOR expression, the “c=5” match and the “d=6” match are not required to be located under the same “b” node. The “c=5” match and the “d=6” match can be under different parent nodes to qualify a document; however, current implementations of the XML pivot join procedure exclude the document.
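The difference between the two semantics can be illustrated with a small sketch using Python's standard xml.etree.ElementTree module. The sample document and the function names same_b_match and let_match are hypothetical, and values are compared as strings for simplicity; the sketch models only the two predicates, not the pivot join itself.

```python
import xml.etree.ElementTree as ET

def same_b_match(a):
    """XPath-style predicate: a single "b" must hold both c=5 and d=6."""
    return any(
        any(c.text == "5" for c in b.findall("c"))
        and any(d.text == "6" for d in b.findall("d"))
        for b in a.iter("b")
    )

def let_match(a):
    """LET-style predicate: c=5 and d=6 may sit under different "b" nodes."""
    bs = list(a.iter("b"))  # the sequence bound to $b
    has_c = any(c.text == "5" for b in bs for c in b.findall("c"))
    has_d = any(d.text == "6" for b in bs for d in b.findall("d"))
    return has_c and has_d

# "c=5" and "d=6" are under different "b" nodes of the same "a"
doc = ET.fromstring("<a><b><c>5</c></b><b><d>6</d></b></a>")
print(same_b_match(doc))  # False: no single "b" has both children
print(let_match(doc))     # True: the sequence of "b" nodes has both
```

A document like the one above is the kind that the FLWOR expression should return but that a join requiring a common “b” parent would exclude.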
FIGS. 3 through 10 are provided as an example of the XML pivot join procedure. Using the XML documents in FIG. 3 and an XPath query: //x[.//v[b=“b” and c=“c”] and .//a=“a”], a query tree (see FIG. 4) is constructed. A double bar represents a descendant axis and a single bar represents a child axis. A paths table (see FIG. 4) is also constructed. The paths table describes all the unique paths within the collection of XML documents in FIG. 3. To summarize all of the paths relevant to the query, a paths tree is created. The query tree facilitates parsing the XPath query into linear XPaths. The linear XPaths derived are //x[.//v[b=“b”]], //x[.//v[c=“c”]], and //x[.//a=“a”].
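The decomposition of a query tree into linear XPaths can be sketched as a root-to-leaf walk. This is a simplified illustration under stated assumptions: the tuple representation (axis, name, children) and the function name linear_paths are hypothetical, value predicates such as b=“b” are omitted, and the output uses the flattened form //x//v/b rather than the bracketed form shown above.

```python
def linear_paths(node, prefix=""):
    """Return one linear path per leaf of a query tree.

    Each node is (axis, name, children), where axis is "//" for a
    descendant step (double bar) and "/" for a child step (single bar).
    """
    axis, name, children = node
    path = prefix + axis + name
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(linear_paths(child, path))
    return paths

# Query tree for //x[.//v[b and c] and .//a], value tests omitted
query_tree = ("//", "x", [
    ("//", "v", [("/", "b", []), ("/", "c", [])]),
    ("//", "a", []),
])
print(linear_paths(query_tree))
```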
Paths in the paths table matching the linear XPaths derived from the query tree comprise the paths tree. Thus, the paths tree summarizes all the unique paths in the collection of XML documents that are relevant to the XPath query. XML index entries are created for each linear XPath in the XPath query. Each index entry includes a path, which is the unique path that matched the linear XPath; a value, which is the value of the last document node in the path; a document identifier, which is the document identifier of the XML documents that contains the path; and a node identifier, which is the identifier of the node in the XML document that is in the path.
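As a rough sketch, an index entry with the four fields just listed could be modeled as follows. The class name IndexEntry, the field names, and the ordering by path, then document, then node are assumptions made for illustration; the two sample entries use the “b1” locations described for FIG. 5.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class IndexEntry:
    path: str        # subscripted unique path label, e.g. "b1"
    doc_id: int      # document that contains the path
    node_id: tuple   # nodeID components, e.g. (1, 1, 1, 2, 1, 1, 1)
    value: str       # value of the last document node in the path

entries = [
    IndexEntry("b1", 2, (1, 1, 1, 1, 1, 1, 1), "b"),
    IndexEntry("b1", 1, (1, 1, 1, 2, 1, 1, 1), "b"),
]
# order=True compares fields in declaration order, so sorting groups
# entries by path and then walks them in document and node order.
for entry in sorted(entries):
    print(entry.path, entry.doc_id, entry.node_id, entry.value)
```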
FIG. 5 depicts the XML index entries' relations to the XML documents and paths of the paths tree. For the first entry, the path z-e-x-p-v-b describes the first matching (unique) path for the linear XPath //x//v/b. For discussion purposes, the labels within each path have been subscripted with numbers in FIG. 5 to show the unique instances of that label. For the linear XPath //x//v/b, for example, there are a number of matching paths: z-e-x-p-v-b is the first matching path, so its labels are subscripted with “1”; the paths tree has the z-e-x-p-v-b path, so it is subscripted as z-e-x1-p1-v1-b1. The second match for //x//v/b is z-e-x-q-v-b, so it is subscripted as z-e-x1-q1-v2-b2. The “v2” indicates that this path is the second unique path for “v,” and “b2” indicates that this path is the second unique path for “b.”
From this point on, the paths that match a linear XPath will be referenced using the last subscripted label. For example, to refer to the path z-e-x1-p1-v1-b1 above, the reference will be to the “b1” path. Likewise, for the path z-e-x1-q1-v2-b2, the reference will be to the “b2” path.
The first entry in the XML index in FIG. 5 indicates that the “b1” path has the value “b” at document 1 at nodeID 1.1.1.2.1.1.1. An explanation of nodeIDs may begin at document 1. Node “z” has nodeID 1.1. Node “e” has nodeID 1.1.1, and node “x” has 1.1.1.2 (because node “aa” is 1.1.1.1). Node “p” has 1.1.1.2.1, node “v” has 1.1.1.2.1.1, and node “b” has 1.1.1.2.1.1.1. Note that nodeIDs are ordered, that is, 1.1.1<1.1.2<1.1.2.1 and so on, and that ancestor nodeIDs are easily computed from any descendant. That is, from “b,” which has nodeID 1.1.1.2.1.1.1, the nodeID of “x1” can be computed by truncating the nodeID from 7 digits to 4 digits (7 nodes along the path to the “b” document node, and 4 nodes along the path to the “x” document node). So, the nodeID of “x1” is 1.1.1.2.
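The nodeID ordering and truncation described above can be sketched in a few lines. This is a minimal illustration that assumes nodeIDs are dotted strings of integers; the helper names parse_node_id, format_node_id, and ancestor_id are hypothetical.

```python
def parse_node_id(text):
    """Turn "1.1.1.2" into (1, 1, 1, 2) so that tuples compare in order."""
    return tuple(int(part) for part in text.split("."))

def format_node_id(node_id):
    """Turn (1, 1, 1, 2) back into "1.1.1.2"."""
    return ".".join(str(part) for part in node_id)

def ancestor_id(node_id, depth):
    """Truncate a nodeID to the first `depth` components."""
    return node_id[:depth]

b = parse_node_id("1.1.1.2.1.1.1")           # the "b" node in document 1
print(format_node_id(ancestor_id(b, 4)))     # nodeID of "x1": 1.1.1.2

# Tuple comparison reproduces the ordering 1.1.1 < 1.1.2 < 1.1.2.1
print(parse_node_id("1.1.1") < parse_node_id("1.1.2") < parse_node_id("1.1.2.1"))
```

Comparing parsed tuples rather than raw strings keeps the order correct if a component ever exceeds one digit.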
Note that for the “b1” path, there are a number of XML index entries. The first “b1” entry points to document 1, nodeID 1.1.1.2.1.1.1. The second “b1” entry points to document 2 nodeID 1.1.1.1.1.1.1. This says that the indexes have found the “b1” path in both document 1 and document 2. From here on, instead of showing the XML index entries and their paths and locations, the subscripted paths along the xml document paths will be shown (see FIG. 6).
In FIG. 6, the matches for the linear XPath //x//v/b can easily be seen. There is a path “b1” at document 1, “b3” at document 1, “b1” at document 2, and so on. For the linear XPath //x//v/c, there is a path “c1” at document 1, “c1” at document 1, “c1” at document 2, and so on. For the linear XPath //x//a, there is a path “a2” at document 2, “a2” at document 3, “a4” at document 4, and so on. In the discussion that follows, the way the algorithm advances the XML index scans is described by saying that the “b1” scan is currently at the first “b1” in document 1, then at the first “b1” in document 2, and so on. The XML pivot join procedure has one index scan open for each unique path. So, the “b1” paths use one index scan, the “c1” paths use one index scan, the “b2” paths use one index scan, and so on.
In FIG. 7, a query tree and a paths tree are used to construct a match graph. The match graph is constructed by matching paths in the paths tree with steps in the query tree. The “b1” node in the match graph, for example, signifies the match between the “b1” path in the paths tree and the “b” step in the query tree. The match graph is used to remember document and node locations while performing the XML index scans. For example, if the index scan for “b1” is advanced and returns document 1, that location (document 1 and the nodeID of the “b1” match) is remembered in the match graph node “b1.” The match graph node “b1” is then at location document 1.
A running example may be useful as it shows the XML pivot join procedure in detail. The following examples will show snapshots of the match graph and describe how the document locations are computed and how the index scans are advanced. The diagram of the xml documents in FIG. 6 will be used repetitively to help track what is being pointed to with the XML index scans.
In FIG. 8, the progress of the match graph occurs from left to right. The leaves of the match graph correspond to the index scans. For the initial match graph on the left of FIG. 8, the “b1” index scan is at “doc1.” The “c1” index scan is also at “doc1.” The “b2” index scan is at “doc2,” the “c2” index scan is at “doc2,” and the “a1” index scan is at “doc5.” The locations for the index scans show that the first “b1” path is at document 1, the first “c1” path is at document 1, the first “b2” path is at document 2, the first “c2” path is at document 2, the first “a1” path is at document 5, and so on.
In the match graph snapshot to the right, the location of “b1” has been truncated to the level of “v1.” This truncated nodeID matches the nodeID computed by truncating the nodeID of “c1” to the level of “v1.” The match is marked with an asterisk (*) at “v1.” This says that a “b1” path and a “c1” path are found to have the same “v1” ancestor at document 1, as can be seen in FIG. 6. A similar occurrence is seen for “v2.” To compute the location of “x1,” the minimum of the locations of (a1, a2) and the minimum of the locations of (v1, v2) are taken, then the maximum of these minimums is taken. So “x1” is at doc2. To compute the location of “z,” take the minimum of (x1, x2), so “z” is at doc2. Note that neither “x1” nor “z” has the asterisk because “x1” does not have a “v” match and an “a” match under the same “x” match.
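The location rule for “x1” can be sketched as a maximum over per-branch minimums. This is a simplified illustration, not the procedure's implementation: locations are reduced to document numbers, the function name and_location is hypothetical, and the sample values are read from the description of FIGS. 6 and 8 (the initial position of “a2” at document 2 is an assumption taken from the FIG. 6 discussion).

```python
def and_location(*branches):
    """Location of a step whose branches must all match.

    Each branch is a collection of candidate locations (alternative
    unique paths for the same query step). The earliest candidate in
    each branch is taken, and the step can be no earlier than the
    latest of those.
    """
    return max(min(branch) for branch in branches)

v_locations = {"v1": 1, "v2": 2}   # document numbers, per the example
a_locations = {"a1": 5, "a2": 2}   # "a2" at document 2 is an assumption

x1 = and_location(v_locations.values(), a_locations.values())
print(x1)  # max(min(1, 2), min(5, 2)) = max(1, 2) = 2
```

The minimum picks the earliest alternative for each required branch, and the maximum reflects that no document before that point can satisfy every branch.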
Now that all the index cursors at the leaves of the match graph have been advanced once (without returning results), the cursors may again be advanced. The initial match graph (the match graph on the left) in FIG. 9 now shows that the “b1” index cursor has been advanced to doc3 based on the maximums of the ancestor matches. In FIG. 9, doc2+ depicts the fact that the XML pivot join procedure advanced the “z” location to some location just above doc2, so the cursors on the leaves can be advanced beyond their previous locations. In other words, the location of the “z” match needs to be advanced beyond doc2. By advancing “b1” to doc3, the subsequent match graph (the one on the right) in FIG. 9 is realized, and the location of “x1” can be computed as max(min(v1, v2), min(a1, a2)). Here we see that there is an “a2,” a “b2,” and a “c2” at doc2 that have the same “x1” ancestor, while “b2” and “c2” have the same “v2” ancestor.
So, now “x1” has the asterisk (*). The location of “z” is computed from min(x1, x2), which yields doc2 for “z.” A result for the XPath //x[.//v[b and c] and .//a] may be returned because a “b2” and a “c2” under the same “v2” have been found and the “v2” and “a2” have the same “x1” ancestor. So, the first match for the query //x[.//v[b and c] and .//a] is document 2.
To advance the index scans further, the location of “z” must again be advanced to doc2+. This time a “b4” and a “c4” are found at document 5 with the same “v4,” and an “a3” at document 5 with the same “x2” as “b4.” So, document 5 can be returned as a match for the query //x[.//v[b and c] and .//a]. This example of the XML pivot join procedure was applied to an XPath query. The XML pivot join procedure produces expected results when applied to an XPath query. However, as noted above, desired results are not produced when the XML pivot join procedure handles a LET binding used in a WHERE clause of an XQuery FLWOR expression.
Using the collection of XML documents in FIG. 6 and an XQuery FLWOR expression: “for $x in doc( ) //x LET $v in $x //v WHERE $v/b=“b” and $v/c=“c” and $x//a=“a” RETURN $x;”, the undesired results will be apparent. The expression comprises a $v binding which is a LET binding. The expression “says” that for the predicate “$v/b=‘b’ and $v/c=‘c’” the “b” and “c” matches are under a sequence of one or more “v” matches. Document 3 in FIG. 6 should now qualify because “b1” and “c1” are under a sequence of “v1” nodes under an “x1,” and “a2” is under the same “x1.” When applied to the XPath query, the XML pivot join procedure properly skipped Document 3 because “b1” and “c1” for Document 3 are not under the same “v1” node. However, when applied to the XQuery FLWOR expression, Document 3 is improperly skipped.
From the foregoing discussion, Applicants assert that a need exists for a method and apparatus that properly handles a LET binding used in a WHERE clause of an XQuery FLWOR expression during an XML pivot join procedure.