Extensible Markup Language (XML) has been proposed as a standard language for information exchange on the Internet. It is widely used in computer and network systems and in many other industrial fields, such as biological information, electronic commerce, electronic data interchange, geographic information and global positioning systems, multimedia, and entertainment.
XML documents contain information formatted as a collection of records. Unlike the tables of a relational database system, XML documents are not indexed, and it is therefore more difficult to perform some operations on one or more XML documents. One such operation is a join operation that selects records from two or more XML documents based on selection criteria (e.g., a common product identifier used across multiple XML documents).
A join between XML documents can be resource intensive for a number of reasons. For example, selecting records from an XML document may require parsing the document multiple times. In addition, performing an XML join can require creating an in-memory model by selecting the set of records involved in the join and then applying the join logic to the in-memory model.
The following example illustrates what is required to accomplish an XML join using XQuery 1.0 (XQuery 1.0 is an XML query language published by the World Wide Web Consortium (W3C)). The XQuery statement is listed below, and it uses two of the example XML documents listed in the Detailed Description.
import schema namespace bib = "http://www.xquark.org/XQuery/use-cases/bib" at "bib.xsd"
import schema namespace rev = "http://www.xquark.org/XQuery/use-cases/reviews" at "reviews.xsd"
<books-with-prices> {
  for $b in doc("bib.xml")/bib:bib//book,
      $a in doc("reviews.xml")/rev:reviews//entry
  where $b/title = $a/title
  order by $b/title
  return
    <book-with-prices>
      { $b/title }
      <price-amazon>{ $a/price/text( ) }</price-amazon>
      <price-bn>{ $b/price/text( ) }</price-bn>
    </book-with-prices>
} </books-with-prices>
The above example XQuery statement requires that a join be performed across the XML documents BIB.xml and Reviews.xml. The query returns the values $a/price/text( ), $b/price/text( ), and $b/title (variable $b stands for bib//book, and variable $a stands for reviews//entry). In order to execute this example query, a query plan must be created. Using BEA AquaLogic (AquaLogic is a product of BEA Systems, Inc.; BEA AquaLogic is a trademark of BEA Systems, Inc.), the query plan depicted in FIG. 8 was generated. The query plan 800 represents the above example XQuery statement. As the query plan 800 illustrates, a collection of “book” nodes and a collection of “entry” nodes are selected from BIB.xml and Reviews.xml, respectively. The “for” statement in the query plan 800 represents selection of multiple nodes. For every element in a collection, individual element values are stored in temporary variables; for example, variables I393 and I397 are used to store the values of the “title” and “price” elements for every individual “entry” element in the collection. A top-level XML return block is defined in which individual XML tags are mapped to the variables defined in the previous stage. For instance, in the query plan 800 the XML return tags <title>, <price-amazon>, and <price-bn> are mapped to variables I396, I397, and I398, respectively, where I396 maps to <Path>$book/child::title</Path>, I397 maps to <Path>$entry/child::price/child::text( )</Path>, and I398 maps to <Path>$book/child::price/child::text( )</Path>. Each element in one collection is compared to every element in the other collection, and whenever the equality condition (I393=I394) is satisfied for the assigned variables, the XML return type is populated using the variables. This step results in a collection of XML return type values, which are rearranged according to variable I395 (the order by clause), corresponding to <Path>$book/child::title</Path>.
According to the query plan 800, in order to perform the join, a collection of all the elements that will be used in the query is fetched into memory beforehand. In the situation where a join is made between documents at different locations (e.g., different geographical locations or different locations on a network), the join requires transferring all of those elements from one document location to the other, which might require significant bandwidth.
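The evaluation strategy described for the query plan 800 (select both collections, compare every element of one against every element of the other, populate the return structure on a match, then sort) can be sketched in Python as a nested-loop join. This is only an illustrative sketch, not the AquaLogic implementation: the sample documents, the function names nested_loop_join and to_xml, and the omission of the schema namespaces are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for bib.xml and reviews.xml
# (the real documents are not reproduced here; namespaces are omitted).
BIB_XML = """<bib>
  <book><title>Book B</title><price>20.00</price></book>
  <book><title>Book A</title><price>10.00</price></book>
</bib>"""

REVIEWS_XML = """<reviews>
  <entry><title>Book A</title><price>9.50</price></entry>
  <entry><title>Book B</title><price>19.00</price></entry>
  <entry><title>Book C</title><price>5.00</price></entry>
</reviews>"""


def nested_loop_join(bib_xml, reviews_xml):
    """Join books and review entries on title, ordered by book title."""
    # Both collections are fully parsed into memory before any comparison.
    books = ET.fromstring(bib_xml).findall(".//book")
    entries = ET.fromstring(reviews_xml).findall(".//entry")

    rows = []
    for book in books:            # every book ...
        for entry in entries:     # ... is compared with every entry
            if book.findtext("title") == entry.findtext("title"):
                rows.append((
                    book.findtext("title"),
                    entry.findtext("price"),  # mapped to <price-amazon>
                    book.findtext("price"),   # mapped to <price-bn>
                ))
    rows.sort(key=lambda row: row[0])  # corresponds to "order by $b/title"
    return rows


def to_xml(rows):
    """Populate the XML return structure from the joined rows."""
    root = ET.Element("books-with-prices")
    for title, price_amazon, price_bn in rows:
        item = ET.SubElement(root, "book-with-prices")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "price-amazon").text = price_amazon
        ET.SubElement(item, "price-bn").text = price_bn
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(nested_loop_join(BIB_XML, REVIEWS_XML)))
```

The number of comparisons is the product of the two collection sizes, and both collections must be resident in memory at the same place before the loop starts, which is the cost the surrounding text is concerned with.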
The following is an example of the number of tags and values that would have to be moved from the location of the BOM.xml document (Example 2 in the Detailed Description below) to the location of the Partnumberdata.xml document (Example 4 in the Detailed Description below) to perform a join between the two documents (Example 6 in the Detailed Description below) at the location of Partnumberdata.xml.
<part>(1)
 (2)<partid>(3)item1</partid>(4)
 (5)<supart>
  (6)<subpartid>(7)subpart1</subpartid>(8)
  (9)<consumedquantity>(10)2</consumedquantity>(11)
 (12)</supart>
</part>(13)
<part>(14)
 (15)<partid>(16)item2</partid>(17)
 (18)<supart>
  (19)<subpartid>(20)subpart1</subpartid>(21)
  (22)<consumedquantity>(23)10</consumedquantity>(24)
 (25)</supart>
</part>(26)
<part>(27)
 (28)<partid>(29)item4</partid>(30)
 (31)<supart>
  (32)<subpartid>(33)subpart4</subpartid>(34)
  (35)<consumedquantity>(36)4</consumedquantity>(37)
 (38)</supart>
</part>(39)
As can be seen, 39 tags and values would need to be moved in this case. If, instead, the join were to be performed at the location of the BOM.xml document, then the tags and values from the Partnumberdata.xml file would have to be moved from the location of Partnumberdata.xml to the location of the BOM.xml document. This would require moving the following tags and values (from the location of Partnumberdata.xml to the location of BOM.xml):
<part>(1)
 (2)<subpartid>(3)subpart1</subpartid>(4)
 <supplier>(5)
  (6)<supplierid>(7)krish</supplierid>(8)
 </supplier>(9)
</part>(10)
<part>(11)
 (12)<subpartid>(13)subpart2</subpartid>(14)
 <supplier>(15)
  (16)<supplierid>(17)krish</supplierid>(18)
 </supplier>(19)
</part>(20)
<part>(21)
 (22)<subpartid>subpart3(23)</subpartid>(24)
 <supplier>(25)
  (26)<supplierid>(27)krish</supplierid>(28)
 </supplier>(29)
 <supplier>(30)
  (31)<supplierid>(32)mohit</supplierid>(33)
 </supplier>(34)
 <supplier>(35)
  (36)<supplierid>(37)Sriram</supplierid>(38)
 </supplier>(39)
</part>(40)
<part>(41)
 (42)<subpartid>subpart4(43)</subpartid>(44)
 <supplier>(45)
  (46)<supplierid>mohit(47)</supplierid>(48)
 </supplier>(49)
</part>(50)
In this case, 50 tags and values would need to be moved. The more efficient of these two options is the first, because only 39 tags and values would need to be moved. Although 39 is fewer than 50, both options are still inefficient because the whole XML structure for all elements of the query needs to be moved one way or the other.
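The two totals can be reproduced with a short Python sketch. It assumes the counting convention implied by the numbered markers in the listings, namely that each start tag, each end tag, and each text value counts as one unit to be moved. The function name count_tags_and_values and the synthetic wrapper element are assumptions introduced for the example; the fragment contents are copied from the two listings.

```python
import xml.etree.ElementTree as ET

# Fragment that would be moved from the location of BOM.xml.
BOM_FRAGMENT = """
<part><partid>item1</partid>
  <supart><subpartid>subpart1</subpartid><consumedquantity>2</consumedquantity></supart></part>
<part><partid>item2</partid>
  <supart><subpartid>subpart1</subpartid><consumedquantity>10</consumedquantity></supart></part>
<part><partid>item4</partid>
  <supart><subpartid>subpart4</subpartid><consumedquantity>4</consumedquantity></supart></part>
"""

# Fragment that would be moved from the location of Partnumberdata.xml.
PARTNUMBER_FRAGMENT = """
<part><subpartid>subpart1</subpartid>
  <supplier><supplierid>krish</supplierid></supplier></part>
<part><subpartid>subpart2</subpartid>
  <supplier><supplierid>krish</supplierid></supplier></part>
<part><subpartid>subpart3</subpartid>
  <supplier><supplierid>krish</supplierid></supplier>
  <supplier><supplierid>mohit</supplierid></supplier>
  <supplier><supplierid>Sriram</supplierid></supplier></part>
<part><subpartid>subpart4</subpartid>
  <supplier><supplierid>mohit</supplierid></supplier></part>
"""


def count_tags_and_values(fragment):
    """Count start tags, end tags, and non-blank text values in a fragment."""
    # The fragment has several top-level elements, so it is wrapped in a
    # synthetic root that is excluded from the count.
    root = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
    total = 0
    for elem in root.iter():
        if elem is root:
            continue
        total += 2  # one start tag and one end tag
        if elem.text and elem.text.strip():
            total += 1  # one text value
    return total


if __name__ == "__main__":
    print(count_tags_and_values(BOM_FRAGMENT))         # 39
    print(count_tags_and_values(PARTNUMBER_FRAGMENT))  # 50
```

Comparing the two counts before choosing where to run the join is one simple way to pick the cheaper direction, although, as noted above, either direction still moves the whole structure.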
Therefore, there exists ample opportunity for improvement in technologies related to implementing more efficient XML joins.