The eXtensible Markup Language (XML) is an emerging standard for the representation and exchange of data, such as data transmitted or received over the Internet. An XML document is a tree-structured, self-descriptive document made up of a collection of nodes, which facilitates a degree of automatic semantic integration. The XML standard is governed by the World Wide Web Consortium (W3C), which maintains the standard on its web site.
XML documents are sometimes differenced or merged. Differencing two XML documents refers to determining the data that distinguishes one XML document from another XML document. Such data may include data that is found in the former document but not in the latter document, which may be referred to as deletions, as well as data that is found in the latter document but not in the former document, which may be referred to as additions.
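By way of illustration only, the notions of deletions and additions can be sketched in Python as set differences. The sketch assumes, purely for simplicity, that each document is reduced to a set of (path, text) pairs for its leaf elements, so element order and repeated identical leaves are ignored; the names `leaf_items` and `difference` and the sample documents are hypothetical and are not drawn from any particular prior art approach.

```python
import xml.etree.ElementTree as ET


def leaf_items(xml_text):
    """Flatten a document into a set of (path, text) pairs, one per leaf element."""
    root = ET.fromstring(xml_text)
    items = set()

    def walk(node, path):
        here = path + "/" + node.tag
        children = list(node)
        if not children:
            # Leaf element: record where it sits and what it holds.
            items.add((here, (node.text or "").strip()))
        for child in children:
            walk(child, here)

    walk(root, "")
    return items


def difference(former_xml, latter_xml):
    """Return (deletions, additions) between a former and a latter document."""
    former = leaf_items(former_xml)
    latter = leaf_items(latter_xml)
    deletions = former - latter  # in the former document but not the latter
    additions = latter - former  # in the latter document but not the former
    return deletions, additions


# Hypothetical sample documents.
FORMER = "<order><item>pen</item><item>ink</item></order>"
LATTER = "<order><item>pen</item><item>paper</item></order>"

deletions, additions = difference(FORMER, LATTER)
print("deletions:", deletions)
print("additions:", additions)
```

In this sketch the "ink" item is reported as a deletion and the "paper" item as an addition, while the shared "pen" item appears in neither set.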
By comparison, merging two XML documents adds to a first XML document the data in a second XML document that is not found in the first XML document. Merging two XML documents is therefore a special case of differencing the XML documents: the data that the differencing operation finds in the second XML document but not in the first XML document is the data that the merging operation adds to the first XML document.
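As a minimal sketch of merging in this sense, the following Python code appends to a first document each top-level child of a second document that the first document lacks. It assumes flat documents whose root children can be compared by tag and text alone; the `merge` and `child_key` names and the sample documents are hypothetical and chosen only for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def child_key(element):
    """Identify a child by tag and text (an assumption made for this sketch)."""
    return (element.tag, (element.text or "").strip())


def merge(first_xml, second_xml):
    """Add to the first document the root children found only in the second."""
    first_root = ET.fromstring(first_xml)
    second_root = ET.fromstring(second_xml)
    present = {child_key(child) for child in first_root}
    for child in second_root:
        key = child_key(child)
        if key not in present:
            # This child is an "addition" in the differencing sense.
            first_root.append(copy.deepcopy(child))
            present.add(key)
    return ET.tostring(first_root, encoding="unicode")


# Hypothetical sample documents.
FIRST = "<order><item>pen</item><item>ink</item></order>"
SECOND = "<order><item>pen</item><item>paper</item></order>"

merged = merge(FIRST, SECOND)
print(merged)
```

Only the additions are applied here; data found in the first document but not in the second is left untouched, consistent with the description of merging given above.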
Prior art approaches for differencing and merging XML documents typically focus on generalized processes that can be employed with any set of XML documents. As such, these approaches may use different heuristics for each different type, or schema, of XML documents, or for each different purpose for which the XML documents are being used. Because of their generalized, heuristic-oriented nature, such prior art approaches may not difference and merge XML documents in the way a user may expect.
For example, an XML document may differ from another XML document in the layout, or type, of information it contains, and/or in the actual data, or information, it contains. One type of approach to differencing and merging may use a heuristic that focuses on the layout of information, whereas another type of approach may use a heuristic that focuses on the actual data contained within the layout. Because the user has no control over the type of heuristic a given differencing-and-merging approach employs, the approach may yield results that are undesirable to the user.
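The distinction between a layout-focused and a data-focused comparison can be illustrated with the following Python sketch. The two signature functions are hypothetical stand-ins for such heuristics and are not taken from any particular tool: one compares only the nesting of element names, the other only the text values in document order.

```python
import xml.etree.ElementTree as ET


def layout_signature(node):
    """Describe a document by its element names and nesting, ignoring text."""
    return (node.tag, tuple(layout_signature(child) for child in node))


def data_signature(node):
    """Describe a document by its text values and nesting, ignoring element names."""
    return ((node.text or "").strip(), tuple(data_signature(child) for child in node))


# Hypothetical sample documents.
A = ET.fromstring("<person><name>Ann</name><age>30</age></person>")
B = ET.fromstring("<person><name>Bob</name><age>41</age></person>")
C = ET.fromstring("<entry><label>Ann</label><value>30</value></entry>")

# A and B share a layout but hold different data.
same_layout_ab = layout_signature(A) == layout_signature(B)
same_data_ab = data_signature(A) == data_signature(B)

# A and C hold the same data but use different layouts.
same_layout_ac = layout_signature(A) == layout_signature(C)
same_data_ac = data_signature(A) == data_signature(C)

print(same_layout_ab, same_data_ab, same_layout_ac, same_data_ac)
```

Under the layout-focused comparison, documents A and B are treated as the same and A and C as different; under the data-focused comparison the verdicts are reversed. A user who cannot select between the two has no assurance of which result a given tool will produce.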
More generally, it is difficult to predict whether or not a given generalized and heuristic-oriented approach to XML document differencing and merging will operate as expected by users. As such, developing differencing and merging tools that satisfy user expectations has been problematic. For these and other reasons, there is a need for the present invention.