The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
The Extensible Markup Language (XML) specification, established by the World Wide Web Consortium (W3C), provides a standardized method for exchanging structured data between different mechanisms. The different mechanisms may be different components within the same system (e.g., different program components), or they may be completely separate systems (e.g., systems of different companies, or different servers on the World Wide Web). In essence, XML allows structured data to be exchanged in a textual format, using "element tags" to specify structure and to delimit different sets of data.
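As an illustration of tags both carrying structure and delimiting data, the following minimal sketch uses Python's standard `xml.etree.ElementTree` module. One component serializes a record into XML text, and another parses that text back. The helper names `to_xml` and `from_xml`, and the "order" record, are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, record):
    """Serialize a flat record as one XML element whose children are its fields."""
    root = ET.Element(tag)
    for name, value in record.items():
        # Each field becomes a child element; its start and end tags delimit the value.
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse the textual form back into a record on the receiving side."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

message = to_xml("order", {"id": 42, "item": "widget"})
print(message)            # <order><id>42</id><item>widget</item></order>
print(from_xml(message))  # {'id': '42', 'item': 'widget'}
```

The exchanged form is plain text, so values come back as strings. Any typing beyond that has to be agreed between sender and receiver, for example through a schema.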
XML documents comprise one or more elements delimited by tags. Elements may be nested within other elements, giving rise to a hierarchical structure that defines the structural relationships between the various elements. Because they specify a hierarchical structure, XML documents lend themselves to tree-type representations. In fact, many XML documents are not processed directly; rather, they are first parsed and transformed into tree representations, which are then processed.
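The nesting-to-tree correspondence can be seen by parsing a small document and walking it depth first. In the sketch below, which uses Python's `xml.etree.ElementTree`, each element's indentation equals its nesting depth. The sample document and the `outline` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ("<library><book><title>A</title><author>X</author></book>"
       "<book><title>B</title></book></library>")

def outline(elem, depth=0, lines=None):
    """Depth-first walk recording each element's tag, indented by nesting depth."""
    if lines is None:
        lines = []
    lines.append("  " * depth + elem.tag)
    for child in elem:  # iterating an Element yields the elements nested directly in it
        outline(child, depth + 1, lines)
    return lines

tree_root = ET.fromstring(DOC)
print("\n".join(outline(tree_root)))
# library
#   book
#     title
#     author
#   book
#     title
```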
An XML document may be represented using any type of tree structure, but one commonly used tree structure is the one provided by the Document Object Model (DOM). More specifically, an XML document may be parsed into objects using the DOM, and the resulting DOM tree represents the parsed document as a node tree with a plurality of nodes. The DOM provides a rich tree representation of the XML document. Given any node in the tree, the DOM representation provides all information pertinent to that node. For example, the DOM tree identifies which node is the parent of that node, which nodes are its children, and which nodes are its siblings. From this information, it can easily be determined where a particular node fits within the XML document. More information and a specification for the DOM can be found on the W3C website at www.w3c.org.
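The parent, child, and sibling information described above can be read directly from a DOM tree. The minimal sketch below uses Python's standard `xml.dom.minidom`, which implements the W3C DOM interfaces `parentNode`, `childNodes`, `previousSibling`, and `nextSibling`. The sample document and the `path_to` helper are hypothetical. The helper shows how following parent links locates a node within the document.

```python
from xml.dom import minidom

# No whitespace between tags, so every child node here is an element node.
doc = minidom.parseString("<order><id>42</id><item>widget</item><qty>3</qty></order>")
item = doc.getElementsByTagName("item")[0]

parent = item.parentNode                              # the <order> element
children = [n.nodeName for n in parent.childNodes]    # ['id', 'item', 'qty']
prev_sib = item.previousSibling                       # the <id> element
next_sib = item.nextSibling                           # the <qty> element

def path_to(node):
    """Follow parentNode links up to the document node to locate an element."""
    steps = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        steps.append(node.nodeName)
        node = node.parentNode
    return "/" + "/".join(reversed(steps))

print(parent.nodeName, children, prev_sib.nodeName, next_sib.nodeName)
# order ['id', 'item', 'qty'] id qty
print(path_to(item))  # /order/item
```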
When XML documents are stored, there is frequently a need to detect changes between versions of a document, that is, to determine the changes that have occurred to a document since a previous version.
A common differencing (i.e., “diffing”) tool is the diff utility provided by UNIX. While UNIX diff works well for plain text documents, it compares documents line by line and does not understand the tree structure of XML. Therefore, it cannot find a minimal and correct (in the XML sense) edit script between two XML documents. An edit script specifies transformation operations that, when applied to an input document, transform the input document into an output document. A minimal edit script is one for which no other edit script, among all possible edit scripts, produces the same output document from the same input document using fewer operations.
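The following sketch shows the mismatch between line-based and tree-based comparison. It uses Python's `difflib.unified_diff` as a stand-in for UNIX diff, which is an assumption made for portability. Because attribute order is not significant in XML, the two documents below parse to the same element tree. A line-oriented tool nevertheless reports a changed line. The helper names `changed_lines` and `same_tree` are hypothetical. This example only illustrates the problem and does not compute a minimal edit script.

```python
import difflib
import xml.etree.ElementTree as ET

OLD = '<item id="7" status="open">\n  <name>widget</name>\n</item>'
NEW = '<item status="open" id="7">\n  <name>widget</name>\n</item>'

def changed_lines(a, b):
    """Lines a line-oriented diff (difflib here, standing in for UNIX diff) marks as changed."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [l for l in diff
            if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]

def same_tree(x, y):
    """Compare element trees: tag, attributes (as an unordered dict), text, tail, children."""
    return (x.tag == y.tag and x.attrib == y.attrib
            and x.text == y.text and x.tail == y.tail
            and len(x) == len(y)
            and all(same_tree(c, d) for c, d in zip(x, y)))

print(changed_lines(OLD, NEW))
# ['-<item id="7" status="open">', '+<item status="open" id="7">']
print(same_tree(ET.fromstring(OLD), ET.fromstring(NEW)))  # True
```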
Current approaches for diffing XML documents consume considerable resources and are relatively slow. Therefore, there is a need for a more efficient way to diff two XML documents.
Once an edit script is generated, it may be applied to an XML document (i.e., “patching a document”) to generate a patched version of the XML document. If an operation in the edit script (e.g., insert, delete, rename) identifies a node in the unpatched XML document, then the operation is performed, and the patched XML document reflects the modification specified by the operation.
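To make patching concrete, here is a minimal sketch that applies an edit script of insert, delete, and rename operations to an element tree from Python's `xml.etree.ElementTree`. The operation format is hypothetical and is not defined by any standard. Each operation names its target node by a tuple of child indexes from the root. Operations are applied in order. Following the rule above, an operation is performed only if its target node can be found, and unresolved operations are returned instead of applied.

```python
import xml.etree.ElementTree as ET

def resolve(root, path):
    """Follow a tuple of child indexes from the root; return None if any step is missing."""
    node = root
    for i in path:
        if i >= len(node):
            return None
        node = node[i]
    return node

def patch(root, script):
    """Apply hypothetical edit operations in order; return those whose target was not found.

    ("rename", path, new_tag)                  rename the node at path
    ("delete", path)                           remove the node at a non-empty path
    ("insert", parent_path, index, tag, text)  add a new child under parent_path at index
    """
    skipped = []
    for op in script:
        kind, path = op[0], op[1]
        if kind == "rename":
            node = resolve(root, path)
            if node is None:
                skipped.append(op)
                continue
            node.tag = op[2]
        elif kind == "delete":
            parent = resolve(root, path[:-1]) if path else None
            if parent is None or path[-1] >= len(parent):
                skipped.append(op)
                continue
            parent.remove(parent[path[-1]])
        elif kind == "insert":
            parent = resolve(root, path)
            if parent is None or op[2] > len(parent):
                skipped.append(op)
                continue
            child = ET.Element(op[3])
            child.text = op[4]
            parent.insert(op[2], child)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return skipped

doc = ET.fromstring("<order><id>42</id><item>widget</item><note>rush</note></order>")
script = [
    ("rename", (1,), "product"),    # <item> becomes <product>
    ("delete", (2,)),               # remove <note>
    ("insert", (), 2, "qty", "3"),  # add <qty>3</qty> as the root's third child
    ("delete", (9,)),               # no such node: reported, not applied
]
skipped = patch(doc, script)
print(ET.tostring(doc, encoding="unicode"))
# <order><id>42</id><product>widget</product><qty>3</qty></order>
print(skipped)  # [('delete', (9,))]
```

Index paths keep the sketch short. However, each operation's indexes refer to the tree as modified by the earlier operations, so the order of the script matters.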
Current approaches for patching an XML document require a significant amount of resources. Therefore, there is a need to more efficiently patch an XML document.