XML is now in widespread use as a storage or exchange format for documents and data. Such documents and data generally undergo changes, and these changes need to be monitored so that actions can be taken based on what has changed. There are many implementations of XML comparison technology. These include that described in EP1325432, which is an earlier development by the same inventor and the content of which is incorporated by reference, as well as others including Altova Spy DiffDog™, Microsoft™ Diff and Patch, IBM Alphaworks™, XMLcmp™ and Versim™. Generally these take two documents and generate some representation of the differences between them, often referred to as a ‘delta’ file.
There is another set of problems specifically relating to representing changes between three or more documents. For example, when a document is edited simultaneously by two different editors, the results of the two edits and the original common base document represent three documents that need to be compared in order to resolve any differences or conflicts between the edits. Another example is a document that has been translated, where the translated document typically has the same structure as the original but with text in a different language. When the original is modified, it is necessary to look at the changes between the three existing documents (i.e. the original, the translated original and the modified original) in order to enable a new version of the translation to be produced.
Application areas for more than three documents include extensions of the above, for example a document edited by three or more editors simultaneously or the update of more than one translation. There is also the common situation of a ‘travelling draft’ where a document is edited in succession (or simultaneously) by two or more editors as part of a contract negotiation or development of a narrative.
Solutions in these areas are considerably more complex than those involving two documents. For two documents there are limited changes for any particular node, where a node is an attribute, text item or element/subtree. These are as follows:
1. The nodes are equal
2. The nodes are not equal
3. The node is in only one of the two documents
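The three two-document cases listed above can be sketched in a few lines of Python. This is only an illustration: the names `TwoWayChange` and `classify` are hypothetical, a node is reduced to a plain string, and `None` is assumed to mean that the node is absent from a document.

```python
from enum import Enum
from typing import Optional


class TwoWayChange(Enum):
    """The three change possibilities for a node across two documents."""
    EQUAL = "the nodes are equal"
    NOT_EQUAL = "the nodes are not equal"
    ONLY_IN_ONE = "the node is in only one of the two documents"


def classify(node_a: Optional[str], node_b: Optional[str]) -> TwoWayChange:
    """Classify one node; None is assumed to mean 'absent from that document'."""
    if node_a is None and node_b is None:
        raise ValueError("the node must be present in at least one document")
    if node_a is None or node_b is None:
        return TwoWayChange.ONLY_IN_ONE
    if node_a == node_b:
        return TwoWayChange.EQUAL
    return TwoWayChange.NOT_EQUAL
```

For instance, `classify("draft", "final")` falls into the second case, and `classify("draft", None)` into the third.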
For three or more documents the number of change possibilities increases considerably, because a node in each document may be equal to a corresponding node in any one or more of the other documents. Most of the current approaches to combining three or more XML documents into one are based on the requirements of version control systems. The two key criteria for such systems are to minimize storage space and to minimize the time taken to retrieve a specified version from an archive. Version control systems are not useful as part of a solution to the above problems because the representation of the differences is designed only to re-construct one particular version, rather than for general processing of the differences.
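To make the growth in change possibilities for three or more documents concrete, the following Python sketch groups the versions of a single node by equality and counts how many distinct groupings are possible. The counting model is an assumption made for illustration and is not taken from the cited work: a node is a plain string, `None` means the node is absent, and the names `equality_groups`, `bell` and `count_patterns` are hypothetical.

```python
from math import comb
from typing import Dict, List, Optional, Sequence, Tuple


def equality_groups(versions: Sequence[Optional[str]]) -> List[Tuple[int, ...]]:
    """Group document indices whose version of the node is identical.

    A value of None is assumed to mean the node is absent from that document.
    """
    groups: Dict[str, List[int]] = {}
    for index, value in enumerate(versions):
        if value is not None:
            groups.setdefault(value, []).append(index)
    return [tuple(members) for members in groups.values()]


def bell(n: int) -> int:
    """Number of ways to split n items into non-empty groups (Bell number)."""
    values = [1]  # B(0)
    for m in range(n):
        # Recurrence: B(m + 1) = sum over k of C(m, k) * B(k)
        values.append(sum(comb(m, k) * values[k] for k in range(m + 1)))
    return values[n]


def count_patterns(n_docs: int) -> int:
    """Count equality patterns for a node present in at least one document."""
    return sum(
        comb(n_docs, present) * bell(present)
        for present in range(1, n_docs + 1)
    )


if __name__ == "__main__":
    # Documents 0 and 2 agree, document 1 differs.
    print(equality_groups(["x", "y", "x"]))
    for n in (2, 3, 4):
        print(n, count_patterns(n))
```

Under this model two documents give 4 patterns (the three cases for two documents, with "only in one" counted once per document), three documents give 14, and four give 51.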
One proposal for a format for a multiple version document is DeltaXML Unified Delta™, described in “Russian Dolls and XML: Handling Multiple Versions of XML in XML” XML 2003, December 2003, USA. This format is better suited to processing the changes between versions, and a generic solution to the above problems is proposed in “A Generalized Grammar for Three-way XML Synchronization” XML 2005, USA. A study of that proposal shows that, although a generic solution is possible, its architecture and execution are complex. In particular, a grammar is proposed for specifying the required result based on a rule set for combination. From this rule set, code can be generated to process a Unified Delta™ document to generate the required result.
Implementation of the Unified Delta™ format shows that not only is the code complex but it is also quite slow to execute. One reason for this is that at each point in the subtree hierarchy a deep traversal of the document subtree is necessary in order to determine the relationship of the different document versions within the subtree, for example to determine if the subtree is the same (equal) in all or some of the documents. The present invention seeks to eliminate this problem.
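The cost of repeating a deep traversal at each level of the hierarchy can be illustrated with a small Python sketch. It does not model the Unified Delta™ format itself. As an assumption for illustration, two plain `xml.etree.ElementTree` trees stand in for two document versions, and the helper names `deep_equal` and `visits_for_top_down_walk` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def deep_equal(a: ET.Element, b: ET.Element, counter: List[int]) -> bool:
    """Compare two subtrees in full, counting every element pair visited."""
    counter[0] += 1
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(deep_equal(x, y, counter) for x, y in zip(a, b))


def visits_for_top_down_walk(a: ET.Element, b: ET.Element) -> int:
    """Walk top-down, re-running a deep comparison at every level."""
    counter = [0]

    def walk(x: ET.Element, y: ET.Element) -> None:
        deep_equal(x, y, counter)  # relationship check for this subtree
        for child_x, child_y in zip(x, y):
            walk(child_x, child_y)

    walk(a, b)
    return counter[0]


if __name__ == "__main__":
    doc = "<a><b><c><d/></c></b></a>"
    first, second = ET.fromstring(doc), ET.fromstring(doc)
    # Four elements per document, but ten element comparisons in total.
    print(visits_for_top_down_walk(first, second))
```

In this example each document has only four elements, yet the walk performs 4 + 3 + 2 + 1 = 10 element comparisons, because every level repeats the comparison of everything beneath it.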
In addition, existing solutions do not cater for extensions to show other forms of relationship between the instances of a common element as it appears in the different documents, for example to indicate that all of the text within the elements is the same, or that they have the same date stamp. The present invention seeks to address this problem by providing a method and apparatus not only to represent different variations of equality relationship but also to ensure that subsets of the Logical Delta™ file are themselves valid.