1. Field of the Invention
The present invention relates to a method of managing changes to or between markup language files having application-defined markup tags, and to software for performing the method, such that the changes may be validated, tracked and recorded. More particularly, but not exclusively, the present invention is suited for use with Extensible Markup Language (XML) documents/files.
2. Description of Related Art
XML has been established as a standard for encoding information in electronic files for transfer between systems and across the world-wide-web (www). In particular, XML is used for the transfer of structured documents and data. Thus, XML is a set of rules, guidelines and conventions for designing text formats for data. Like HyperText Markup Language (HTML), XML is a markup language and so makes use of tags and attributes. However, while HTML specifies what each tag and attribute means, XML uses the tags only to delimit pieces of data and leaves the interpretation of the data completely to the application that uses it, hence the term application-defined tags. XML encodes, using these markup tags, a description of a document's storage layout and logical structure. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, character data (CDATA) section delimiters, document type declarations and processing instructions. The markup tags are nested in a tree structure, with the allowed structure for each tag, for example which tags may appear within a tag, formally defined in an element type declaration. Thus the structure of an XML file or document is specified by a set of element type declarations, which is known as a Document Type Definition (DTD). The DTD is thus a grammar for the XML document. XML files may be large (tens of megabytes) and complex (over a thousand element type declarations within a DTD), especially where the XML file contains computer-aided design (CAD) data. Hence, a file recording changes to an XML file also tends to be large and complex. “Extensible Markup Language (XML) 1.0 W3C Recommendation 10 Feb. 1998” provides a much more detailed explanation of XML and its contents are incorporated herein by reference.
It is often necessary, for example in software testing and change control support, to determine the exact differences, if any, between two markup language files and to encode these in some way, preferably in a separate document that will be referred to herein as a delta file. Delta files are known in other areas of software development; traditionally, however, files have been treated as a sequence of data records, typically lines of text. Therefore, differences of order or even of ‘white space’, for example spaces, new lines and carriage return/line feed pairs, are considered significant. This can be particularly problematic where more complex data structures, such as those of CAD systems, are involved. Ideally, though, a method that identifies and records these differences should ignore apparent differences, for example the ordering of XML element attributes, which do not constitute semantic differences (see for example Canonical XML Version 1.0, http://www.w3.org/TR/xml-c14n, and the XML Information Set).
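By way of illustration only, the distinction between apparent and semantic differences may be sketched in Python using the standard xml.etree.ElementTree module. The sample documents and the helper names normalise and semantically_equal are hypothetical and do not form part of any method described herein. The sketch further assumes that leading and trailing white space in text content is insignificant, which may not hold for documents with mixed content.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in attribute order and white space.
OLD = '<part id="7" material="steel"><name>bolt</name></part>'
NEW = '<part material="steel"   id="7">\n  <name>bolt</name>\n</part>'


def normalise(element):
    """Reduce an element to a nested tuple that ignores attribute order.

    Assumption: leading/trailing white space in text and tail is not significant.
    """
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),  # attribute order is discarded
        (element.text or "").strip(),
        (element.tail or "").strip(),
        tuple(normalise(child) for child in element),
    )


def semantically_equal(a, b):
    """Compare two XML strings after parsing and normalising them."""
    return normalise(ET.fromstring(a)) == normalise(ET.fromstring(b))


if __name__ == "__main__":
    print("text comparison equal:", OLD == NEW)
    print("tree comparison equal:", semantically_equal(OLD, NEW))
```

A comparison of the raw text treats the two documents as different, whereas the comparison of the normalised trees treats them as the same, since only attribute ordering and white space have changed.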
One approach to the analysis and recordal of differences between XML documents is the XML TreeDiff Update Language. XML TreeDiff computes the differences between two XML documents by analysing the documents as DOM trees and identifying the differences as a sequence of tree editing operations (DOM is the acronym for Document Object Model). XML TreeDiff represents the differences between two XML files in a structure quite separate from that of the original XML files.
The present invention seeks to provide an improved method of and software for recording changes to markup language files that employ application-defined tags and that additionally enables such changes to be validated. In the context of this document, reference to a markup file being valid is a reference to the contents of the markup file being well-formed and ideally complying with the constraints expressed in an associated set of element type declarations. Where a markup file has no associated set of element type declarations, or where strict compliance with the document type definition of the markup file is not required, the markup file is deemed to have a document type definition in which each element has a content model of ANY, and the content of the markup file is validated by ensuring that the content is well-formed.
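By way of illustration only, the fallback case in which validation reduces to a check for well-formedness may be sketched in Python as follows. The function name is_well_formed is hypothetical. The standard xml.etree.ElementTree parser used here does not check a document against its element type declarations, so the sketch corresponds only to the case where every element is deemed to have a content model of ANY.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML, otherwise False.

    No DTD validation is performed; only well-formedness is tested.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))   # properly nested tags
    print(is_well_formed("<a><b></a>"))    # <b> is never closed
```

Checking compliance with an associated set of element type declarations would require a validating parser, which is outside the scope of this sketch.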