In order to assist computers in transferring information between one another in a reliable manner, it has recently been proposed that a computer language known as eXtensible Mark-up Language (XML) should be used to generate text files structured in a well-defined manner which enables information contained in the files to be reliably extracted by a receiving computer. As is well known in the art, XML is fully declarative, by which it is meant that the significance of many so-called “tags” used in XML files may be user defined. For a discussion of XML see any one of numerous published books on the subject such as “XML for Dummies” by Maria H. Aviram published by IDG Books Worldwide Inc., or see the Internet web-site of the World Wide Web Consortium for information about XML.
Because XML is declarative, it may be considered as being a “meta-language” which can be used to define individual mark-up languages, which can in turn be used to generate well-structured documents. In order to determine whether a particular document is well-structured (i.e. that it complies with the rules of a particular mark-up language) it may be compared with (or “validated” by) either an XML-schema or a Document Type Definition (DTD) file.
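By way of illustration only, the notion of comparing a document with a set of rules may be sketched in Python as follows. The sketch assumes a hypothetical rule table (`RULES`) which stands in for a DTD and records only which child elements each element may contain; the element names and the function name `conforms` are invented for this example, and a real DTD or XML-schema validator checks considerably more (ordering, cardinality, attributes and data types).

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a DTD: each element name maps to
# the set of child element names it is permitted to contain.
RULES = {
    "order": {"customer", "item"},
    "customer": set(),
    "item": set(),
}


def conforms(xml_text, rules=RULES):
    """Return True if the text parses and every element obeys the rules."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # the text is not even well-formed XML
    for element in root.iter():
        if element.tag not in rules:
            return False  # element name unknown to the rule table
        for child in element:
            if child.tag not in rules[element.tag]:
                return False  # child not permitted inside this element
    return True


good = "<order><customer>Ann</customer><item>Pen</item></order>"
bad = "<order><price>3</price></order>"
```

Under these assumptions `conforms(good)` succeeds, whereas `conforms(bad)` fails because the rule table does not permit a `price` element inside `order`.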
Generally speaking, in order to generate XML documents for easy transfer of information between computers, a user generates a DTD or XML-schema file first and then writes subsequent XML files which conform to the “rules” specified in the XML-schema or DTD file. However, in some circumstances, it may be more convenient to write one or more example XML files first and then to generate automatically a suitable XML-schema or DTD file which is appropriate for the or each example XML file.
A number of applications have been developed which provide this functionality. For example, Microsoft Corp. has written a utility which permits an XML-schema to be inferred from an example XML file and which also permits an existing XML-schema to be modified to account for a single additional example XML file. The utility is referred to as the XSD Inference Utility. Note that in order to use the utility a user would have to write and compile his own specialised code (using the same programming language as that in which the utility has been written). Furthermore, the methodology adopted in this utility gives rise to a number of drawbacks. In particular, the utility tends to produce unnecessarily long and complicated XML-schemas. Additionally, only a single XML file may be processed at any one time by the unmodified utility.
Published International Patent Application No WO 2005/083591 describes a method of generating concise validating documents (DTDs or XML-schemas) for any number of input XML files which need not (and indeed most advantageously do not) describe the same type of thing (i.e. the XML files may be heterogeneous, although possibly overlapping to some extent). This document also describes the use of such validating files to assist separate parties in communicating information with one another. A set of example XML files is gathered together, comprising example files from all of the parties wishing to communicate data between one another (e.g. as part of a collaboration of some sort, such as a project), and a validator file is then generated to cover the entire set of example XML files. The validator file is produced by serially forming a Document Object Model (DOM) tree of each of the example files, traversing each such tree to build up an intermediate representation which combines the features of each example file, and then converting the intermediate representation into a validator file.
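The general shape of such a procedure (one DOM tree per example file, a shared intermediate representation, and a final conversion step) may be sketched in Python as follows. This is a simplified illustration under stated assumptions and is not the method of the cited application: the intermediate representation is assumed to be a plain dictionary, attributes are ignored, the generated content models are deliberately permissive, and the function names `merge_examples` and `to_dtd` are hypothetical.

```python
from xml.dom import minidom


def merge_examples(xml_texts):
    """Build an intermediate map: element name -> child names seen, text flag."""
    model = {}

    def visit(node):
        entry = model.setdefault(node.tagName, {"children": [], "text": False})
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                if child.tagName not in entry["children"]:
                    entry["children"].append(child.tagName)
                visit(child)
            elif child.nodeType == child.TEXT_NODE and child.data.strip():
                entry["text"] = True  # non-whitespace character data seen

    for text in xml_texts:  # one DOM tree per example, processed serially
        visit(minidom.parseString(text).documentElement)
    return model


def to_dtd(model):
    """Convert the intermediate map into permissive DTD element declarations."""
    lines = []
    for name, entry in model.items():
        if entry["children"] and entry["text"]:
            content = "(#PCDATA | " + " | ".join(entry["children"]) + ")*"
        elif entry["children"]:
            content = "(" + " | ".join(entry["children"]) + ")*"
        elif entry["text"]:
            content = "(#PCDATA)"
        else:
            content = "EMPTY"
        lines.append("<!ELEMENT %s %s>" % (name, content))
    return "\n".join(lines)


example_a = "<order><customer>Ann</customer></order>"
example_b = "<order><item>Pen</item></order>"
dtd_text = to_dtd(merge_examples([example_a, example_b]))
```

In this sketch the two example files differ in the children of `order`, and the resulting declaration for `order` admits both `customer` and `item`, so that a single validator covers both examples.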