1. Field of the Invention
The present invention relates generally to comparing two files, each of which is capable of being parsed into a DOM (Document Object Model) tree.
2. Description of the Related Art
Traditional file compare programs compare two files without considering the meaning of their contents. That is, they treat the two files as raw data, compare them byte by byte, and report the differences. When the files contain structured documents, for example documents written in XML (Extensible Markup Language), the output of such a traditional file compare is huge. Moreover, this output is essentially unreadable and, thus, essentially useless.
XML is a relatively new document format designed to bring structural information to the Web. The XML document format embeds content within tags that express the structure of the document. XML also provides the user with the ability to express rules, known collectively as an XML Schema, for the structure of a document. An XML document mainly contains elements and attributes that are used to describe data. Because XML was designed to describe data, formatting items such as white space, tabs, and new line characters, and even the actual order in which elements and attributes appear in the XML file, are not of significant interest to users. In comparing XML files, a traditional file compare application does not consider the syntax and semantics of these files. Consequently, it reports formatting differences and differences in the order of appearance of elements and attributes. These “differences” are typically large in number and add no value to a worthwhile comparison of the documents. Moreover, they make any meaningful differences difficult for a user to recognize in the voluminous output, i.e., the meaningful differences are “buried” among the irrelevant differences.
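The problem described above can be illustrated with a minimal sketch. It is not the method of the invention, only an illustration under stated assumptions: the sample documents and the helper names `canonical` and `same_document` are hypothetical, and Python's standard `xml.dom.minidom` parser stands in for any DOM parser. The two documents differ only in white space, attribute order, and element order, so a raw comparison reports them as different while a DOM-based comparison that ignores those details does not.

```python
from xml.dom import minidom

# Two hypothetical documents that describe the same data.
# B differs from A only in white space, attribute order, and element order.
A = '<order id="7" status="open"><item sku="a1"/><item sku="b2"/></order>'
B = '''<order status="open" id="7">
    <item sku="b2"/>
    <item sku="a1"/>
</order>'''


def canonical(node):
    """Return an order-insensitive, comparable summary of a DOM element."""
    # Attribute order is ignored by sorting the (name, value) pairs.
    attrs = tuple(sorted(node.attributes.items()))
    children = []
    text_parts = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            children.append(canonical(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            text_parts.append(child.data)
    # Runs of white space are collapsed, so indentation and line breaks vanish.
    text = " ".join("".join(text_parts).split())
    # Child element order is ignored by sorting the child summaries.
    return (node.tagName, attrs, text, tuple(sorted(children)))


def same_document(xml_a, xml_b):
    """Compare two XML strings through their DOM trees rather than their bytes."""
    root_a = minidom.parseString(xml_a).documentElement
    root_b = minidom.parseString(xml_b).documentElement
    return canonical(root_a) == canonical(root_b)


raw_equal = A == B                 # comparison of the raw character data
dom_equal = same_document(A, B)    # comparison of the parsed structure
print(raw_equal, dom_equal)
```

In this sketch the raw comparison fails while the structural comparison succeeds. Treating element order as irrelevant is an assumption taken from the discussion above; the sketch also discards the position of text within mixed content, which a fuller comparison would have to address.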
Accordingly, it would be desirable to have a method for comparing two documents syntactically and semantically so as to report the meaningful differences between them, even when irrelevant differences, e.g., differences in the order of the elements contained therein, also exist between the two documents.