A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The present invention generally relates to the field of object-oriented computer programs and in particular to a system and method for comparing XMI-based XML documents for identical contents.
Extensible Markup Language (XML) is a new format designed to bring structured information to the Web. It is a Web-based language for electronic data interchange. XML is an open technology standard of the World Wide Web Consortium (W3C), which is the standards group responsible for maintaining and advancing HTML and other Web-related standards.
XML is a subset of SGML that maintains the architectural aspects of contextual separation while removing non-essential features. The XML document format embeds content within tags that express the structure. XML also provides the ability to express rules for the structure (i.e., grammar) of a document. These two features allow automatic separation of data and metadata, and allow generic tools to validate an XML document against its grammar.
XML Metadata Interchange (XMI) combines the extensibility of XML with the object modeling power of the Unified Modeling Language (UML) and the Meta Object Facility (MOF) to provide a model-driven framework for sharing complex information.
The purpose of XMI is to make it possible for different tools, applications and repositories on a variety of platforms and middleware to meaningfully share metadata and data. XMI is already being used to share modeling and programming metadata, and its use is expanding into other fields like data warehousing, component management, business objects, and various application domains.
Three complementary standards (XML, UML and MOF), which were brought together by XMI, may be summarized as follows:
a) Extensible Markup Language (XML):
XML is a language used as a foundation for creating specific types of documents. Each type of document has a Document Type Definition (DTD) that describes the structure and element types used in a document. The DTD is used to validate documents of that type.
Popular web browsers now include a built-in XML parser since XML is becoming a dominant way to pass information across the web. Note that XML is not limited to the web; it can be used wherever files or streams are supported.
XML has gained wide acceptance quickly. Over 40 books about XML were published in less than a year after XML became a standard. XML is already supported by tools from numerous vendors: Adobe, Arbor Text, DSTC, HP, IBM, Microsoft, Netscape, Oracle, Platinum, Select, Sun, Unisys and Xerox. Moreover, XML is used in several applications such as publishing, repositories, modeling, database design, data warehousing, services, financial data interchange, health care, and more. Each application has its own document type and corresponding DTD.
b) Unified Modeling Language (UML):
The UML is a notation for object-oriented analysis and design supported by graphical design tools. UML models are used in many domains to describe object-oriented systems. UML models are often used to generate programming language syntax or database schemata. XMI defines how UML models can further be used to generate XML document types.
c) Meta Object Facility (MOF):
The MOF is an extensible framework for models of metadata, providing model-based interfaces for storing and accessing metadata managed by repositories or other tools. The MOF maps core UML concepts, like Class and Association, to specific object interfaces.
The MOF specification defines two levels of object interfaces for creating and accessing the modeled information. First, the MOF defines a single reflective interface that can be used for all types of models. Second, the MOF defines a pattern for generating specific interfaces for individual models. The generation pattern is now standardized for CORBA IDL, a Java pattern is coming, and other object languages are expected to follow.
XMI Brings XML, UML and MOF Together:
XMI adds an XML stream-based interchange capability to the two levels of object interfaces. The XMI specification defines the pattern for turning a model into a DTD and for turning modeled data into XML.
The XMI Specification defines how a model in a MOF system is translated into an XML document type (DTD) and how modeled objects are translated to and from XML. UML is the starting point where object-oriented discipline and rigor are applied to defining a model. MOF rules then define the resulting interfaces to documents defined by the model. The XMI specification defines the XML document type.
In the prior art, it is a tedious and time-consuming task to compare XMI-based XML documents for identical content. Prior art techniques use a method based upon comparison of textual content. These techniques are incapable of returning as identical two documents that are semantically identical but are arranged in a different order. Instead, when order is not significant, they return semantically identical documents as unequal along with semantically non-identical ones. Thus, the user has to manually sift through all documents returned as unequal to identify documents that are semantically identical but have a different order. This is a highly time-consuming and tedious task when a large number of XMI-based XML documents need to be processed. Accordingly, there is a need for automatically comparing XMI-based documents for identical content.
An associated problem with prior art techniques is that current methods fail to ignore differences in internal identifier values. Thus, semantically identical documents arranged in the same order but with different XMI internal identifier values are returned as unequal when using current methods of comparison.
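By way of illustration only, the two shortcomings described above may be sketched in Python as follows. The two sample documents, the element name Class, and the helper names textually_equal and class_names are hypothetical and are not taken from any XMI document type. The sketch shows that a purely textual comparison reports two documents as unequal even though they describe the same objects and differ only in element order and in xmi.id values.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents describing the same two objects.
# They differ only in element order and in internal xmi.id values.
DOC_A = '<XMI><Class xmi.id="a1" name="Customer"/><Class xmi.id="a2" name="Order"/></XMI>'
DOC_B = '<XMI><Class xmi.id="b7" name="Order"/><Class xmi.id="b9" name="Customer"/></XMI>'


def textually_equal(doc_a, doc_b):
    """Prior-art style comparison: compare the raw text of the documents."""
    return doc_a.strip() == doc_b.strip()


def class_names(doc):
    """Collect the names of the described objects, ignoring order and xmi.id."""
    root = ET.fromstring(doc)
    return sorted(elem.get("name") for elem in root.iter("Class"))


text_result = textually_equal(DOC_A, DOC_B)
names_match = class_names(DOC_A) == class_names(DOC_B)

print("textual comparison says equal:", text_result)
print("same objects described:", names_match)
```

Here the textual comparison rejects the pair, while a comparison of the described objects accepts it, which is the situation in which the user would otherwise have to inspect the documents manually.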
Another problem with prior art techniques for comparing XML documents is that currently, comparison methods must be customized or changed for individual document types. This is because these comparison methods are highly context specific and need to be told what kind of comparisons to expect.
As will be amplified in greater detail hereinbelow, the present invention solves one of the prior art problems by creating a semantic graph of all documents before applying the comparison algorithm, which helps to standardize all XML documents to a common standard semantic graph based format that the comparison algorithm is capable of processing.
Accordingly, it is an object of the present invention to provide a system and method for comparing a semantic graph encoded in documents rather than comparing textual content as in current XML comparison methods.
Another object of the present invention is to provide a system and method for comparing documents that considers order only where order is significant.
Yet another object of the present invention is to provide a system and method for comparing documents that ignores differences in internal identifiers (e.g., xmi.id values).
Still another object of the present invention is to provide a universal system and method for comparing XML documents that works for any XMI-based document type.
These and other objects, which will become apparent as the invention is described in detail below, are achieved by a method provided by a computer system processing XMI-based XML documents. The method compares two such XMI-based XML documents for identical content. The method begins with the step of parsing each of the documents to create for each a semantic graph of the document's objects. Next, a list of the names of object properties having significant order is read. Then, for each of the objects, and for each property of that object not listed as having significant order, the values of the property are sorted. Finally, the objects of the semantic graphs are compared.
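The steps summarized above may be sketched, purely as a simplified illustration and not as the preferred embodiment, in Python. The sketch makes several assumptions that are not drawn from the disclosure: every element is treated as an object or property container, a property is identified by the tag name of the element enclosing its values, the semantic graph is reduced to a tree of nested tuples, and xmi.idref cross-references are not resolved. The names canonical, documents_equal, IGNORED_ATTRS and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal identifiers that carry no semantic content (assumption: only xmi.id).
IGNORED_ATTRS = {"xmi.id"}


def canonical(elem, ordered_props):
    """Build a comparable form of an element and everything below it.

    Values of a property are sorted unless the property name (here, the
    tag of the enclosing element) is listed as having significant order.
    """
    attrs = tuple(sorted(
        (key, value) for key, value in elem.attrib.items()
        if key not in IGNORED_ATTRS
    ))
    text = (elem.text or "").strip()
    children = [canonical(child, ordered_props) for child in elem]
    if elem.tag not in ordered_props:
        children.sort()  # order is not significant for this property
    return (elem.tag, attrs, text, tuple(children))


def documents_equal(xml_a, xml_b, ordered_props=frozenset()):
    """Parse both documents and compare their canonical forms."""
    form_a = canonical(ET.fromstring(xml_a), ordered_props)
    form_b = canonical(ET.fromstring(xml_b), ordered_props)
    return form_a == form_b


# Hypothetical sample documents: same content, different order and xmi.id values.
DOC_A = (
    '<XMI><Class xmi.id="a1" name="Order"><Class.attributes>'
    '<Attribute xmi.id="a2" name="id"/><Attribute xmi.id="a3" name="total"/>'
    '</Class.attributes></Class></XMI>'
)
DOC_B = (
    '<XMI><Class xmi.id="x9" name="Order"><Class.attributes>'
    '<Attribute xmi.id="x4" name="total"/><Attribute xmi.id="x5" name="id"/>'
    '</Class.attributes></Class></XMI>'
)

unordered_result = documents_equal(DOC_A, DOC_B)
ordered_result = documents_equal(DOC_A, DOC_B, ordered_props={"Class.attributes"})

print("equal when order is not significant:", unordered_result)
print("equal when Class.attributes is ordered:", ordered_result)
```

In this sketch the two documents compare as equal when no property is listed as ordered, and as unequal once the hypothetical property Class.attributes is listed as having significant order, since its values then remain in document order.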
Still other objects, features and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein is shown and described only the preferred embodiment of the invention, simply by way of illustration of the best mode contemplated of carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive, and what is intended to be protected by Letters Patent is set forth in the appended claims. The present invention will become apparent when taken in conjunction with the following description and attached drawings, wherein like characters indicate like parts, and which drawings form a part of this application.