The present invention relates generally to a method and system of automatically reviewing information mappings across different information models. More particularly, the present invention relates to a tool that reviews the quality of mappings by identifying erroneous mappings between information models.
An information model is a way of representing and managing information, such as data, relationships, services, and processes, in data processing systems for a particular domain or enterprise. Every day, organizations deal with a myriad of different semantic expressions of key information, and expend huge resources working around the inconsistencies, challenges and errors introduced by so many varying information models. Examples of information models are Entity-Relationship (ER) models, Unified Modeling Language (UML) models, Eclipse Modeling Framework (EMF) models, thesauri, ontologies and Extensible Markup Language (XML) schemas.
These varying models rarely share a common terminology, because they have emerged from several different origins. In some cases, mergers of organizations operating in the same industry leave the merged organization with different information models that express exactly the same concepts. In other cases, the models may have been developed by different individuals to express overlapping industry concepts, but in slightly different domains.
Irrespective of the means through which these models came about, today's organizations utilize many different information models and face an increasing need to integrate across these models, through data integration, shared processes and rules, or reusable services. In all of these cases, the ability to relate, or map, between elements of different information models is a critical foundation stone in addressing these challenges.
A mapping between information models involves the matching of elements of the models, which can be based on, for example, lexical names, semantics, and/or other attributes. Both manual mapping by a user and computer-automated mapping of different information models are error-prone.
In user attempts, one source of error is the size of these models (typically several thousand elements each), combined with the fact that the lexical names of the elements rarely match or, when they do match, match for the wrong reasons (e.g., a document may have an “endDate” attribute, as does a claim, but the two “endDate” attributes reflect semantically different things, although they match at the lexical level). A second source of error in user attempts is that the models often express different levels of normalization. For example, in one environment a concept may be expressed at a very specific level of sub-typing, such as “Mortgage Credit Specialist”. In another environment, that same concept may be expressed at a much higher level, such as “Financial Services Role”. This complicates the mapping, because the concepts being mapped are at very different levels of specification, and such mappings can be very difficult to maintain across multiple systems. A frequent user response to this difference in normalization is a tendency to map everything to the generic structures. Taking an extreme example, if a target model contains “thing”, it is very tempting for an analyst to interpret everything in the source as an instance of a “thing” and perform all mappings at this level. While such a mapping is not technically invalid, mappings at this level are not useful to downstream initiatives and significantly reduce the quality of the mapping results.
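The two sources of error described above can be illustrated with a minimal Python sketch. It is an illustration only, not the method of the invention: the function names `lexical_matches` and `overused_targets`, the dotted element names, the additional `Policy` and `InsurancePolicy` elements, and the 50% threshold are all hypothetical choices made for the example.

```python
from collections import Counter


def local_name(qualified):
    """Return the part of a dotted element name after the last dot."""
    return qualified.rsplit(".", 1)[-1]


def lexical_matches(source, target):
    """Pair elements whose local names are equal, ignoring case."""
    return [
        (s, t)
        for s in source
        for t in target
        if local_name(s).lower() == local_name(t).lower()
    ]


def overused_targets(mappings, max_share=0.5):
    """Return targets that receive more than max_share of all mappings.

    The threshold is an assumed heuristic for spotting generic targets.
    """
    counts = Counter(t for _, t in mappings)
    total = len(mappings)
    return sorted(t for t, n in counts.items() if total and n / total > max_share)


# A purely lexical matcher pairs the two "endDate" attributes even though
# a document's end date and a claim's end date mean different things.
source_elements = ["Document.endDate", "Document.title"]
target_elements = ["Claim.endDate", "Claim.amount"]
matches = lexical_matches(source_elements, target_elements)

# Mapping nearly everything to a generic "Thing" shows up as one target
# absorbing most of the mappings.
mappings = [
    ("MortgageCreditSpecialist", "Thing"),
    ("Claim", "Thing"),
    ("Document", "Thing"),
    ("Policy", "InsurancePolicy"),
]
generic = overused_targets(mappings)
```

Under these assumptions, `matches` contains the single pair of “endDate” attributes, a lexical match that is semantically wrong, and `generic` contains only “Thing”, the target that absorbs three of the four mappings.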
In computer-automated attempts, the mapping process is also error-prone. As an example, model-matching algorithms may consider the description of a given element in their matching process, to help match items that correspond semantically but not lexically. Frequently, however, the descriptions of these elements are duplicated or copied across multiple elements. Such an algorithm will likely produce a number of false positives because of the duplication of documentation.
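The effect of duplicated documentation on a description-based matcher can be sketched as follows. This is a deliberately simplified stand-in for a model-matching algorithm, assuming that elements match when their normalized descriptions are identical; the names `normalize`, `description_matches` and `duplicated_descriptions`, and the sample elements and descriptions, are hypothetical.

```python
from collections import defaultdict


def normalize(description):
    """Lower-case a description and collapse runs of whitespace."""
    return " ".join(description.lower().split())


def description_matches(source, target):
    """Pair source and target elements whose normalized descriptions are equal."""
    by_description = defaultdict(list)
    for name, description in target.items():
        by_description[normalize(description)].append(name)
    return [
        (s, t)
        for s, description in source.items()
        for t in by_description.get(normalize(description), [])
    ]


def duplicated_descriptions(model):
    """Return each description shared by more than one element of a model."""
    groups = defaultdict(list)
    for name, description in model.items():
        groups[normalize(description)].append(name)
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}


source_model = {"Claim.endDate": "The date on which the item ends."}
target_model = {
    "Policy.endDate": "The date on which the item ends.",
    # Same description copied onto an unrelated element.
    "Document.endDate": "The date on which the item  ends.",
    "Policy.holder": "The party that holds the policy.",
}

candidates = description_matches(source_model, target_model)
copies = duplicated_descriptions(target_model)
```

Here `candidates` pairs the claim's end date with both target elements, so at least one of the two proposed matches is a false positive, and `copies` shows that the cause is a single description shared by two target elements.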