One challenge in enterprise systems, e.g., an enterprise resource planning (ERP) system, is avoiding data redundancy. For example, well-designed enterprise systems follow strict data modeling rules in an effort to achieve a redundancy-free data model. Although such modeling rules are available and often strictly followed, data redundancy still occurs. For example, different business processes often rely on similar data. Consequently, the more comprehensive an enterprise system is, the more data redundancy it is likely to contain. Data redundancy can become a bigger issue when distributed business systems have to be integrated into a single enterprise system. This results from the variety of data structures and models, as well as the variety of business processes, across the systems being integrated.
Data redundancy can be detected by comparing data structures. However, this approach has disadvantages. First, finding similar data structures does not definitively indicate redundancy. This is the case, for example, when reusable structures such as an address are used. In an example business context, the same address structure (first name, surname, city, etc.) is used in two business objects: customer and supplier. If customers and suppliers are disjoint, there is no redundancy in the address data. Second, data structure comparison cannot determine the severity of data redundancy, which can be calculated based on how much overlap exists in the database content. Third, the same data can be stored under different attribute names or labels, in which case a comparison of data structures does not reveal the redundancy.
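The limitations described above can be illustrated with a minimal sketch. It is not a prescribed method: the Jaccard measure over attribute names, the overlap ratio over rows, and all names and sample records (structure_similarity, content_overlap, customers, suppliers, partners) are assumptions chosen only for illustration.

```python
def structure_similarity(attrs_a, attrs_b):
    """Jaccard similarity of two sets of attribute names (assumed measure)."""
    a, b = set(attrs_a), set(attrs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_overlap(rows_a, rows_b):
    """Share of the smaller table's distinct rows that also occur in the other
    table (an assumed, simplified notion of redundancy severity)."""
    a, b = set(rows_a), set(rows_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical address structure reused by two business objects.
address_attrs = ("first_name", "surname", "city")
customers = [("Ada", "Lovelace", "London"), ("Alan", "Turing", "Manchester")]
suppliers = [("Grace", "Hopper", "New York"), ("Edsger", "Dijkstra", "Rotterdam")]

# Identical structures, disjoint content: no actual redundancy.
same_structure = structure_similarity(address_attrs, address_attrs)
disjoint_overlap = content_overlap(customers, suppliers)

# Same content stored under different attribute names: structures look unrelated.
partner_attrs = ("given_name", "family_name", "town")
partners = list(customers)
renamed_structure = structure_similarity(address_attrs, partner_attrs)
renamed_overlap = content_overlap(customers, partners)

print(same_structure, disjoint_overlap)    # structure matches, content does not
print(renamed_structure, renamed_overlap)  # structure differs, content matches
```

In the first case, structural comparison reports a perfect match although the content does not overlap; in the second, it reports no match although the content is fully duplicated. In both cases, only the content-based measure reflects the actual degree of redundancy.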