1. Field
The disclosure relates generally to an improved data processing system, and more specifically to providing semi-automatic evaluation and prioritization of architectural alternatives for data integration, a.k.a. information integration.
2. Description of the Related Art
Data within large organizations is often contained in heterogeneous databases that have different structures for representing the same or overlapping data. The heterogeneity can occur on various levels: technologies (e.g., hierarchical, network, relational, XML, etc.), data models (normalized, non-normalized, etc.), instance values, etc. Data structure heterogeneity becomes a challenge when data from the multiple sources needs to be integrated. Data integration or information integration enables end users to access connected data in multiple databases transparently, so that the end users are not limited by system boundaries.
There are three primary data integration architecture patterns that are used to address the heterogeneous database data integration challenge: data federation, data consolidation, and application-based integration. The data federation architecture pattern provides a virtual data integration approach in that the data is aggregated “on the fly” and only when requested by the consumer. The data consolidation architecture pattern extracts the data from its sources, transforms (integrates) the data, and then loads (copies) the data into a target database before a consumer requests the data. The application-based integration architecture pattern uses application programming interfaces (APIs) to integrate data from various systems and facilitate process choreography. Application-based integration is often implemented using a variety of middleware technologies and programming models. Enterprise Service Bus (ESB), the backbone of Service Oriented Architecture (SOA), is a type of application-based integration architecture. Although the objective of these three data integration architecture patterns is the same—they all integrate data—the characteristics of each pattern are different.
When solution architects want to design a solution that integrates data from multiple heterogeneous sources, they must decide which data integration architecture pattern to apply. The pattern selection is a complex process, as the solution architects must consider numerous data integration and design factors to determine the most appropriate pattern for a given scenario. Failure to select an appropriate data integration architecture pattern can result in delayed project deliverables, significantly increased costs, and even failure of the overall project.