The term data integration refers to the problem of combining data residing in heterogeneous sources. Currently, it relates to a wide range of technologies, from extract, transform and load (ETL) to enterprise application integration (EAI) to enterprise information integration (EII) and various change propagation technologies. There has been extensive theoretical research on data integration systems, exploring various mapping systems and languages, and their complexity results and limitations. However, no single technology suffices for all the needs of an enterprise, and these needs keep changing with growing data volumes and changing business requirements. Consequently, enterprises end up tweaking their integration systems continually, and sometimes summarily moving them from one technology to another, to keep up with these demands. This consumes a lot of effort: by some estimates, as much as 40% of all IT effort in an enterprise.
One reason why this consumes so much effort is the rigidity of the available integration technologies. Once a solution is implemented in one of these technologies, moving to another is like implementing the entire solution afresh, which requires a large amount of time, effort and computational resources. As a result, people end up building ad hoc, quick-fix solutions, which over time lead to data fragmentation and inconsistencies. Keeping these fragments synchronized to avoid inconsistencies puts a lot of strain on these systems.
The lack of a common reference architecture and the lack of a common set of foundational primitives from which purpose-specific solutions can be composed are the principal reasons for this state of affairs.
Though there exist a large number of vendors with tool offerings in ETL, EAI, data migration, EII and so on, each uses its own proprietary technology with no interoperability, sometimes even among tools of the same category (e.g., vendor X's ETL tool and vendor Y's ETL tool). Some vendors offer tools in many categories (for example, both ETL and EII), but again with no interoperability between a tool of one category and a tool of another. The principal reason for the lack of interoperability among tools of the same category (say, ETL) is the lack of a common reference architecture across tool implementations. The principal reason for the lack of interoperability across categories (say, ETL and EII) is that their specifications are too close to the implementation platform, i.e., they are not at a level of abstraction that allows their semantics to be mapped easily.
Moreover, the data interoperability problem arises from the fact that data, even within a single application domain, is available at many different sites, in many different schemas, and even in different data models. The integration and transformation of such data have become increasingly important for many modern applications that need to support their users in informed decision making.
While a number of useful approaches have been devised for designing and deploying specific integration processes, there remains a need for tools that enable easy migration of the integration processes themselves, once designed, among different technology platforms.