The invention relates generally to computer systems and services, and more particularly to a tool for estimating the cost of integrating heterogeneous data sources so that their data can be exchanged among the sources and accessed by users.
Organizations today often need to access multiple, heterogeneous data sources. Existing middleware technology and the World Wide Web enable physical connectivity between dispersed and heterogeneous data sources. “Heterogeneous” data sources are data repositories and data management systems that are incompatible with one another. The incompatibility, also called “semantic conflict”, includes differences in structural representations of data, differences in data models, mismatched domains, and different naming and formatting schemes employed by each data source. Thus, heterogeneous data sources store data in different forms, and require different formats and/or protocols to access their data and to exchange data among themselves.
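By way of illustration only, the kinds of semantic conflict described above (differing structure, naming, and formatting) can be sketched in a few lines of Python. All field names, values, and function names in this sketch are hypothetical assumptions chosen for illustration; they are not taken from any actual data source or from the examples discussed in this document.

```python
from datetime import date, datetime

# Hypothetical record from source A: a single name field, an ISO-formatted
# date, and an annual salary.
source_a_record = {"EMP_NAME": "Smith, Jane", "HIRE_DATE": "2001-03-15", "SALARY": 60000}

# Hypothetical record from source B: split name fields, a US-style date,
# and a monthly pay amount.
source_b_record = {"FirstName": "Jane", "LastName": "Smith",
                   "StartDate": "03/15/2001", "MonthlyPay": 5000}


def normalize_a(rec):
    """Map a source A record onto an assumed common form."""
    # Structural conflict: one combined field must be split into two.
    last, first = [part.strip() for part in rec["EMP_NAME"].split(",", 1)]
    return {
        "first_name": first,
        "last_name": last,
        "hire_date": date.fromisoformat(rec["HIRE_DATE"]),
        "annual_salary": rec["SALARY"],
    }


def normalize_b(rec):
    """Map a source B record onto the same assumed common form."""
    return {
        "first_name": rec["FirstName"],
        "last_name": rec["LastName"],
        # Formatting conflict: the date is stored as month/day/year.
        "hire_date": datetime.strptime(rec["StartDate"], "%m/%d/%Y").date(),
        # Domain conflict: pay is stored per month rather than per year.
        "annual_salary": rec["MonthlyPay"] * 12,
    }


# Once both records are normalized, they can be compared directly.
same_employee = normalize_a(source_a_record) == normalize_b(source_b_record)
```

Each conflict in this sketch requires its own hand-written conversion step, which suggests why the effort of integrating such sources grows with the number and kind of conflicts between them.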
The following are known examples of heterogeneous data sources and their incompatibilities, or semantic conflicts.
Example 1 is a partial schema of an Oracle™ database, and Example 2 is a partial database schema of a Microsoft™ SQL Server-based employee database application. (The term “database schema” refers to fields, attributes, tables and other categorizations of data in a database.)