Data profiling can provide extremely useful information about the underlying data. For example, data profiling can be used to: discover the quality, characteristics, and potential problems of data; catalog and analyze metadata and discover metadata relationships; and help understand and prepare data for integration. The profiling process typically includes collecting information about the underlying data, such as statistics (e.g., mean and median values) and values (e.g., maximum and minimum values). In the past, data profiling was performed manually. That is, to profile data, data analysts typically "eyeballed" the data and collected statistics and other information about the data as they reviewed it. Manually analyzing data is an extremely time-consuming and error-prone process. Thus, given the amount of data used by businesses today, the manual method of profiling quickly became impractical. In response to the need for more efficient means of data profiling, various vendors began to develop and commercialize data profiling applications and tools. For example, IBM® and Computer Associates® sell a variety of tools, such as software applications, that can be implemented to profile data. However, the tools and methods currently available do not provide a well-refined solution for profiling data for integration.
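By way of illustration only, the kind of information collected during profiling (e.g., mean, median, maximum, and minimum values) can be sketched for a single column of numeric data as follows. This is a minimal sketch and not a description of any vendor's tool; the function name `profile_column`, the treatment of `None` as a missing value, and the sample data are all assumptions introduced for the example.

```python
from statistics import mean, median


def profile_column(values):
    """Collect basic profile information for one column of numeric data.

    Entries equal to None are treated as missing (an assumption of this
    sketch) and are counted separately from the values that are present.
    """
    present = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
    }
    # Statistics and extreme values are only defined when data is present.
    if present:
        profile.update(
            mean=mean(present),
            median=median(present),
            minimum=min(present),
            maximum=max(present),
        )
    return profile


# Hypothetical sample data: five records, one of which has no value.
weights = [2.0, 4.0, None, 10.0, 4.0]
print(profile_column(weights))
```

Counting missing entries alongside the statistics reflects the use of profiling to discover potential problems in the data as well as its characteristics.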
Business mergers, the streamlining of departments, the operation of subsidiary entities by parent companies, and the like have given rise to a need to integrate data from a variety of sources. For example, over the years, United Parcel Service of America, Inc. (UPS®), has acquired various entities. When UPS® acquires entities such as the Overnite Corporation, it has a compelling interest in making use of the data of the acquired companies. What makes data integration challenging is the fact that while two data sources may comprise similar data, the data sources likely reference, store, and interact with the data differently. Thus, reconciling these differences has been a persistent obstacle to data integration efforts. Therefore, a need exists for systems and methods that aid in efficiently and effectively gathering, comparing, documenting, creating, and storing information about the data being profiled for potential integration opportunities, and for actually integrating the data.