Businesses and consumers rely on their communications infrastructure for accessing and sharing information. The information or data can be organized and kept in records for ease of access by multiple users or applications. When the collection of information is organized in electronically accessible records, it is managed and updated by computers. These electronically accessible records are commonly referred to as operational databases. The databases are accessed and modified by an increasing number of applications, and each application may make modifications based on its own needs. When businesses merge or simplify their business processes, the new business model often requires integrating the various applications, and the databases are then combined into a common database. As a result, corresponding columns in the common database may contain various types of values. The modifications by various applications and the integration of applications may therefore cause the databases to contain a great deal of heterogeneity. This heterogeneity creates a data quality issue that may prevent applications from being able to access and utilize the data. Thus, knowing the quality of the data in the database may improve the performance of the various applications that depend on the accuracy of that data.
Therefore, there is a need for a method that enables rapid identification of column heterogeneity in databases.