1. Field of the Invention
The present invention relates to database table management and more particularly to merging different database tables.
2. Description of the Related Art
A database consists of an organized collection of data for one or more uses, typically in digital form. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses. A flat file database describes any of various means to encode a database model (most commonly a table) as a single file. Finally, a simple table can be included as part of a simple document such as a word processing document or a spreadsheet.
The overall organization of a database is typically referred to as a schema for the database, such as a hierarchical or relational schema. A database schema is often compactly expressed using table names and names of columns in tables. Accordingly, a database may contain multiple tables, each containing different columns. The tables can be populated with data from different data sources. Oftentimes, to efficiently analyze data collected in different tables, it is desirable to merge the different tables into a single table. However, one difficulty when merging independently created database tables consists in identifying correlated columns in the database tables that are mergeable to create a meaningful merged result. To establish a proper correlation, though, one must know the exact meaning of each column in the different tables. For a simple table, manually identifying correlated columns in different tables is of no consequence. For larger tables of many columns, the task can be daunting.
U.S. Pat. No. 7,725,498 to Eric L. Barsness et al. for TECHNIQUES FOR IDENTIFYING MERGEABLE DATA (hereinafter, “Barsness”) addresses the process of establishing a correlation between columns in two different tables selected for merger. In this regard, as described in the Abstract of Barsness, correlation attributes are determined for a first column and a second column from one or more database tables. The correlation attributes describe for each column either or both the column and the content of the column. The correlation attributes from the first and second column then are compared and similarities between the first and second column are identified on the basis of the comparison. Then, on the basis of the identified similarities, it is determined whether the first and second columns are correlated. Only if the columns are determined to be correlated, the first and second columns are merged.
The process of merging related columns however, involves simply aggregating the rows of each related column into a single column whose total rows are the sum of the rows of the related columns. Consequently, the general algorithm of merging columns fails to account for the reality that some rows in the merged column may contain duplicate data to the extent that the source rows in the correlated columns may contain the same data. Accordingly, the merged table with merged columns may contain undesired inaccuracies.