1. Field of the Invention
The present invention relates to data comparisons, and more particularly to a system and method for comparing database data.
2. Description of the Related Art
In computer system environments where data is replicated, database administrators (DBAs) typically compare data using a variety of scripts in order to test the success or failure of the data replication. This data comparison may compare sent data (i.e., data before any database operation occurs) to received data (i.e., data after database operations are completed). The comparison may range from a partial comparison to an exhaustive one. At one end of the range, a sub-set or sample of the data content before the data replication, or a generated value based on that sub-set or sample, is compared with the corresponding sub-set or sample of the data content after the data replication, or with a generated value based on it. At the other end, all of the sent or “before” data is compared with all of the received or “after” data.
Examples of generated values include a count of the number of rows and a computation of average row length. When the generated value used to determine matching data is the count of the number of rows, the data comparison may be deemed successful even in a case where the content of the rows is different, so long as the count of the number of rows matches. The content of the rows may be different and the count of the number of rows may match when one or more rows are deleted from the first set of data being compared and the same number of rows are inserted into the second set of data being compared. Similarly, a row length, or byte size, may match when the content is different. Thus, using either a count of the number of rows or a computation of average row length as a basis for determining matching sets of data has a high probability of yielding incorrect comparisons.
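By way of illustration only, the following minimal Python sketch shows how both generated values can match while the row content differs. The in-memory row lists, the function names, and the use of the byte size of a row's text form as its "row length" are assumptions made for this example and are not taken from any particular database system.

```python
# Hypothetical "before" and "after" data; each row is a tuple of column values.
before = [(1, "alice"), (2, "bobby"), (3, "carol")]
# One row has been deleted and a different row of the same size inserted.
after = [(1, "alice"), (2, "bobby"), (4, "david")]


def row_count(rows):
    """Generated value: the number of rows."""
    return len(rows)


def average_row_length(rows):
    """Generated value: mean byte size of each row's text form (an assumption).

    Assumes rows is non-empty.
    """
    return sum(len(repr(row).encode("utf-8")) for row in rows) / len(rows)


counts_match = row_count(before) == row_count(after)
lengths_match = average_row_length(before) == average_row_length(after)
# Order-insensitive comparison of the actual content.
contents_match = sorted(before) == sorted(after)
```

In this sketch both generated values agree, so a script relying on either one would report a successful comparison even though the third row differs.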
An exhaustive comparison of data may be accomplished by sorting all the rows and comparing each piece of data, row by row. This method typically consumes large amounts of disk space and time to complete, especially for very large databases, and thus is a very slow, although typically very accurate, method of comparing data.
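The sort-and-compare approach described above may be sketched as follows. This is a simplified, in-memory illustration rather than a definitive implementation: the function name exhaustive_compare is hypothetical, rows are assumed to be tuples whose values can be ordered against one another, and no row is assumed to be None, since None is used here to mark a row missing from one side.

```python
from itertools import zip_longest


def exhaustive_compare(before_rows, after_rows):
    """Sort both sets of rows, then compare them position by position.

    Returns a list of (position, before_row, after_row) tuples for every
    position at which the sorted sets differ. A row of None means that
    one set ran out of rows before the other.
    """
    mismatches = []
    # Both full sets are sorted and held in memory, which mirrors the
    # space and time cost of this method on large tables.
    pairs = zip_longest(sorted(before_rows), sorted(after_rows))
    for position, (before_row, after_row) in enumerate(pairs):
        if before_row != after_row:
            mismatches.append((position, before_row, after_row))
    return mismatches
```

An empty result indicates that the two sets contain the same rows regardless of their original order. Any other result lists the positions in sorted order at which the sets diverge, so a single missing row can cause every later position to be reported as well.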
The scripts that DBAs use to test the success or failure of a data replication are typically custom-made and typically require modifications from time to time. The process of creating and maintaining data replication test scripts may be quite tedious, prone to error, and time-intensive.
Likewise, database operations that require data in a table to be unloaded from the database and subsequently reloaded into the database (e.g., database reorganization, or a change to a column requiring a table to be rebuilt) are prone to error due to the complexities involved.
DBAs typically create and maintain custom-made scripts to test the success or failure of database operations that require data in a table to be unloaded from the database and subsequently reloaded into the database. Similar to the data replication test scripts noted above, the test scripts used to check the success or failure of load/unload database operations typically require modifications from time to time. The process of creating and maintaining database load/unload test scripts may be quite tedious, prone to error, and time-intensive.
It is desirable to improve the process of comparing data so as to increase the likelihood that data replication and database operations complete successfully, and so as to diminish the time that DBAs must invest to complete this task.