The present invention has emerged from the field of synchronizing databases.
The present invention relates generally to computing systems in which there are kept a number of replicated databases, and in particular to a method for comparing the databases quickly and efficiently.
In a clustered computing environment, as well as other environments, it is required to provide each node with information concerning the cluster (e.g., the location of processor units, peripheral units, etc.), its use, its users, and the like. Often kept in a database of one sort or another, the amount of this information can be quite large. This leads to problems when the databases of each node need to be checked, such as when a periodic check needs to be made to ensure the integrity of the database and the information it contains, or to ensure that changes to the database were made correctly. Such checks, however, can be very time consuming, and tend to impose a significant burden on system resources, particularly if such checks are frequently required. If the checks require communication between two nodes across a communication path, the amount of communication can be significant and create a bottleneck.
Under certain conditions, it is desirable to store copies of a particular body of data, such as a relational database table, at multiple sites in a distributed compute network. If users are allowed to update the body of data at one site, the updates must be propagated to the copies at the other sites in order for the copies to remain consistent. The process of propagating the changes is generally referred to as replication. Various mechanisms have been developed for performing replication.
The table at which a change is initially made to a set of replicated data is referred to herein as the master table. A table to which the change must be propagated is referred to herein as a satellite table. Replication does not require an entire transaction executed at a master table to be re-executed at each of the satellite tables. Only the net changes made by the transaction need to be propagated. Other types of operations, such as read and sort operations, that may have been executed in the original transaction do not have to be re-executed at the satellite tables.
There are two basic approaches to replication: synchronous replication and asynchronous replication. In synchronous replication, each update or modification of a body of data is immediately replicated to all other replicas or copies of the body of data within the distributed network, typically by techniques such as a two-phase commit. The transaction that modifies the body of data is not allowed to complete until all other replicas have been similarly updated. Although synchronous replication provides a straightforward methodology for maintaining data consistency in a network this method is susceptible to network latencies and intermittent network failures. Furthermore, synchronous replication cannot prioritize updates; accordingly, low priority updates can unnecessarily produce significant system delays.
On the other hand, in asynchronous replication, local replicas of a particular data structure are allowed to be slightly different for a time until an asynchronous update is performed. During asynchronous replication, a master table can be modified without forcing a network access as in synchronous replication methodology. At some later point time in time, the modification is propagated to the satellite tables. Various techniques for asynchronous propagation have been developed for example, remote procedure calls (RPCs) and deferred transaction queues.
In asynchronous replication, conflicts in updating a body of data might occur if two sites concurrently modify the same data item before the data modification can be propagated to other sites. If update conflicts are not first detected and then handled in some convergent manner, the data integrity of the replicated copies will begin to diverge.
Database systems often locally replicate remote tables that are frequently queried by local users. By having local copies of heavily accessed data on several nodes, the database does not need to send information across the network every time a transaction on any of the several nodes requires access to the data. Thus, the use of local copies of data improves the performance of the requesting node and reduces the amount of inter-node traffic.
The copies of data stored at replicated sites may diverge from the data at the original or xe2x80x9cbasexe2x80x9d site for any number of reasons. For example, software problems or conflict resolution issues may cause a database to replicate data incorrectly. To determine whether discrepancies exist between different copies of the same data, it would be beneficial to have a mechanism for comparing the replicated data to the corresponding data in the base site. Once the discrepancies are identified, they can be rectified.
The prior art have solved many of the problems apparent from the above discussion. However, still present has been the problem that replication of database information is performance intensive and time consuming, absorbing computing and communication resources from other computing and networking tasks.
In view of the foregoing, it would be highly desirable to make available a method of replicating data that is both quick and relatively undemanding of computing, and especially networking, resources.
Disclosed is a replication method. The method includes generating an identifier column for a master table, the master table including a key column and an identifier column. Also included is copying the master table to a satellite table, so that the satellite table is a replica of the master table. Further included is associating an insert trigger with the master table. Another inclusion relates to assigning a first identifier value to the identifier column of an inserted row, the assigning caused by the insert trigger and occuring responsively to inserting a row into the master table. Additionally included is allowing inserts to be made to the master table. Included is synchronizing the satellite table to the master table. Synchronizing includes comparing the master table key and identifier columns with the satellite table key and identifier columns. Synchronizing also includes producing a row set of rows based on the initial comparing of synchronizing, the rows being those rows present in the master table but not in the satellite table. Another synchronizing inclusion is comparing the master table key and identifier columns with the satellite table key and identifier columns. Additionally, synchronizing includes deleting the rows that are present in the satellite table but not in the master table, as determined based on the second synchronizing comparing. Synchronizing further includes inserting the row set of rows into the satellite table.
Disclosed is the replication method, further including associating an update trigger with the master table, assigning a second identifier value to the identifier column of the updated row, the assigning caused by the update trigger and occuring responsively to updating a row of the master table, and allowing updates to be made to the master table.
Further disclosed is the possibility of the identifier column including a checksum column or a row version number column. The checksum would be calculated based contents of the row. The row version number would simply be incremented. Likewise disclosed is the possibility of the identifier column including a row version number column. The identifier value is a row version number. The row version number is updated by incrementation.
Also disclosed is a configuration wherein the master table is in a first database, the first database residing on a first computing device and wherein the satellite table is in a second database, the second database residing on a second computing device. The first computing device is communicably coupled to the second computing device. Comparing may be a distributed query.
Moreover, if the identifier value includes a row version number, then determining an activity level of a row based on the row version number associated with that row is possible.
Another disclosure includes designating a pass row version number, such that the pass row version number will not be assigned to any row by incrementing of the row version number during normal operation, assigning the pass row version number to a pass row, and wherein the pass row and the satellite table counterpart to the pass row will not be affected by the synchronizing. A similar disclosure includes designating a postpone row version number, such that the postpone row version number will not be assigned to any row by incrementing of the row version number during normal operation, designating a designated time period, assigning a postpone row version number to a postpone row"" and wherein the postpone row and the satellite table counterpart to the postpone row will only be affected by the synchronizing during the designated time period.
Disclosed is a replication method. Generating an identifier column for a master table, the master table including a key column and an identifier column. Copying the master table to a satellite table, so that the satellite table is a replica of the master table. Providing an insert stored procedure, wherein effects of executing the insert stored procedure include inserting a row and assigning a first identifier value. Inserting a row provided as a parameter to the insert stored procedure. Assigning a first identifier value to the identifier column of the inserted row. Allowing inserts to be made to the master table by executions of the insert stored procedure. Synchronizing the satellite table to the master table, the syncrhonizing including several acts. Comparing the master table key and identifier columns with the satellite table key and identifier columns. Producing a row set of rows based on the comparing, the rows being those rows present in the master table but not in the satellite table. Comparing the master table key and identifier columns with the satellite table key and identifier columns. Deleting the rows that are present in the satellite table but not in the master table, as determined based on the comparing. Inserting the row set of rows into the satellite table.
Further disclosed is a replication method. Generating an identifier column for each table of a tree of master tables, each of the master tables including a key column and an identifier column, wherein one of the master tables is a root master table, wherein all of the other master tables are foreign key master tables that are foreign key dependent upon at least one other of the master tables, wherein all of the master tables can be reached by traversal of the tree of master tables. Copying the tree of master tables to a tree of satellite tables, so that the tree of satellite tables is a replica of the tree of master tables. Associating an insert trigger with each of the master tables. Assigning a first identifier value to the identifier column of an inserted row, the assigning caused by one of the insert triggers and occuring responsively to inserting a row into one of the master tables. Allowing inserts to be made to at least one of the master tables. Synchronizing the tree of satellite tables to the tree of master tables, the syncrhonizing including several acts. Comparing the key and identifier columns of each of the master tables with key and identifier columns of each of the satellite tables. Producing row sets of rows, each row set of rows based on one of the comparing, the rows of each row set being those rows present in the master table but not in the satellite table. Comparing the key and identifier columns of each of the master tables with the key and identifier columns of each of the satellite tables. Deleting the rows that are present in the satellite tables but not in the corresponding master tables, as determined based on the comparing, wherein the deleting is ordered from the lowest to highest seniority of satellite tables. Inserting the row sets of rows into the satellite tables, wherein the inserting is ordered from the highest to lowest seniority of satellite tables.
Likewise disclosed is a replication method. Generating an identifier column for each table of a tree of master tables, each of the master tables including a key column and an identifier column, wherein one of the master tables is a root master table, wherein all of the other master tables are foreign key master tables that are foreign key dependent upon at least one other of the master tables, wherein all of the master tables can be reached by traversal of the tree of master tables. Copying the tree of master tables to a tree of satellite tables, so that the tree of satellite tables is a replica of the tree of master tables. Providing an insert stored procedure, wherein effects of executing the insert stored procedure include inserting and assigning: inserting a row provided as a parameter to the insert stored procedure, and assigning a first identifier value to the identifier column of the inserted row. Allowing inserts to be made to at least one of the master tables by executions of the insert stored procedure. Synchronizing the tree of satellite tables to the tree of master tables, the syncrhonizing including several acts. Comparing the key and identifier columns of each of the master tables with key and identifier columns of each of the satellite tables. Producing row sets of rows, each row set of rows based on one of the comparing, the rows of each row set being those rows present in the master table but not in the satellite table. Comparing the key and identifier columns of each of the master tables with the key and identifier columns of each of the satellite tables. Deleting the rows that are present in the satellite tables but not in the corresponding master tables, as determined based on the comparing, wherein the deleting is ordered from the lowest to highest seniority of satellite tables. Inserting the row sets of rows into the satellite tables, wherein the inserting is ordered from the highest to lowest seniority of satellite tables.
Also disclosed is a replication apparatus including a first computing device, a second computing device, a master table stored on the first computing device, a satellite table, an insert trigger, an update trigger, and a row set. The satellite table is stored on the second computing device, the satellite table being a replica of the master table. The insert trigger is associated with the master table, the insert trigger being adapted to calculate a new checksum for each row inserted into the master table and also adapted to assign the new checksum to the inserted row. The update trigger is associated with the master table, the update trigger adapted to recalculate the checksum of each row of the master table as the row is updated and also adapted to assign the recalculated checksum to the updated row. The row set includes rows that are in the master table but not in the satellite table. The rows of the satellite table that are not contained in the master table are deleted from the satellite table. The rows contained in the row set are inserted into the satellite table.
Also disclosed is a replication apparatus including a first computing device, a second computing device, a master table stored on the first computing device, a satellite table, an insert stored procedure, an update stored procedure, and a row set. The satellite table is stored on the second computing device, the satellite table being a replica of the master table. The insert stored procedure is adapted to insert a row into the master table, calculate a new checksum for the inserted row, and assign the new checksum to the inserted row. The update stored procedure is adapted to update a row of the master table, recalculate the checksum of the updated row, and assign the recalculated checksum to the updated row. The row set includes rows that are in the master table but not in the satellite table. The rows of the satellite table that are not contained in the master table are deleted from the satellite table. The rows contained in the row set are inserted into the satellite table.
Additionally disclosed is the alternative of a replication apparatus, as above, but for a tree of master tables stored on the first computing device and a tree of satellite tables stored on the second computing device, the tree of satellite tables being a replica of the tree of master tables. Each of the master tables includes a key column and an identifier column. One of the master tables is a root master table. All of the other master tables are foreign key master tables that are foreign key dependent upon at least one other of the master tables. And all of the master tables can be reached by traversal of the tree of master tables. Each of the satellite tables includes a key column and an identifier column. One of the satellite tables is a root satellite table. All of the other satellite tables are foreign key satellite tables that are foreign key dependent upon at least one other of the satellite tables. And all of the satellite tables can be reached by traversal of the tree of satellite tables.
The present invention achieves many advantages which solve problems left unsolved by the prior art. One of these advantages is that the master table need only be copied in its entirety one time, thereafter only requiring those rows which have changed to be copied. Use of a checksum is advantageous because it is a highly reliable indicator of whether a change has occurred and does not carry any additional information. Use of a row version number is advantageous because it is also highly reliable and has the added advantage that it can carry information about how many times a row has been changed. Application of these to a paradigm including single tables and tables related via foreign key constraints broadens the advantages.