Under certain conditions, it is desirable to store copies of a particular set of data, such as a relational table, at multiple sites. If users are allowed to update the set of data at one site, the updates must be propagated to the copies at the other sites in order for the copies to remain consistent. The process of propagating the changes is generally referred to as replication.
Various mechanisms have been developed for performing replication. One such mechanism is described in U.S. patent application Ser. No. 08/126,586 entitled "Method and Apparatus for Data Replication", filed on Sep. 24, 1993 by Sandeep Jain and Dean Daniels, the contents of which are incorporated herein by reference.
The site at which a change is initially made to a set of replicated data is referred to herein as the source site. The sites to which the change must be propagated are referred to herein as destination sites. If a user is allowed to make changes to copies of a particular table that are at different sites, those sites are source sites with respect to the changes initially made to their copy of the table, and destination sites with respect to the changes initially made to copies of the table at other sites.
Replication does not require an entire transaction that is executed at a source site to be re-executed at each of the destination sites. Only the changes made by the transaction to replicated data need to be propagated. Thus, other types of operations, such as read and sort operations, that may have been executed in the original transaction do not have to be re-executed at the destination sites.
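The distinction between what a transaction executed and what it changed can be illustrated with a minimal sketch. It assumes a hypothetical `Operation` record with a `kind` label and a hypothetical helper `changes_to_propagate`; neither comes from the source. Only writes against replicated tables are kept, so reads and sorts performed by the original transaction never reach the destination sites.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operation:
    # Hypothetical operation labels: "read", "sort", "insert", "update", "delete".
    kind: str
    table: str
    old_row: Optional[dict] = None
    new_row: Optional[dict] = None


WRITE_KINDS = {"insert", "update", "delete"}


def changes_to_propagate(ops, replicated_tables):
    """Keep only the writes that touched replicated tables.

    Reads, sorts and writes to non-replicated tables executed by the
    original transaction are dropped and never re-executed remotely.
    """
    return [op for op in ops
            if op.kind in WRITE_KINDS and op.table in replicated_tables]
```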
Row-level replication and column-level replication constitute two distinct styles of replication. In row-level or column-level replication, the updates performed by an executing transaction are recorded in a deferred transaction queue. The information recorded in the deferred transaction queue includes both the old and the new values for each data item that was updated. Row-level and column-level replication differ with respect to whether old and new values are transmitted for an entire relational row (row-level) or for only a subset of its columns (column-level).
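The difference between the two styles lies in which columns a change record carries. The sketch below assumes rows are plain dictionaries and that a column-level record also includes hypothetical key columns (`key_cols`) so the destination can locate the row; the source does not specify how the column subset is chosen, so that choice is an assumption.

```python
def row_level_change(old_row, new_row):
    # Row-level: old and new values for every column of the row.
    return {"old": dict(old_row), "new": dict(new_row)}


def column_level_change(old_row, new_row, key_cols):
    # Column-level: only a subset of columns. Here (an assumption) the subset
    # is the key columns plus every column whose value actually changed.
    changed = [c for c in new_row if old_row.get(c) != new_row[c]]
    cols = list(key_cols) + [c for c in changed if c not in key_cols]
    return {"old": {c: old_row[c] for c in cols},
            "new": {c: new_row[c] for c in cols}}
```

For an update that raises one salary, the row-level record repeats the unchanged name column, while the column-level record carries only the key and the salary.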
The changes recorded in the deferred transaction queue are propagated to the destination site. The destination site first checks that its current data values agree with the transmitted "old" values. The check may fail, for example, if concurrent changes have been made to the same replicated data at different sites. If the check fails, a conflict is said to have been detected. Various techniques may be used to resolve such conflicts. If no conflict is detected, the current data values at the destination site are replaced with the transmitted "new" values.
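The destination-side check can be sketched as a compare-then-replace step. This is a minimal illustration for updates only, assuming a hypothetical in-memory table that maps primary keys to row dictionaries; `apply_change` and `ConflictError` are illustrative names, and conflict resolution itself is left to the caller.

```python
class ConflictError(Exception):
    """Raised when the destination's current values differ from the transmitted old values."""


def apply_change(table, key, old_values, new_values):
    """Apply one update to a destination copy after checking the old values.

    table maps primary key -> row dict (a stand-in for the destination copy).
    """
    current = table.get(key)
    # A missing row, or any column whose current value differs from the
    # transmitted "old" value, means a conflict has been detected.
    if current is None or any(current.get(c) != v for c, v in old_values.items()):
        raise ConflictError(f"conflict on key {key!r}")
    # No conflict: replace the current values with the transmitted "new" values.
    current.update(new_values)
```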
FIG. 1 illustrates a system in which copies of a table 118 are stored at multiple sites. Specifically, the system includes three sites 100, 102 and 104. Sites 100, 102 and 104 include disks 106, 108 and 110 that store copies 120, 122 and 124 of table 118, respectively. Database servers 130, 132 and 134 are executing at sites 100, 102 and 104, respectively.
Assume that database server 130 executes a transaction that makes changes to copy 120. When execution of the transaction is successfully completed at site 100, a record of the changes made by the transaction is stored in a deferred transaction queue 160 of a replication mechanism 140. Such records are referred to herein as deferred transaction records. Typically, the deferred transaction queue 160 will be stored on a non-volatile storage device so that the information contained therein can be recovered after a failure.
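Keeping the deferred transaction queue recoverable amounts to forcing each record to non-volatile storage before the enqueue is considered complete. The sketch below assumes a hypothetical append-only file of JSON lines; `enqueue_deferred` and `read_deferred` are illustrative names, and handling of a torn final record after a crash is omitted for brevity.

```python
import json
import os


def enqueue_deferred(queue_path, record):
    """Append one deferred transaction record and force it to disk, so the
    record survives a failure once this call returns."""
    with open(queue_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()               # move Python's buffer to the operating system
        os.fsync(f.fileno())    # ask the OS to write it to the storage device


def read_deferred(queue_path):
    """Recover all deferred transaction records, e.g. after a restart."""
    if not os.path.exists(queue_path):
        return []
    with open(queue_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```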
Replication mechanism 140 includes a dequeue process for each of sites 102 and 104. Dequeue process 150 periodically dequeues all deferred transaction records that (1) involve changes that must be propagated to site 102, and (2) have not previously been dequeued by dequeue process 150. The records dequeued by dequeue process 150 are transmitted in a stream to site 102. The database server 132 at site 102 makes the changes to copy 122 of table 118 after checking to verify that the current values in copy 122 match the "old values" contained in the deferred transaction records.
Similarly, dequeue process 152 periodically dequeues all deferred transaction records that (1) involve changes that must be propagated to site 104, and (2) have not previously been dequeued by dequeue process 152. The records dequeued by dequeue process 152 are transmitted in a stream to site 104. The database server 134 at site 104 makes the changes to copy 124 of table 118 after checking to verify that the current values in copy 124 match the "old values" contained in the deferred transaction records.
Various obstacles may impede the efficiency of the replication mechanism 140 illustrated in FIG. 1. For example, a mechanism must be provided which allows dequeue processes 150 and 152 to distinguish between the deferred transaction records within deferred transaction queue 160 that they have already dequeued, and the deferred transaction records that they have not yet dequeued.
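One conceivable way to let each dequeue process tell dequeued records from new ones is a per-destination high-water mark. The sketch assumes, hypothetically, that every deferred transaction record receives a monotonically increasing sequence number in commit order and lists its destination sites; this is an illustration of the problem, not the mechanism claimed in this document.

```python
from dataclasses import dataclass


@dataclass
class DeferredRecord:
    seq: int                  # hypothetical sequence number, assigned in commit order
    destinations: frozenset   # sites to which this change must be propagated
    payload: dict


@dataclass
class DequeueProcess:
    site: str
    last_seq: int = 0         # highest sequence number already dequeued for this site

    def dequeue(self, queue):
        """Return the records for this site that it has not dequeued before."""
        batch = [r for r in queue
                 if r.seq > self.last_seq and self.site in r.destinations]
        if batch:
            self.last_seq = max(r.seq for r in batch)
        return batch
```

Because each process keeps its own mark, the processes for sites 102 and 104 can drain the shared queue at different rates. The approach is only safe if no record becomes visible with a sequence number lower than one already dequeued.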
Further, each of dequeue processes 150 and 152 is connected to its corresponding destination site by a single stream. Efficiency may be improved by establishing multiple streams between the source site and each of the destination sites. However, there are constraints on the order in which updates must be applied at the destination sites, and the replication mechanism has no control over the order in which commands that are sent over one stream are applied at a destination site relative to commands that are sent over a different stream. Therefore, a transmission scheduling mechanism must be provided if commands are to be sent to a destination site over more than one stream.
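To make the ordering problem concrete, the sketch below shows one simple, hypothetical scheduling rule: a transaction depends on every earlier transaction that touched a common row, and it is sent on any stream only after the destination has acknowledged all of its dependencies. The names `dependencies` and `schedule_round` and the row-overlap rule are assumptions for illustration, not the scheduling mechanism of this document.

```python
def dependencies(txns):
    """txns: list of (txn_id, set_of_row_keys) in source commit order.
    Conservative rule (an assumption): a transaction depends on every
    earlier transaction that touched at least one common row."""
    deps = {}
    for i, (tid, keys) in enumerate(txns):
        deps[tid] = {prev for prev, pkeys in txns[:i] if keys & pkeys}
    return deps


def schedule_round(txns, deps, sent, acked, num_streams):
    """Pick unsent transactions whose dependencies the destination has
    already acknowledged, and spread them across streams round-robin.
    Transactions in different streams are then independent, so the
    relative order in which the destination applies them does not matter."""
    streams = [[] for _ in range(num_streams)]
    ready = [tid for tid, _ in txns if tid not in sent and deps[tid] <= acked]
    for n, tid in enumerate(ready):
        streams[n % num_streams].append(tid)
    return streams
```

With transactions T1 on row a, T2 on row b, T3 on rows a and c, and T4 on row c, the first round sends T1 and T2 in parallel; T3 waits for T1 to be acknowledged, and T4 waits for T3.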
Currently, database systems implement replication by executing deferred transactions using two phase commit techniques. During two phase commit operations, numerous messages are sent between the source site and each of the destination sites for each transaction to ensure that changes at all sites are made permanent as an atomic event. While the use of two phase commit techniques ensures that the various databases may be accurately recovered after a failure, the overhead involved in the numerous inter-site messages is significant. Therefore, it is desirable to provide a mechanism that involves less messaging overhead than two phase commit techniques but which still allows accurate recovery after a failure.
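The messaging overhead grows with both the number of transactions and the number of destination sites. Assuming the basic two phase commit protocol, with no presumed-abort, presumed-commit or read-only optimizations, the source site exchanges four messages with each destination per transaction: PREPARE, the participant's vote, COMMIT, and an acknowledgement. The helper name below is illustrative.

```python
# Messages exchanged with one participant in basic two phase commit
# (no presumed-abort, presumed-commit or read-only optimizations assumed).
TWO_PC_MESSAGES_PER_PARTICIPANT = ("PREPARE", "VOTE", "COMMIT", "ACK")


def two_phase_commit_messages(num_transactions, num_destinations):
    """Total inter-site messages when every deferred transaction is
    committed at every destination site with its own two phase commit."""
    return (num_transactions * num_destinations
            * len(TWO_PC_MESSAGES_PER_PARTICIPANT))
```

In the configuration of FIG. 1, with site 100 as the source and sites 102 and 104 as destinations, propagating 1000 transactions this way costs 1000 × 2 × 4 = 8000 inter-site messages.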