1. Field of the Invention
The present invention relates generally to the synchronization of shared data structures, e.g., databases, and particularly to a system and method for replicating a shared data structure across several computers to improve the availability of the shared data structure and the speed at which programs can access and operate on it. More particularly, the invention relates to optimization among multiple synchronization algorithms.
2. Description of the Prior Art
In business environments, it is typical to require that operations on a data structure be grouped and executed as a transaction. Applications with this requirement are termed transactional applications. Transactional applications must satisfy the so-called ACID (Atomic, Consistent, Isolated, Durable) properties as described in J. Gray et al., “Transaction Processing: Concepts and Techniques,” Morgan Kaufmann, 1993, ISBN 1558601902. Thus, a transaction may employ semantics that serve to satisfy the ACID properties such that the transaction is considered to be atomic (i.e., all or nothing), consistent (i.e., the data is never seen to be in an inconsistent state, e.g., an employee who is a member of a nonexistent department), isolated (i.e., does not affect and is not affected by other transactions) and durable (i.e., will complete if the system fails or can be reversed). Without transactional semantics, concurrent clients may “step on” one another's data modifications with respect to an application. Examples of transactional applications include order entry, inventory, customer information, and human resources applications. When resident on a server, such applications allow multiple client computers to simultaneously access and operate on the shared information in a consistent manner. Examples of such clients are those supporting the Java 2 Platform, Micro Edition (J2ME) for consumer and embedded devices such as mobile phones, PDAs, TV set-top boxes, in-vehicle telematics systems, laptop computers, and workstations.
Requiring the transactional application's data structures to be solely resident on a single server simplifies the task of providing transactional semantics. However, it has the disadvantage of not performing well when the request rate from the client computers is high. It also does not enable the client computers to access the applications when they are disconnected from the server computer. These disadvantages can be overcome by replicating the data structures so that they are resident on the client computers as well as on the server. Then, clients can execute the transactional application locally rather than accessing the server. Such a scheme requires a synchronization infrastructure that propagates updates between the replicas such that all replicas converge to a common consistent state.
Transactions which satisfy the ACID properties are also called serializable (see Gray et al. supra), because the result of the execution of a sequence of transactions must correspond to a serial (non-overlapping) sequence of execution of the transactions against a single copy of the data structure. Thus, it is convenient to think in terms of a single server having the “master” replica of the database, and the clients having replicas of the database. The “authoritative” replica of the database is the server database, and client replicas must, after synchronization, correspond to the current (or near current) state of the server replica. Those skilled in the art recognize that this approach may be enhanced by partitioning the master replica across multiple servers (e.g., server 1 has the master replica of employees A-M and the “slave” replica of employees N-Z, and server 2 has the master replica of employees N-Z and the slave replica of employees A-M). Additionally, a coordinator function which controls the master replica may be separated from the data itself (e.g., server 1 has the data structure, but server 2 makes decisions about which updates are applied to the replica on server 1).
Note that, while in a distributed environment, clients connect to servers to access applications executing on those servers, this classification is not fixed. Typically, servers also assume the role of clients and connect to other servers to process a request submitted by their client computers. Thus, in distributed environments, computers take on the roles of client or server depending on the need. More generally, communications may take place on a peer-to-peer basis, rather than client-to-server.
There are broadly two common techniques for propagating the changes between two replicas. In the state-based approach, the changes made to one replica are logged in terms of the different items that have been modified (changed, deleted or created). During synchronization, the state changes are propagated from the first replica to the other replicas. Typically, in cases where the same item has been modified in more than one replica, or where an item with the same identifier has been added to two different replicas, a conflict is generated that needs to be handled in an application-specific manner. Otherwise, the new and changed state is committed on the target replicas. An example of commercial software using such state-based replication is IBM DB2 Everyplace® (see http://www-306.ibm.com/software/data/db2/everyplace/).
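As an illustration only, the state-based approach described above might be sketched as follows. The function name `state_sync`, the log layout (a mapping from item identifier to a kind and a value), and the sample order identifiers are assumptions made for this sketch and are not taken from any particular product.

```python
def state_sync(source_changes, target_changes, target_data):
    """Propagate a state-change log to a target replica.

    Each log maps an item identifier to (kind, value), where kind is
    "create", "update" or "delete" (hypothetical layout). Items touched
    in both replicas are reported as conflicts and left unmodified, so
    the application can resolve them in its own way.
    """
    conflicts = sorted(k for k in source_changes if k in target_changes)
    for key, (kind, value) in source_changes.items():
        if key in conflicts:
            continue  # leave conflicting items for application-specific handling
        if kind == "delete":
            target_data.pop(key, None)
        else:
            target_data[key] = value  # commit new or changed state
    return conflicts


# Both replicas modified "order-1"; only the source created "order-2".
source_changes = {"order-1": ("update", 5), "order-2": ("create", 1)}
target_changes = {"order-1": ("update", 7)}
target_data = {"order-1": 7, "order-3": 2}
conflicts = state_sync(source_changes, target_changes, target_data)
```

In this sketch the conflicting item keeps its target value, and the non-conflicting creation is committed on the target replica.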
In the operation-based approach, the operations performed on the data structure instance are logged along with the details of the arguments with which the operation was executed. For example, one operation may have a name “createOrder” and might take an item and a purchase order number as parameters. If the operations are being executed within a transaction, this information can also be logged. During synchronization, the log of operations is propagated from the modified replica to the other replicas, and the operation log from the modified replica is re-executed against the other replicas. Note that the operation log is executed against the current state of the other replicas. An example of a system which implements the operation-based approach may be found in “Programming Model Alternatives for Disconnected Business Applications”, RC23347, available from http://domino.watson.ibm.com/library/cyberdig.nsf/Home.
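The following minimal sketch illustrates the operation-based approach under simplifying assumptions. The operation name “createOrder” and its two parameters come from the example above; the in-memory dictionary used as the database, the `OPERATIONS` registry, and the `replay` function are hypothetical names introduced only for illustration.

```python
def create_order(db, item, po_number):
    """Hypothetical operation: record an order under its purchase order number."""
    db.setdefault("orders", {})[po_number] = item


# Registry mapping logged operation names to executable code (assumed design).
OPERATIONS = {"createOrder": create_order}


def replay(op_log, db):
    """Re-execute a log of (name, args) entries against a replica's current state."""
    for name, args in op_log:
        OPERATIONS[name](db, *args)
    return db


# Log recorded on the modified replica: operation name plus its arguments.
op_log = [("createOrder", ("widget", "PO-1001"))]

# The other replica already holds different data; the log runs against that state.
replica = {"orders": {"PO-0999": "gadget"}}
replay(op_log, replica)
```

Note that only the operation name and arguments travel between replicas; the resulting state is recomputed on each replica.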
Both synchronization techniques have characteristics that may make one better than the other in certain situations and environments. For example, the size of the state-change log versus the size of the operation log depends heavily on the application program. An application which grants each employee a five-percent raise as a single operation requires a very small operation log (one operation), but conversely requires a large state-change log (every employee salary has changed). Alternatively, an application that examines the entire database but makes no changes will require a zero-length state-change log but a non-zero-length (possibly large) operation log (e.g., if each employee was examined in a separate operation).
Similarly, operation-replay systems require the operations to be re-executed against each instance of the database, potentially consuming a lot of CPU time. In contrast, state-change logging may require less CPU time if the cost of applying the logged changes is small in comparison with the cost of re-executing the operations.
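The difference in log size can be made concrete with a small sketch. The employee table of 100 entries, the function names `give_raise` and `examine`, and the log layouts are assumptions chosen only to illustrate the two cases described above.

```python
# Hypothetical employee table: 100 employees with integer salaries.
salaries = {f"emp{i}": 1000 * (i + 1) for i in range(100)}


def give_raise(db, percent):
    """Apply a raise to every employee; return the resulting state-change log."""
    state_log = {}
    for emp, salary in db.items():
        db[emp] = salary * (100 + percent) // 100
        state_log[emp] = db[emp]  # one state-change entry per modified item
    return state_log


def examine(db, emp):
    """Read-only operation: changes no state."""
    return db[emp]


# Case 1: a single operation that changes every item.
raise_op_log = [("giveRaise", (5,))]
raise_state_log = give_raise(salaries, 5)

# Case 2: many read-only operations that change nothing.
read_op_log = []
read_state_log = {}
for emp in salaries:
    examine(salaries, emp)
    read_op_log.append(("examine", (emp,)))
```

Under these assumptions the raise produces one operation-log entry against 100 state-change entries, while the read-only pass produces 100 operation-log entries against an empty state-change log.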
Finally, state-based synchronization may be more prone to detection of false conflicts than operation-based synchronization systems. For example, if a bank account is debited in multiple replicas of the database, state-change logging will view this as a conflict. In contrast, operation-based synchronization will ultimately combine all the debits, and will not flag a conflict unless the account is overdrawn.
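The bank account example can be sketched as follows. The account identifier, the starting balance of 100, and the function names `state_based_conflicts` and `replay_debits` are hypothetical, and the rule that a conflict arises only on overdraft is a simplification of the behavior described above.

```python
def state_based_conflicts(changed_a, changed_b):
    """State-based view: any item changed in both replicas is a conflict."""
    return sorted(set(changed_a) & set(changed_b))


def replay_debits(balance, debits):
    """Operation-based view: replay the debits in order against one balance.

    Returns (final_balance, conflict). A conflict is flagged only when a
    debit would overdraw the account; that debit is not applied.
    """
    for amount in debits:
        if amount > balance:
            return balance, True
        balance -= amount
    return balance, False


# Both replicas start with a balance of 100 on the same account.
# Replica A debits 30 (new state 70); replica B debits 50 (new state 50).
false_conflict = state_based_conflicts({"acct-1": 70}, {"acct-1": 50})
combined = replay_debits(100, [30, 50])

# Debits of 80 and 50 cannot both be honored, so replay flags a conflict.
overdrawn = replay_debits(100, [80, 50])
```

Here the state-based comparison reports a conflict on the account even though both debits can be honored, whereas replaying the operations combines them and flags a conflict only in the overdraft case.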
In the current state of the art and practice in this area of synchronization, practitioners and scientists have chosen one synchronization scheme or the other and have argued about the merits of one system over the other. However, it is clear from the above explanation that each technique has situations under which it outperforms the other in terms of commonly-used metrics such as CPU time and network bandwidth.
Accordingly, there is a need for an overall system that combines these two techniques into a hybrid synchronization method capable of choosing the better technique dynamically for each particular synchronization session.