1. Field of the Invention
The present invention relates to databases, and more particularly to database replication technology.
2. Background Art
Data replication is the process of maintaining multiple, up-to-date copies of a database object in a distributed database system. Performance improvements can be achieved when data replication is employed, since multiple locations exist for accessing and modifying the replicated data. For example, if multiple copies of a data object are maintained, an application can access the logically “closest” copy of the data object to improve access times and minimize network traffic. In addition, data replication provides greater fault tolerance in the event of a server failure, since the other copies of the data object remain online in the distributed system if a failure occurs.
Different solutions exist to obtain data from a source of modifications, for example a primary database, and to provide the data to a replicate or target database. In some cases, data may be replicated at different intervals by obtaining a “snap-shot” of a source of data or a “snap-shot” of modifications to source data that is to be replicated. In other cases, a user may need a copy of the data updated as soon as possible. In such cases, the data is replicated as the modification is made on the primary database, without waiting for a process to obtain a snap-shot.
Data replication can be accomplished synchronously or asynchronously. In asynchronous replication, modifications to data at a primary database are propagated to one or more replicate databases, and a replicate or target database is updated only after the source data at the primary database has been modified. Therefore, replication on the target database occurs after a delay, known as latency. An asynchronous replication solution can use different methods to transfer replication information. One benefit of asynchronous replication is minimal impact on, or intrusion into, the primary database, because the primary database does not need to wait until the replicate databases receive the data. The method used to extract changes from the primary database does not depend on whether the replication is asynchronous or synchronous.
Replication can use different methods to transfer data. Log-based replication involves storing the data modified by a data manipulation language (DML) statement in a log. A process may then read the log to extract and send information associated with the modified data to a replicate or target database. Statement replication involves transferring the DML statement itself to a replicate or target database. No data is replicated in such a case, but the data in the primary database and the replicate database remains synchronized. In an exemplary case of statement replication, the data associated with a statement is not transferred; only the text of the statement travels through the replication system.
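The difference between the two transfer methods can be illustrated with a minimal sketch. It is not taken from the invention: the `accounts` table, the in-memory SQLite databases, and the list standing in for a log are all hypothetical, chosen only to contrast shipping modified row data with shipping the statement text.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database holding a hypothetical 'accounts' table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    return db


def snapshot(db):
    """Return the full table contents in a stable order."""
    return db.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


rows = [(1, 100), (2, 200), (3, 300)]
primary = make_db(rows)
log_replica = make_db(rows)    # receives modified row data (log-based)
stmt_replica = make_db(rows)   # receives only the statement text

statement = "UPDATE accounts SET balance = balance + 10 WHERE balance >= 200"

# Log-based replication (simplified): note which rows the statement will
# touch, run it on the primary, then record the new row values in a "log".
affected_ids = [
    row_id
    for (row_id,) in primary.execute(
        "SELECT id FROM accounts WHERE balance >= 200 ORDER BY id"
    )
]
primary.execute(statement)
log = [
    primary.execute(
        "SELECT id, balance FROM accounts WHERE id = ?", (row_id,)
    ).fetchone()
    for row_id in affected_ids
]

# A reader process applies each logged row to the replicate database.
for row_id, balance in log:
    log_replica.execute(
        "UPDATE accounts SET balance = ? WHERE id = ?", (balance, row_id)
    )

# Statement replication: only the SQL text is sent and re-executed.
stmt_replica.execute(statement)

print(len(log), "row images sent by log-based replication")
print(len(statement), "characters sent by statement replication")
```

In this sketch both replicate databases end in the same state as the primary because all three start with identical data. The log-based path transfers one row image per modified row, whereas the statement path transfers a fixed amount of text regardless of how many rows are affected.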
Statement replication must ensure that a statement executed on the primary database and on the replicate database affects exactly the same set of data. However, the results of a statement executed in the source database and in the replicate databases can differ depending on the replication architecture. For example, if the data on a replicate database is a subset of the data on the primary database, the same statement may affect a different set of data when it is replicated from the primary database to the replicate database. In such cases, replicating the DML statement will leave the data at the primary database and the replicate database out of synchronization, which should be avoided.
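The subset problem can be reproduced with a short sketch. The `orders` table, its values, and the choice of a replicate database holding only the "east" region are assumptions made for illustration and do not come from the invention. The statement's predicate depends on an aggregate over the whole table, so it evaluates differently on the subset.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database holding a hypothetical 'orders' table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)"
    )
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return db


all_rows = [(1, "east", 60), (2, "west", 70), (3, "east", 90), (4, "west", 20)]

primary = make_db(all_rows)
# Assumed architecture: the replicate database holds only the 'east' rows.
replica = make_db([row for row in all_rows if row[1] == "east"])

# The predicate depends on the average over whatever rows are present.
statement = "DELETE FROM orders WHERE amount < (SELECT AVG(amount) FROM orders)"

primary.execute(statement)   # average is 60 here, so only id 4 is deleted
replica.execute(statement)   # average is 75 here, so id 1 is deleted

# What the replicate database should contain: the primary's 'east' rows.
expected = primary.execute(
    "SELECT id, region, amount FROM orders WHERE region = 'east' ORDER BY id"
).fetchall()
# What it actually contains after replaying the same statement text.
actual = replica.execute(
    "SELECT id, region, amount FROM orders ORDER BY id"
).fetchall()

print("expected on replicate:", expected)
print("actual on replicate:  ", actual)
print("in sync:", expected == actual)
```

Under these assumptions the primary keeps order 1 while the replicate database deletes it, so the two are no longer synchronized even though the identical statement text was executed on both.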
Therefore, what is needed is a system, method and computer program product that logs and replicates DML statements in a manner that maintains consistency between data in a primary database and one or more replicate databases, regardless of replication architecture.