Many businesses and their customers demand high availability of data in databases—including fast, reliable data access and the ability to access up-to-date data from any point in a distributed network. A single, centralized database may be relatively easy to maintain but may have unacceptable access delays resulting from communication bottlenecks, processor and disk access speed limitations and a lack of fail-over recovery capability. Also, a centralized database may not provide the ability to initiate updates with low latency from multiple points in the network.
Therefore, databases are commonly replicated to multiple data processing systems within the network. Each replica provides low latency access to local database users and/or fail-over recovery capability in case of failures. Database replication also enables users of portable data processing devices to work with an updated copy of a database (or part of a database) that is stored on the portable device, instead of having to maintain a constant wireless connection to a centralized database. Database replication may be periodic, according to a defined schedule, or may be continuous, triggered by database changes or by user requests.
A database replication method typically involves capturing database changes at one of the systems storing a database replica, sending the changes to a second system, and applying the changes to a database replica stored on the second system. It is known in the art for the capture process to read changes from log records maintained by a database manager on the first system. In such an arrangement, an application program responds to user inputs by requesting updates to the database replica on the first system, and the local database manager applies the updates to its local replica and updates its recovery log. A capture process running on the first system reads the recovery log and forwards recent log records to the second system. The original database update and writing of log records at the first system, and the apply process running on the second system, are typically implemented under transactional control. As a result, data integrity can be maintained despite process and hardware failures, even in a distributed environment in which multiple changes may be initiated concurrently from different points in the network. A single database change transaction may involve several individual changes that must all be completed successfully, or all backed out, to maintain data integrity. Mechanisms other than log scraping may be used to capture the change transactions.
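The capture and apply steps described above can be illustrated with a minimal sketch. It is not taken from any particular product: the `LogRecord` type, the `capture` and `apply_transaction` functions, and the `kv` table are hypothetical names chosen for illustration, the recovery log is modeled as an in-memory list, and SQLite stands in for the second database replica.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One entry in a hypothetical recovery log."""
    txn_id: int
    op: str          # "put", "delete", or "commit"
    key: str = ""
    value: str = ""


def capture(log):
    """Group log records by transaction and return only committed
    transactions, in commit order, as (txn_id, changes) pairs."""
    pending = {}
    committed = []
    for rec in log:
        if rec.op == "commit":
            committed.append((rec.txn_id, pending.pop(rec.txn_id, [])))
        else:
            pending.setdefault(rec.txn_id, []).append(rec)
    return committed


def apply_transaction(replica, changes):
    """Apply every change of one captured transaction, or none of them."""
    with replica:  # commits on success, rolls back if an exception is raised
        for rec in changes:
            if rec.op == "put":
                replica.execute(
                    "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                    (rec.key, rec.value),
                )
            elif rec.op == "delete":
                replica.execute("DELETE FROM kv WHERE key = ?", (rec.key,))
            else:
                raise ValueError(f"unknown operation: {rec.op}")


def open_replica():
    """Create an in-memory stand-in for the second database replica."""
    replica = sqlite3.connect(":memory:")
    replica.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    replica.commit()
    return replica
```

In this sketch, a transaction whose commit record has not yet been written to the log is never forwarded, and a failure part-way through `apply_transaction` backs out the changes already made for that transaction.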
The database changes may be communicated between database replica systems via messaging, for example as implemented by the RepliData™ database replication product from IBM Corporation. Each message typically contains a full record of the changes of an originating database change transaction (unless the data size is too large for a single message). A suitable messaging system is the WebSphere™ MQ message queuing software from IBM Corporation. A database change transaction performed at the transaction-originating system, and transactions performed at a sender system of a replication transmission, are referred to below as ‘source transactions’ or ‘captured transactions’.
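The idea of carrying one source transaction per message may be sketched as follows. This does not use or reproduce the interfaces of the products named above: the JSON layout, the `encode_transaction` and `decode_transaction` functions, and the `MAX_MESSAGE_BYTES` limit are assumptions made for illustration, and Python's standard `queue.Queue` serves only as an in-process stand-in for a message queue. How an oversized transaction would be handled is not specified here, so the sketch simply rejects it.

```python
import json
import queue

MAX_MESSAGE_BYTES = 1024  # hypothetical size limit, for illustration only


def encode_transaction(txn_id, changes):
    """Serialize all changes of one source transaction into one message."""
    body = json.dumps({"txn_id": txn_id, "changes": changes}).encode("utf-8")
    if len(body) > MAX_MESSAGE_BYTES:
        # Handling of oversized transactions is left open in this sketch.
        raise ValueError("transaction too large for a single message")
    return body


def decode_transaction(message):
    """Recover the transaction identifier and its changes from a message."""
    doc = json.loads(message.decode("utf-8"))
    return doc["txn_id"], doc["changes"]


# Sender side: one message per captured transaction.
channel = queue.Queue()
channel.put(encode_transaction(7, [
    {"op": "put", "key": "a", "value": "1"},
    {"op": "delete", "key": "b"},
]))

# Receiver side: the whole transaction arrives as a unit.
txn_id, changes = decode_transaction(channel.get())
```

Keeping the full set of changes in one message means the receiver never sees part of a transaction, which fits with applying it under transactional control.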
One known approach to database replication uses intermediate staging tables that contain a description of the changes made to a first database replica, the information in the staging tables then being applied to other replicas. Although such staging tables can be useful to asynchronously manage updates to database replicas, the staging tables may act as a bottleneck that limits the throughput of database changes. There are ever-increasing business demands for high throughput—some applications (such as in the banking sector) requiring several million database changes to be managed every day. Therefore, communication bottlenecks that limit database replication throughput will not be acceptable in future.
There is a need in the art for improved low-latency data replication in a distributed data processing environment. Improved low-latency replication is required for cache management as well as for database replication. There is also a need in the art for efficient methods for ensuring once-only application of data updates to a cache or database replica, with recovery processing to maintain data integrity when failures occur.
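To make the once-only requirement concrete, the following minimal sketch shows one commonly used general technique, which is not asserted to be the method of any particular system: the identifier of each applied transaction is recorded in the same local transaction as its changes, so that a message redelivered after a failure is detected and skipped. The `balance` and `applied` tables and the `apply_once` function are hypothetical, and SQLite stands in for the target replica.

```python
import sqlite3


def open_replica():
    """Create an in-memory stand-in for a target replica."""
    replica = sqlite3.connect(":memory:")
    replica.execute(
        "CREATE TABLE balance (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    replica.execute("CREATE TABLE applied (txn_id INTEGER PRIMARY KEY)")
    replica.commit()
    return replica


def apply_once(replica, txn_id, changes):
    """Apply (account, delta) changes unless txn_id was already applied.

    Returns True if the changes were applied, False if they were skipped.
    """
    with replica:  # the changes and the 'applied' marker commit together
        seen = replica.execute(
            "SELECT 1 FROM applied WHERE txn_id = ?", (txn_id,)
        ).fetchone()
        if seen:
            return False
        for account, delta in changes:
            replica.execute(
                "INSERT OR IGNORE INTO balance (account, amount) VALUES (?, 0)",
                (account,),
            )
            replica.execute(
                "UPDATE balance SET amount = amount + ? WHERE account = ?",
                (delta, account),
            )
        replica.execute("INSERT INTO applied (txn_id) VALUES (?)", (txn_id,))
    return True
```

Because the increments in this example are not idempotent, applying the same transaction twice would corrupt the balance; the `applied` marker prevents that, and a failure before commit leaves neither the changes nor the marker in place.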