Organizations move data between different databases. For example, this may be done for data backup purposes or to share data across different databases and applications. Traditionally, this has been mostly accomplished in batches, where a batch is a set of tasks. Currently, batch processing is performed periodically, for example on a daily basis. Commonly, the batch processing is performed on a table-by-table basis.
Traditional batch methods are undesirable in many ways. Examples of these shortcomings are highlighted in a number of examples. In these examples, assume data is moving in one direction from a “source” database S to a “target” database T.
Take as an example a bank which holds customer balances in a BALANCE table in database S. A customer C has a balance of $500, which is recorded in BALANCE. Now C attempts to withdraw $400 at 11 am. The banking application checks the BALANCE table in S, allows the transaction and updates the customer's balance to $100 ($500−$400).
Now suppose that database S crashes in the middle of the day and the bank decides to use the backup database T as its database of record. Now customer C attempts to withdraw $300. Assume that database S is backed up to database T once per day at 2 am. This means that instead of reflecting the correct current balance of $100, the database reflects an old (stale) balance of $500, which does not include the transaction at 11 am. As a result, the new withdrawal will be mistakenly authorized.
This is a very obvious example of the shortcomings of a batch approach. Another shortcoming with the batch approach is the temporary lack of transaction consistency. For example, if two tables are updated as part of a single transaction, but applied in different applications when moving data from S to T, this violates the “all or nothing” aspect of database transactions. Another problem is a temporary lack of referential consistency in the database. It is possible that an “order detail” record could be inserted into T before the corresponding “order master” record to which the detail belongs exists in the target. Another problem is that it is difficult, and sometimes impossible, to move the entire database S to T, in a reasonable amount of time.
For these reasons and others, logical data replication systems came into being. These systems are able to detect and move the latest changes to tables, rather than entire tables or sets of data, enabling the data in T to keep much more closely in sync with data in S (often as low as a second or less, rather than a day or more, behind); this is often accomplished by reading database transaction logs for the latest changes. Data replication systems preserve the sequence of the application of new operations into T as they occurred in S. They also preserve the original transaction integrity, which assures that unless all items in a transaction are successfully applied to T, they are all rolled back. Data replication systems also preserve referential integrity. For example, they guarantee that if an order detail record is being inserted into T, than the corresponding order master record exists
Despite the advantages of logical data replication systems, they also face a number of challenges. Performance is one challenge. The application which inserted data into S may have had many simultaneous transactions active and working in parallel. The replication system may be single threaded in order to assure that the original order remains intact, but in doing so, gives up much of the parallelism and performance of the source system, resulting in stale data in T. For performance reasons, a replication system may break up the workload of the transaction stream by table. For example, the ORDER_MASTER table in T may be inserted by one apply process and ORDER_DETAIL by a second process. In this case, it is possible to violate both transaction integrity and referential integrity constraints.
In view of the foregoing, it would be highly desirable to provide improved transaction data processing techniques. In particular, it would be desirable to provide more efficient techniques for applying database changes from a source to a target, while preserving both transactional and referential integrity rules.