The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Computer systems may be configured to store and retrieve large amounts of data. Typically, computer systems rely on database systems to perform this function. Replication is the process of copying data from a source database onto another database system, herein referred to as a target database.
One approach to replication is the physical replication approach. Under the physical replication approach, the changes made to data blocks on the source database are made to replicas of those data blocks on a target database. Another approach to replicating data is the logical replication approach. Under the logical replication approach, database commands that modify data on the source database are re-executed on the target database. While executing the same database commands guarantees that changes are replicated at the record level, the changes are not replicated at the data block level.
Typically, changes to database systems are made using transaction processing. A transaction is a set of operations that change data. In database systems, the operations are specified by one or more database commands. Committing a transaction refers to making the changes for a transaction permanent. Under transaction processing, all the changes for a transaction are made atomically. That is, either all of the changes for a transaction are committed, or the transaction is rolled back and none of its changes are made.
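By way of illustration only, the atomicity described above can be sketched with Python's built-in sqlite3 module. The accounts table, its CHECK constraint, and the account identifiers are hypothetical and are not taken from any particular replication system. The second command of the transaction fails, so the change made by the first command is rolled back as well.

```python
import sqlite3

# Hypothetical schema: balances may never become negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 'B'")
        # This command violates the CHECK constraint and raises IntegrityError.
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 'A'")
except sqlite3.IntegrityError:
    pass

# Neither change of the failed transaction is visible.
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)
```

In this sketch, the credit to account B is undone even though that command succeeded on its own, because the transaction as a whole did not commit.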
Scalability describes the ability of a replication system to handle increasing amounts of data. One procedure to increase scalability involves a replication client applying individual transactions to the target database in parallel (“single transaction parallelism”). Single transaction parallelism involves considerable overhead, such as round-trip client-server communication and statement parsing. Furthermore, when the data includes many dependencies between transactions, the parallelism is reduced, since a transaction cannot be executed until all transactions on which it depends have been committed.
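The effect of dependencies on single transaction parallelism can be illustrated with a minimal sketch. The sketch assumes a simplified, hypothetical model in which each transaction is represented only by the set of row keys it modifies, and in which a later transaction depends on any earlier transaction that modifies one of the same keys. The function name dependency_levels is likewise an assumption made for illustration.

```python
from typing import List, Set


def dependency_levels(txns: List[Set[str]]) -> List[int]:
    """Return, for each transaction in source commit order, the earliest
    parallel "wave" in which it could be applied.

    A transaction must wait for every earlier transaction whose key set
    overlaps its own (hypothetical dependency model).
    """
    levels: List[int] = []
    for j, keys in enumerate(txns):
        level = 0
        for i in range(j):
            if txns[i] & keys:  # shared key: transaction j depends on i
                level = max(level, levels[i] + 1)
        levels.append(level)
    return levels


# No shared keys: all four transactions can be applied in one wave.
independent = [{"a"}, {"b"}, {"c"}, {"d"}]
# Each transaction shares a key with its predecessor: fully serialized.
chained = [{"a"}, {"a", "b"}, {"b", "c"}, {"c"}]

print(dependency_levels(independent))  # [0, 0, 0, 0]
print(dependency_levels(chained))      # [0, 1, 2, 3]
```

Under this model, the number of distinct levels is the number of sequential waves required, so the chained workload gains nothing from parallel application.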
Another procedure to increase scalability involves grouping adjacent transactions into batch transactions and applying the batch transactions in parallel (“adjacent batch parallelism”). Adjacent batch parallelism helps reduce overhead by batching multiple changes in one statement. However, a batch transaction inherits the dependencies of every transaction it contains, so adjacent batch parallelism increases dependencies such that parallelism is further reduced.
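The following minimal sketch illustrates how grouping adjacent transactions can increase dependencies. It assumes a hypothetical model in which each transaction is represented by the set of row keys it modifies, a batch transaction modifies the union of its members' keys, and two units conflict when their key sets overlap. The names levels and batch_adjacent are assumptions made for illustration.

```python
from typing import List, Set


def levels(write_sets: List[Set[str]]) -> List[int]:
    """Earliest parallel wave for each unit, given in commit order."""
    result: List[int] = []
    for j, keys in enumerate(write_sets):
        level = 0
        for i in range(j):
            if write_sets[i] & keys:  # overlap: unit j must wait for unit i
                level = max(level, result[i] + 1)
        result.append(level)
    return result


def batch_adjacent(txns: List[Set[str]], size: int) -> List[Set[str]]:
    """Group adjacent transactions; a batch touches the union of its keys."""
    return [set().union(*txns[i:i + size]) for i in range(0, len(txns), size)]


txns = [{"a"}, {"b"}, {"c"}, {"a"}, {"d"}, {"c"}]
batches = batch_adjacent(txns, 2)

# Applied individually, the six transactions need only two waves.
print(levels(txns))     # [0, 0, 0, 1, 0, 1]
# Batched in adjacent pairs, every batch depends on the previous one.
print(levels(batches))  # [0, 1, 2]
```

In this example, the three batch transactions must be applied one after another, even though four of the six original transactions were mutually independent.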
Another procedure to increase scalability involves splitting transactions using some partitioning criteria (“partitioned parallel batching”). Individual transactions are partitioned using techniques such as vertical partitioning (e.g., based on table names) and/or horizontal partitioning (e.g., based on row ranges within a table). Similar partitions of different transactions are grouped into batch transactions, and the batch transactions are applied in parallel. In partitioned parallel batching, the partitioning scheme must be manually specified, and must be manually updated to accommodate schema changes and/or workload changes. Additionally, partitioned parallel batching requires an implicit assumption that the batches do not conflict with each other. Furthermore, since transactions are split, partitioned parallel batching compromises the underlying atomicity of transactions.
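The loss of atomicity under partitioned parallel batching can be illustrated with a minimal sketch of vertical partitioning by table name. The transaction identifiers, table names, and row keys below are hypothetical, as are the function names partition_by_table and split_transactions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Change = Tuple[str, str]  # (table name, row key), a hypothetical change record


def partition_by_table(
    txns: Dict[str, List[Change]],
) -> Dict[str, List[Tuple[str, Change]]]:
    """Vertical partitioning: route every change to the batch for its table."""
    batches: Dict[str, List[Tuple[str, Change]]] = defaultdict(list)
    for txn_id, changes in txns.items():
        for change in changes:
            batches[change[0]].append((txn_id, change))
    return dict(batches)


def split_transactions(batches: Dict[str, List[Tuple[str, Change]]]) -> List[str]:
    """Identify transactions whose changes landed in more than one batch."""
    seen = defaultdict(set)
    for table, entries in batches.items():
        for txn_id, _ in entries:
            seen[txn_id].add(table)
    return sorted(t for t, tables in seen.items() if len(tables) > 1)


txns = {
    "T1": [("orders", "o1"), ("inventory", "i7")],
    "T2": [("orders", "o2")],
    "T3": [("inventory", "i9")],
}
batches = partition_by_table(txns)

print(sorted(batches))              # ['inventory', 'orders']
print(split_transactions(batches))  # ['T1']
```

Because the two changes of T1 are committed by different batch transactions, the target database may, for some interval, reflect one change of T1 without the other.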
Based on the foregoing, it is desirable to develop an approach that allows dependency-aware transaction batching for data replication.