In distributed database systems, data in a primary database is replicated to databases at various replicate destinations. Changes made to the primary database are often recorded in a transaction log, which contains before and after images of the changed data. Replication can be achieved by having an external system or component read the database transaction log and distribute the recorded changes to multiple replicate destinations via multiple replication paths.
Conventional systems implement a single scanner thread to scan the transaction log of the primary database and distribute the data to multiple replication paths. However, while the scanner thread is busy filtering and distributing data to a specific replication path, it cannot continue scanning the transaction log. Thus, such conventional systems may incur a significant performance bottleneck. The problem is exacerbated as the number of replication destinations and paths increases. Furthermore, due to the sequential nature of the distribution operation in a single-scanner environment, a specific data modification of the primary database can only be processed and replicated after all previously logged data has been distributed. As a result, a single scanner cannot support priority transactions, in which urgent data needs to be processed and distributed immediately after it is generated.
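By way of illustration only, the sequential behavior of a single-scanner arrangement can be sketched as follows. This is a simplified model, not an implementation of any particular conventional system. The names `LogRecord`, `ReplicationPath`, and `single_scanner`, along with the table names and the `urgent` flag, are hypothetical and introduced solely for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LogRecord:
    seq: int              # position of the change in the transaction log
    table: str            # table the change applies to (hypothetical)
    urgent: bool = False  # marks a priority transaction (hypothetical)


@dataclass
class ReplicationPath:
    name: str
    accepts: Callable[[LogRecord], bool]  # per-path filter
    delivered: List[int] = field(default_factory=list)


def single_scanner(log: List[LogRecord], paths: List[ReplicationPath]) -> List[int]:
    """Scan the log in order and return the order in which records were handled."""
    handled = []
    for record in log:        # scanning step
        for path in paths:    # scanning is stalled while this loop runs
            if path.accepts(record):
                path.delivered.append(record.seq)
        handled.append(record.seq)
    return handled


log = [
    LogRecord(1, "orders"),
    LogRecord(2, "customers"),
    LogRecord(3, "orders"),
    LogRecord(4, "orders", urgent=True),  # generated last, but urgent
]
paths = [
    ReplicationPath("orders_path", lambda r: r.table == "orders"),
    ReplicationPath("all_path", lambda r: True),
]

order = single_scanner(log, paths)
urgent_position = order.index(4)  # how many records were handled before the urgent one
```

In this model the urgent record is handled only after every earlier record has been filtered and distributed to every path, and the work done per scanned record grows with the number of paths.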
Therefore, conventional systems fail to provide an ideal data replication mechanism with low performance overhead, high replication throughput and scalability.