Database replication is a process by which data residing in data tables at one location (the source location) are made available for use at other locations (the destination locations). In particular, it is the process of keeping the destination data, which resides in tables, synchronized with the source data contained in the source tables.
Transactional replication is a form of replication that moves data changes from the source to the destination while preserving the transactional state of the source system, such that the destination data always represents a distinct committed state of the source data. Transactional replication ensures that the destination tables continue to reflect the transactional consistency of the replicated source tables. This can be contrasted with non-transactional replication systems, which move the changes from the source to the destination and commit them in a way that does not guarantee that the state of the destination tables matches a particular transactional state of the data in the source tables. In transactional replication, there is typically one source and many destinations.
Peer-to-peer replication is a form of transactional replication in which every node in the topology acts as both a source and a destination. This form of replication is unique because it enables users to create topologies with many sources and many destinations, to perform updates at every node in the topology, and to restore transactions to a node that failed after replicating to its peers, all while preserving transactional consistency. A key to this technology is the ability to track the originator of a transaction.
Peer-to-peer transactional replication is based on a very simple rule: all transactions in the configuration are well-known transactions. Each source either generates a well-known transaction or forwards a well-known transaction. A replication agent, upon distributing a well-known transaction, can decide in an unambiguous manner whether it needs to apply the transaction at the destination or whether the destination has already received the transaction via an alternate replication path.
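The rule above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a well-known transaction is identified by its originating node plus a per-originator sequence number that increases with each commit; the names `Txn`, `Node`, `commit_local`, and `receive` are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Txn:
    originator: str  # node where the transaction was first committed
    seq: int         # assumed: per-originator, monotonically increasing
    payload: str


@dataclass
class Node:
    name: str
    applied: list = field(default_factory=list)
    seen: dict = field(default_factory=dict)  # originator -> highest seq applied
    next_seq: int = 1

    def commit_local(self, payload):
        """Generate a well-known transaction at this node."""
        txn = Txn(self.name, self.next_seq, payload)
        self.next_seq += 1
        self._apply(txn)
        return txn

    def receive(self, txn):
        """Apply txn unless it already arrived via another path."""
        if txn.seq <= self.seen.get(txn.originator, 0):
            return False  # already received; skip
        self._apply(txn)
        return True

    def _apply(self, txn):
        self.applied.append(txn)
        self.seen[txn.originator] = txn.seq


a, b, c = Node("A"), Node("B"), Node("C")
t = a.commit_local("insert row 1")
b_applied = b.receive(t)       # B receives A's transaction directly
c_direct = c.receive(t)        # C receives it directly from A
c_forwarded = c.receive(t)     # B forwards the same transaction to C
```

Because the originator travels with the transaction, the second arrival at C is recognized and skipped regardless of which peer forwarded it.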
The transactional replication process can be separated into two main phases. The first phase is the tracking, or harvesting, of the changes at the source. Typically, there are two forms of harvesting: log-based and trigger-based. Log-based harvesting uses a log reader agent to track the source changes as they occur by reading the source database transaction log. The database transaction log contains all changes that have occurred on the database, such that they can be ordered by commit time. Trigger-based harvesting employs replication triggers, which are constructed in such a way that an insert, update, or delete on any of the replicated tables causes the trigger code to execute. This executed code in turn stores information about the change that has occurred. The well-known transaction feature can be implemented with both methods of harvesting.
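Trigger-based harvesting as described above can be sketched with SQLite through Python's standard `sqlite3` module. The table names `customers` and `change_log`, the column layout, and the function `harvest_demo` are assumptions made for this example only; a production system would record considerably more detail about each change.

```python
import sqlite3


def harvest_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical tracking table filled in by the replication triggers.
    CREATE TABLE change_log (
        change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT,
        op         TEXT,      -- 'I', 'U', or 'D'
        row_id     INTEGER,
        new_name   TEXT
    );

    CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
        INSERT INTO change_log (table_name, op, row_id, new_name)
        VALUES ('customers', 'I', NEW.id, NEW.name);
    END;

    CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
        INSERT INTO change_log (table_name, op, row_id, new_name)
        VALUES ('customers', 'U', NEW.id, NEW.name);
    END;

    CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
        INSERT INTO change_log (table_name, op, row_id, new_name)
        VALUES ('customers', 'D', OLD.id, NULL);
    END;
    """)

    # Ordinary changes to the replicated table fire the triggers.
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
    conn.execute("DELETE FROM customers WHERE id = 1")

    # The harvested changes, in the order they occurred.
    rows = conn.execute(
        "SELECT op, row_id, new_name FROM change_log ORDER BY change_id"
    ).fetchall()
    conn.close()
    return rows


changes = harvest_demo()
```

Each data modification leaves one row in `change_log`, which a distribution step could later read and deliver to the destinations.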
The second phase in the transactional replication process is the delivery of those changes to the destination (also called distribution, which is performed by a distribution agent). During the distribution phase, the replication agent delivers transactions that were harvested after the last synchronization between the source and the destination. In transactional replication, the destination server keeps track of a “last seen” watermark for every source from which it receives a change. This watermark is the starting point for the next distribution phase between one source and one destination. While this watermark allows the replication agent to avoid redelivering commands from a source that has already distributed them to a destination, it is not sufficient for a topology with multiple replication pathways. By way of example, consider three nodes A, B, and C in a ring topology, where A is the source node and B and C are the destination nodes. Note that node C uses node B as a second source. Effectively, this means that C can receive A's changes directly from A or through B. Because the watermarks for sources A and B (relative to node C) do not normally contain any information about the originator of the changes, it is virtually guaranteed that A will deliver its commands to C and that B will redeliver these commands on behalf of A, resulting in conflicts.
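The redelivery problem in the A, B, C example can be reproduced with a small simulation. This is a sketch under simplifying assumptions: each node keeps a single ordered log that holds both its own commands and the commands it has applied from others, and the watermark is simply the count of log entries already pulled from each source. The class `Node` and its methods `commit` and `pull_from` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []        # commands applied here, also forwardable to peers
        self.watermark = {}  # source name -> entries already pulled from it

    def commit(self, command):
        """Originate a change at this node."""
        self.log.append(command)

    def pull_from(self, source):
        """Deliver everything past the 'last seen' watermark for source."""
        start = self.watermark.get(source.name, 0)
        pending = source.log[start:]
        for command in pending:
            # No originator information is available, so every pending
            # command is applied, even one that arrived by another path.
            self.log.append(command)
        self.watermark[source.name] = len(source.log)
        return len(pending)


a, b, c = Node("A"), Node("B"), Node("C")
a.commit("INSERT row 1")

b_from_a = b.pull_from(a)  # B receives A's command
c_from_a = c.pull_from(a)  # C receives it directly from A
c_from_b = c.pull_from(b)  # B redelivers the same command on behalf of A
```

After these three synchronizations, `c.log` holds the same command twice. The per-source watermarks each behave correctly in isolation, yet together they cannot detect that B's entry originated at A.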
What is needed is an improved transactional data replication architecture.