Atomic transactions are often used to simplify concurrent and fault-tolerant programming. A transaction is atomic if it is indivisible, such that an attempt to perform the transaction can have only two possible outcomes: 1) all parts of the transaction occur (the transaction commits), or 2) no part of the transaction occurs (the transaction aborts). Thus, partial execution of an atomic transaction is impossible. For example, if a transaction is a transfer of funds from one account to another, it is highly desirable for this transaction to be atomic, to avoid the possibility of a credit being applied to one account without a corresponding debit to the other account (or vice versa). Similar advantages accrue to atomic transactions in more general programming situations.
One can identify two fundamental approaches for providing atomic transactions: 1) in-place update and 2) shadow copy. The first is normally implemented with locking, to prevent concurrent updates, together with an undo log that allows changes to be rolled back if the transaction aborts. It can also be implemented optimistically, without locks, by aborting the transaction if another transaction writes data that this transaction is also writing (a so-called write-write conflict) or writes data that this transaction is reading (a so-called read-write conflict).
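As an illustration of the locking-plus-undo-log form of in-place update, the following is a minimal Python sketch built around the funds-transfer example. The names `InPlaceStore`, `run`, and `transfer` are hypothetical; a single global lock stands in for per-item locks, and the transfer deliberately applies the credit before checking the balance so that an abort has partial work to roll back.

```python
import threading


class InPlaceStore:
    """Hypothetical in-memory store that updates values in place.

    One lock serializes whole transactions (a coarse stand-in for
    per-item read and write locks), and an undo log records each old
    value before it is overwritten so that an abort can restore it.
    """

    def __init__(self, initial):
        self.data = dict(initial)
        self.lock = threading.Lock()

    def run(self, txn):
        """Run txn(read, write); commit if it returns, abort if it raises."""
        undo_log = []  # (key, old_value) pairs in the order they were written

        def read(key):
            return self.data[key]

        def write(key, value):
            undo_log.append((key, self.data[key]))  # log the old value first
            self.data[key] = value                   # then update in place

        with self.lock:  # reads are locked too, not just writes
            try:
                txn(read, write)
            except Exception:
                # Abort: undo in reverse order to restore the pre-transaction state.
                for key, old in reversed(undo_log):
                    self.data[key] = old
                return False
            return True


def transfer(src, dst, amount):
    """Hypothetical funds transfer: credit dst, then debit src (or abort)."""
    def txn(read, write):
        write(dst, read(dst) + amount)      # credit is applied first
        if read(src) < amount:
            raise ValueError("insufficient funds")  # abort after partial work
        write(src, read(src) - amount)      # matching debit
    return txn


store = InPlaceStore({"A": 100, "B": 0})
print(store.run(transfer("A", "B", 30)), store.data)   # True {'A': 70, 'B': 30}
print(store.run(transfer("A", "B", 500)), store.data)  # False {'A': 70, 'B': 30}
```

In the second call the credit to `B` has already been written in place when the balance check fails, and the undo log removes it, so no partial transfer is ever left visible after the lock is released.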
Historically, in-place update has been favored because it preserves the on-disk layout of the data, given that until recently most transactional data has been disk-based. However, read-locking imposes significant overhead, given that reads greatly outnumber writes in many applications. Moreover, read-locking data delays updates to that data, so the locked data can become inconsistent with the real world if the corresponding real-world values change while it is locked. In this sense, the focus on achieving internal consistency can lead to inconsistency with the external environment. The optimistic form of in-place update, for its part, can suffer from an excessive abort rate, given the prevalence of read-write conflicts.
With the move to in-memory databases, the shadow copy approach becomes more attractive than before. Here, an updating transaction makes a copy of the data to be updated, modifies the copy, and then atomically updates the root reference (or pointer) to this data so that it refers to the new (previously shadow) copy. Taking this approach further, a transaction can execute from snapshots (i.e. immutable copies) even of the data that it only reads. This approach is referred to as snapshot isolation (SI). It provides most of the properties of conventional serializable transactions, with the significant additional benefit of not suffering from the read-write conflicts of in-place update transactions, which either incur significant locking overhead or, in an optimistic implementation, increase the abort rate.
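The shadow-copy mechanism combined with snapshot isolation can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: the store is a single in-memory key-value map, and the class names, the version counter, and the first-committer-wins write-write check are all hypothetical choices. A transaction captures the current root reference as its snapshot, reads only from it, buffers writes privately, and at commit builds a new copy and swaps the root.

```python
import threading
from types import MappingProxyType


class SnapshotStore:
    """Hypothetical shadow-copy store with snapshot-isolation commits."""

    def __init__(self, initial):
        # Root reference: (version, read-only view of the committed data).
        self._root = (0, MappingProxyType(dict(initial)))
        self._last_writer = {}  # key -> version of the last commit that wrote it
        self._commit_lock = threading.Lock()  # guards validation + root swap only

    def begin(self):
        # Capturing the root reference *is* the snapshot; nothing is copied here.
        return Transaction(self, self._root)

    def _commit(self, txn):
        if not txn.writes:
            return True  # read-only transactions never conflict
        with self._commit_lock:
            version, data = self._root
            start_version = txn.snapshot[0]
            # Write-write conflict: a key we wrote was committed after our snapshot.
            for key in txn.writes:
                if self._last_writer.get(key, 0) > start_version:
                    return False
            new_data = dict(data)         # shadow copy of the current version
            new_data.update(txn.writes)   # apply this transaction's writes
            new_version = version + 1
            for key in txn.writes:
                self._last_writer[key] = new_version
            # Single reference assignment publishes the new version.
            self._root = (new_version, MappingProxyType(new_data))
            return True


class Transaction:
    def __init__(self, store, snapshot):
        self.store = store
        self.snapshot = snapshot  # (version, data) captured at begin()
        self.writes = {}          # private buffer; invisible to others until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.snapshot[1][key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store._commit(self)


store = SnapshotStore({"x": 1})
t1, t2 = store.begin(), store.begin()
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 10)
print(t1.commit(), t2.commit())  # True False
print(store.begin().read("x"))   # 2
```

Read-only transactions never touch the commit lock; they keep using the root they captured, which is what spares SI the read-write conflicts of in-place update. For simplicity the sketch copies the whole map at each commit; an implementation along the lines described above would copy only the data being updated (for example, the path to it in a tree) before swapping the root.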
The shadow copy approach can be particularly attractive in a distributed implementation, because there it is often necessary to replicate data at the process performing the transaction anyway, providing a local copy or snapshot of the data for efficient access. In this case, the copy overhead of SI is effectively offset by the access savings that the local copy provides for local processing; conversely, creating the local copy has effectively already paid the cost of the snapshot that the SI transaction mechanism requires. This local copy can also reduce a process's exposure to the failure and restart of the process holding the primary copy of the state, because it can continue to operate with its local snapshot.
Unfortunately, SI does not provide sequential consistency (or serializability, as it is termed in the database world); that is, it does not guarantee that every execution produces the same results as some sequential order of the same transactions. This shortfall arises from the so-called write skew problem, which can be illustrated by considering a simple assert constraint that a transactional application is expected to maintain. For example, consider the constraint 39 > b + c, where separate transactions Tb and Tc update b and c, respectively. If b and c are both initially 10, Tb could update b to 20 while viewing a snapshot in which c is 10, and Tc could update c to 20 while viewing a snapshot in which b is 10. Each transaction sees b + c = 30 after its own update, so each concludes the constraint holds. Because neither transaction writes data the other writes, there is no write-write conflict, and the SI model allows both to commit concurrently; yet the combined result, b + c = 40, violates the constraint.
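The write skew scenario can be reproduced with a small Python simulation. It is a deliberately simplified model, assuming (hypothetically) that both transactions start from the same snapshot and that SI aborts a transaction only when its write set overlaps that of an earlier committer; `run_concurrently_under_si`, `t_b`, and `t_c` are illustrative names, and each transaction applies its update only if the constraint still holds in its own snapshot.

```python
def constraint(state):
    """The application invariant from the example: 39 > b + c."""
    return 39 > state["b"] + state["c"]


def t_b(snap):
    """Tb: set b to 20 if the invariant would still hold in its snapshot."""
    return {"b": 20} if constraint(dict(snap, b=20)) else {}


def t_c(snap):
    """Tc: set c to 20 if the invariant would still hold in its snapshot."""
    return {"c": 20} if constraint(dict(snap, c=20)) else {}


def run_concurrently_under_si(initial, transactions):
    """Hypothetical SI model: every transaction reads the same start snapshot,
    and a transaction aborts only if it writes a key that an earlier
    committer already wrote (a write-write conflict)."""
    write_sets = [txn(dict(initial)) for txn in transactions]  # private snapshot copies
    final, written, committed = dict(initial), set(), []
    for writes in write_sets:
        keys = set(writes)
        if written & keys:
            committed.append(False)  # write-write conflict: abort
            continue
        written |= keys
        final.update(writes)
        committed.append(True)
    return final, committed


final, committed = run_concurrently_under_si({"b": 10, "c": 10}, [t_b, t_c])
print(committed, final, constraint(final))  # [True, True] {'b': 20, 'c': 20} False

# Serial execution for comparison: Tc sees Tb's committed write and declines.
state = {"b": 10, "c": 10}
for txn in (t_b, t_c):
    state.update(txn(dict(state)))
print(state, constraint(state))  # {'b': 20, 'c': 10} True
```

Both transactions commit under the SI model because their write sets are disjoint, leaving b + c = 40, whereas either serial order preserves the constraint.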
Various solutions to this problem have been proposed, including enforcing strict serializability of SI transactions, but these incur excessive overhead, either in the transaction processing itself or by raising the abort rate of transactions far above what is strictly necessary.
What is needed is a means of ensuring correct application behavior while retaining the application and implementation benefits of SI transactions.