A distributed database is a database in which storage devices are not all attached to a common central processing unit (CPU). A distributed database may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers at multiple physical locations. The locations or sites of a distributed system may be spread over a large area (such as the United States or the world) or over a small area (such as a building or campus). The collections of data in the distributed database can also be distributed across multiple physical locations.
Typically, it is an object of a distributed database system to allow many users (clients or applications) to use the same information within the collection of data at the same time, while making it seem as if each user has exclusive access to the entire collection of data. The distributed database system should provide this service with minimal loss of performance (that is, minimal added latency) and maximal transaction throughput. For example, a user at location A must be able to access (and perhaps update) data at location B. If the user updates information, the updates must be propagated throughout the resources of the distributed database system to maintain consistency in the distributed database system.
The updates (or database transactions) must be serialized in the distributed database system to maintain consistency. If transactions were executed in serial order, concurrency conflicts would never occur because each transaction would be the only transaction executing on the system at a given time and would have exclusive use of the system's resources. Each new transaction would see the results of previous transactions, plus its own changes, but would never see the results of transactions that had not yet started. In operation, however, transactions typically execute concurrently and require simultaneous access to, and modification of, the same resources. Thus, maintaining consistency in a distributed database system can be very complex and often results in unacceptable response times.
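The difference between serial and concurrent execution described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from any particular system: the `run` function, the `DEPOSITS` table, and the transaction names `T1` and `T2` are hypothetical names introduced only for illustration. Each transaction reads a shared balance and then writes back the value it read plus its deposit.

```python
# Hypothetical deposits made by two transactions against one shared balance.
DEPOSITS = {"T1": 10, "T2": 20}


def run(schedule, balance=100):
    """Execute a schedule of (transaction, action) steps and return the final balance."""
    local = {}  # each transaction's private copy of the balance it read
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        else:  # "write": store the value read earlier plus this transaction's deposit
            balance = local[txn] + DEPOSITS[txn]
    return balance


# Serial order: T2 starts only after T1 has finished, so T2 sees T1's result.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]

# Concurrent order: both transactions read before either one writes.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_result = run(serial)            # both deposits are applied
interleaved_result = run(interleaved)  # T1's deposit is overwritten by T2
```

Under the serial schedule both deposits are reflected in the final balance, whereas under the interleaved schedule the second write overwrites the first. This lost update is the kind of concurrency conflict that serialization is intended to prevent.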
Various concurrency control schemes currently exist, including, for example, optimistic concurrency control schemes, which operate by detecting invalid use after the fact. The basic idea of these types of schemes is to divide a database transaction's lifetime into three phases: read, validate and publish. During the read phase, a transaction acquires resources without regard to conflict or validity, but it maintains a record of the set of resources it has used (a ReadSet) and the set of resources it has modified (a WriteSet). During the validate phase, the optimistic concurrency control scheme examines the ReadSet of the transaction and determines whether the state of any of those resources has changed since the transaction read it. If the ReadSet has not changed, then the optimistic assumptions of the transaction are proved to have been right, and the system publishes the WriteSet, committing the transaction's changes. If the ReadSet has changed, then the optimistic assumptions of the transaction are proved to have been wrong, and the system aborts the transaction, resulting in a loss of all of the transaction's changes.
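The three phases described above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions, not the implementation of any particular system: the `Store` and `Transaction` classes are hypothetical, and the use of a per-key version number to detect a changed ReadSet is one common choice assumed here for simplicity.

```python
class Store:
    """Hypothetical in-memory store mapping each key to a (value, version) pair."""

    def __init__(self, initial):
        self.data = {key: (value, 0) for key, value in initial.items()}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed during the read phase
        self.write_set = {}  # key -> new value, buffered until the publish phase

    def read(self, key):
        # Read phase: no conflict checks, but record which resources were used.
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.data[key]
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value):
        # Changes are only buffered in the WriteSet; nothing is visible to others yet.
        self.write_set[key] = value

    def commit(self):
        # Validate phase: has any resource in the ReadSet changed since it was read?
        for key, seen_version in self.read_set.items():
            if self.store.data[key][1] != seen_version:
                self.write_set.clear()  # abort: all buffered changes are lost
                return False
        # Publish phase: install the WriteSet and advance each key's version.
        for key, value in self.write_set.items():
            _, version = self.store.data.get(key, (None, -1))
            self.store.data[key] = (value, version + 1)
        return True


store = Store({"x": 10})
t1 = store.begin()
t2 = store.begin()
t1.write("x", t1.read("x") + 1)  # both transactions read the same version of "x"
t2.write("x", t2.read("x") + 5)
t1_ok = t1.commit()  # ReadSet unchanged, so the WriteSet is published
t2_ok = t2.commit()  # "x" changed after t2 read it, so t2 is aborted
```

In this sketch the second transaction loses its update entirely even though its change could, in principle, have been reapplied to the new value, which illustrates the cost of aborting on every invalidated ReadSet.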
In order to avoid unnecessarily aborting transactions whose assumptions are proven to be wrong, prior art systems have been designed to reconcile incorrect assumptions. Unfortunately, to date, these mechanisms require that a specific reconciliation procedure be individually coded for each possible irresolvable event.