Thanks to the advent of cloud services, distributed databases have benefited from deployments spanning a cluster of machines within a single cloud. In such a setting, replication is the advocated solution to achieve data scalability and fault-tolerant durability. However, entire clouds may fail, rendering critical data unavailable, which ultimately leads to revenue losses.
With embodiments of this invention, the following problem is addressed: How can multiple storage clouds or servers be leveraged to enable transactional access to shared data among a large, possibly unbounded, number of clients despite crashes, asynchrony and concurrency?
An increasing number of data serving platform providers such as Amazon and Yahoo! have recognized the need for transactional access to shared data in addition to atomic read/write. It is possible to construct strong coordination primitives enabling transactional access just from atomic read/write operations. However, it is known that such constructions do not scale, since the space and communication overhead is proportional to the number of clients. Furthermore, leaving it to the developer to deal directly with the intricacies of implementing strong data-sharing primitives, such as read-modify-write, from weaker ones may result in inefficient and/or error-prone implementations.
As a consequence, a number of data serving platforms such as DynamoDB and PNUTS, and cloud databases such as Couchbase, MongoDB and Redis, have started including in their APIs coordination abstractions stronger than read/write. The most powerful among such primitives is Compare-And-Swap (CAS), for it enables implementing any shared functionality in a non-blocking manner, i.e. without using locks.
Intuitively, CAS updates a storage location only if the current value of that location is as expected, where the expected value is supplied to CAS along with the new value. Typically, CAS is used for optimistic concurrency control as follows: (1) a storage location x is read into a local variable v, then (2) based on the value of v some local computation is done that changes v, and then (3) x is updated with v via CAS, with the value read in step (1) supplied as the expected value. If x has not changed since step (1), x takes the new value v. Otherwise x remains unchanged, and steps (1)-(3) are repeated. Used this way, CAS enables transactional access to shared data.
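By way of illustration only, steps (1)-(3) above can be sketched in Python as follows. The names CasRegister, compare_and_swap, transactional_update and demo are hypothetical and are not taken from any particular storage API. The sketch assumes a single in-memory storage location whose CAS is made atomic by an internal lock that merely stands in for the storage service; the clients themselves never acquire a lock.

```python
import threading
from typing import Any, Callable


class CasRegister:
    """Hypothetical in-memory stand-in for one storage location x."""

    def __init__(self, initial: Any) -> None:
        self._value = initial
        # The lock only models the atomicity the storage service provides;
        # clients never take it directly.
        self._lock = threading.Lock()

    def read(self) -> Any:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: Any, new: Any) -> bool:
        """Store `new` only if the current value equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def transactional_update(x: CasRegister, compute: Callable[[Any], Any]) -> Any:
    """Optimistic concurrency control: repeat steps (1)-(3) until CAS succeeds."""
    while True:
        v = x.read()                        # (1) read x into local variable v
        new_v = compute(v)                  # (2) local computation changes v
        if x.compare_and_swap(v, new_v):    # (3) update x only if it is still v
            return new_v
        # x changed in the meantime: x is left unchanged and the steps repeat


def demo(clients: int = 4, updates_per_client: int = 250) -> int:
    """Several clients increment one shared counter without client-side locks."""
    x = CasRegister(0)

    def client() -> None:
        for _ in range(updates_per_client):
            transactional_update(x, lambda v: v + 1)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x.read()
```

In this sketch no increment is lost: an update that loses the race leaves x unchanged and is simply retried against the freshly read value.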
For example, Gregory Chockler and Dahlia Malkhi, “Active disk paxos with infinitely many processes”, in Proceedings of the twenty-first annual symposium on Principles of distributed computing (PODC '02), 2002 (herein “[PODC02]”), illustrate the above data transaction.