Computing applications deployed over networks, such as the Internet, often use distributed databases for hosting application-related data. A distributed database is a data store that includes multiple database nodes, generally deployed across multiple locations. The nodes operate in a decentralized yet coordinated manner. A distributed database can provide high availability and fault tolerance, such that any one node can suffer a failure without the database as a whole losing data or functionality. Well-known examples of distributed databases include Apache Cassandra (open source), Voldemort (open source), and Amazon DynamoDB.
Clients of distributed databases can direct commands, such as Create, Read, Update, and Delete (CRUD), to one or more database nodes, with the nodes coordinating to service the commands. For example, a client may send a data update command to one node of the distributed database, and that node may propagate the update to other nodes over time. Likewise, a client may direct a read query to one node. The query may specify a number or percentage of database nodes to check for the requested data. The distributed database coordinates access to the specified number or percentage of nodes and returns a response to the client. The response generally includes the most recent version of the requested data found among the accessed nodes.
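The read path described above can be sketched in a few lines of Python. This is a minimal in-memory model for illustration only, not the API of any particular database: the names `Versioned`, `Node`, and `read_latest`, and the use of an integer timestamp as the version marker, are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # assumed version marker; a larger value is more recent


@dataclass
class Node:
    name: str
    store: Dict[str, Versioned] = field(default_factory=dict)

    def get(self, key: str) -> Optional[Versioned]:
        return self.store.get(key)


def read_latest(nodes: List[Node], key: str, nodes_to_check: int) -> Optional[Versioned]:
    """Check the first `nodes_to_check` nodes and return the newest version found."""
    if not 1 <= nodes_to_check <= len(nodes):
        raise ValueError("nodes_to_check must be between 1 and len(nodes)")
    candidates = [node.get(key) for node in nodes[:nodes_to_check]]
    found = [v for v in candidates if v is not None]
    # Return None when no checked node holds the key.
    return max(found, key=lambda v: v.timestamp, default=None)


# Three hypothetical nodes; only node "c" has received the newest write so far.
nodes = [
    Node("a", {"k": Versioned("old", 1)}),
    Node("b", {"k": Versioned("old", 1)}),
    Node("c", {"k": Versioned("new", 2)}),
]

one_node = read_latest(nodes, "k", nodes_to_check=1)
all_nodes = read_latest(nodes, "k", nodes_to_check=3)
```

In this sketch, checking a single node returns the older version, while checking all three returns the newer one. Checking more nodes raises the chance of seeing the latest write, at the cost of contacting more of the cluster.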
Owing to the decentralized nature of certain distributed databases, data written to one node is not always available immediately from other nodes. This is especially the case for so-called “eventually-consistent” databases, such as “AP” databases, which provide guaranteed levels of Availability and tolerance to network Partitioning, but no guaranteed level of consistency. Indeed, the CAP Theorem states that it is impossible for a distributed computing system to guarantee Consistency, Availability, and Partition tolerance all at the same time.
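The delay between a write reaching one node and becoming visible on the others can be illustrated with a toy model. The class `EventuallyConsistentStore` and its methods are hypothetical names introduced here. The model also assumes that pending updates are applied in the order issued, with no conflict resolution, which real systems handle in more elaborate ways.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        # Pending (replica_index, key, value) updates that have not been applied yet.
        self.pending: Deque[Tuple[int, str, str]] = deque()

    def write(self, replica: int, key: str, value: str) -> None:
        # The receiving replica applies the write immediately...
        self.replicas[replica][key] = value
        # ...while every other replica only gets a queued update.
        for other in range(len(self.replicas)):
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, replica: int, key: str) -> Optional[str]:
        return self.replicas[replica].get(key)

    def propagate(self) -> None:
        """Apply all pending updates in the order they were issued."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            self.replicas[replica][key] = value


store = EventuallyConsistentStore(replica_count=3)
store.write(0, "k", "v1")
stale = store.read(1, "k")   # replica 1 has not seen the write yet
store.propagate()
fresh = store.read(1, "k")   # replica 1 has now converged
```

Here every replica stays available for reads throughout, but a read issued before propagation completes does not see the new value. This is the trade-off an "AP" design accepts.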