1. Field of the Invention
The disclosed embodiments relate generally to techniques for building client-server applications. In particular, the described embodiments include systems and methods for efficient interactions between application servers and distributed datastores.
2. Description of Related Art
Traditional client/server computing architectures are typically composed of two main components: the compute tier and the storage tier. The compute tier is sometimes broken down into an access tier and a logic tier, but their combined purpose remains the same (compute). These tiers are often physically separated on different computers for isolation, scaling, and other purposes. As requests are served by the compute tier, data is fetched from the storage tier and processed before a response is sent back to the requesting client.
Components in the access tier manage the communication from clients. The logic tier includes algorithms to service requests from the access tier. The storage tier contains data stored in a persistent mechanism. By way of example, a remote client contacts a server with a request. This server contains both the access and logic tiers. The server communicates with the storage tier, in this case a remote database, and retrieves the required data. It then applies business logic to this data and returns a response to the remote client. As the number of clients increases, the compute tier can be scaled by the addition of more server nodes. Servers are stateless, as the data itself is stored in the storage tier. The traditional storage tier, however, does not scale by adding more database servers, because there is no logic for partitioning data across a set of database servers.
FIG. 1 illustrates a traditional three-tiered application utilizing a traditional datastore. A client 102 sends a request to a server 104; the server 104 fetches the required data from a database 106 to fulfill the request and sends the appropriate response back to the client 102.
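By way of illustration only, the request flow described above may be sketched as follows. The class names Database and Server, the handle method, and the upper-casing "business logic" are hypothetical and chosen purely for the example; an in-memory dictionary stands in for the remote database.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Storage tier: an in-memory dict stands in for a persistent database."""
    rows: dict = field(default_factory=dict)

    def fetch(self, key):
        return self.rows[key]


class Server:
    """Compute tier (access + logic). It keeps no data of its own."""

    def __init__(self, database):
        self.database = database

    def handle(self, request_key):
        raw = self.database.fetch(request_key)  # pull data from the storage tier
        return raw.upper()                      # hypothetical business logic


# Two stateless servers share one database; either can serve any request.
db = Database({"greeting": "hello"})
servers = [Server(db), Server(db)]
responses = [s.handle("greeting") for s in servers]
```

In this sketch, adding a server only requires constructing another Server against the same Database, which reflects why the compute tier scales by adding nodes while the single database does not.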
An application server is a piece of software code running in a service container or framework that is accessed over a network by other software. The application server provides the logic to process data, but it stores the data in a storage tier. This means that if an application server is restarted or shut down, its processing ability is lost, but data stored in the separate storage tier is available upon the application server's return.
One type of storage tier instance is a distributed datastore (FIG. 3), which is designed to scale by managing the distribution, replication and locality of data. Some distributed datastores have a master lookup node for tracking what data exists on which nodes; some use a hashing scheme; others use a gossip-based protocol for this purpose. In any case, the effect is that a datastore client can access any required data on any of the datastore nodes, and the location of the data is transparent to the datastore client. Each node in the distributed datastore has a service layer that allows other nodes to communicate with it, as well as a way for external applications to interact with it and read or write data.
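As a minimal sketch of the hashing approach mentioned above, a key may be mapped to a node by hashing the key and taking the result modulo the number of nodes. The node names and the function node_for_key are hypothetical, and simple modulo hashing is assumed here only for illustration; a particular datastore may use a different scheme, such as consistent hashing.

```python
import hashlib

# Hypothetical node names; any client with this list computes the same mapping.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes=NODES) -> str:
    """Return the node responsible for `key` using modulo hashing."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


owner = node_for_key("user:42")
```

Because the mapping depends only on the key and the node list, no master lookup node is needed in this sketch. One known limitation of plain modulo hashing is that changing the number of nodes remaps most keys.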
Distributed datastores often contain built-in logic to manage replicas of individual pieces of data across multiple physical machines. This replication ensures availability of individual data despite individual machine failure.
FIG. 2 illustrates replication of data sets in a distributed datastore. A given piece of data is contained in a replica group, which is a logical collection of individual pieces of data. Each replica group is then replicated across multiple servers for redundancy, high availability, etc. To account for data sets that exceed a single server's resources, the replica groups are partitioned across the set of all servers. In this illustration, replica group 1 and replica group 2 together represent the sum of all data, 50% each. There are 4 servers, and each piece of data is replicated twice. Database server 1 and database server 2 maintain the copies of replica group 1, while database server 3 and database server 4 maintain the copies of replica group 2.
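The arrangement of FIG. 2 may be sketched as follows, purely as an illustration. The functions place_replica_groups and available_groups are hypothetical, and the sketch assumes that "replicated twice" means two copies per replica group and that each server holds a copy of exactly one replica group.

```python
def place_replica_groups(num_groups, num_servers, copies):
    """Assign each replica group to a contiguous block of `copies` servers.

    Groups and servers are numbered from 1, matching the figure's labels.
    """
    if num_groups * copies != num_servers:
        raise ValueError("sketch assumes one replica group copy per server")
    placement = {}
    for g in range(num_groups):
        first = g * copies
        placement[g + 1] = [s + 1 for s in range(first, first + copies)]
    return placement


def available_groups(placement, failed_servers):
    """Return the replica groups that still have at least one live copy."""
    return {
        group
        for group, servers in placement.items()
        if any(s not in failed_servers for s in servers)
    }


# 2 replica groups, 4 database servers, 2 copies of each group.
placement = place_replica_groups(num_groups=2, num_servers=4, copies=2)
```

Under these assumptions, the loss of any single database server leaves both replica groups available, whereas the loss of both servers holding the same replica group makes that half of the data unavailable.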