Distributed computing refers to a computer configuration in which a group of computing devices, often called servers, collectively provides one or more computing-related services. With the increasing popularity of the Internet, distributing the computation and processing involved in a specific application has become commonplace, since it allows the computational burden to be shared across multiple servers and helps avoid catastrophic single-point failures. Accordingly, many indispensable Internet-based services, such as the Domain Name System (DNS) used to resolve host names to IP addresses or the popular Google search engine, are implemented as distributed services, with multiple computers collaboratively supporting a large set of users (commonly called clients).
Any distributed system must provide a mechanism for load distribution, since the servers must use some protocol to negotiate or decide how the total workload is allocated among the constituent servers. The total workload itself can be considered to be defined by an abstract workload space, referred to below as the workspace. Among other things, this space can consist of application-specific objects (such as files or other data elements). The overall workspace can be managed in a cooperative distributed environment in one of two distinct ways:
1. Partitioning: where each server is responsible for some portion of the overall workspace. In such a strategy, the overall workspace of the application (such as the files stored by a distributed file system) is partitioned among the different servers, with each individual component of the workspace allocated to a specific server. Clients interested in retrieving or modifying a particular component must then be routed to the server that is responsible for the corresponding portion of the workspace.
2. Replication/Cloning: where the entire workspace (the entire contents of the application) is copied to each of the constituent servers so that each server can essentially operate as a stand-alone application. In this approach, the contents of all the servers are identical: a copy of each individual component is present at each server. A client interested in retrieving or modifying a component can thus interact with any individual server. Each server is responsible for keeping all other servers informed of any change to the state of any component.
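The difference between the two strategies can be illustrated with a minimal Python sketch. The class names, the in-memory dictionaries standing in for servers, and the byte-sum routing rule are all assumptions made for illustration; they are not taken from any particular system. The sketch only shows that a write touches one server under partitioning and every server under replication.

```python
from typing import Dict, List


class PartitionedStore:
    """Each component lives on exactly one server; clients are routed to it."""

    def __init__(self, num_servers: int) -> None:
        # One dictionary per (hypothetical) server.
        self.servers: List[Dict[str, str]] = [{} for _ in range(num_servers)]

    def owner(self, key: str) -> int:
        # Hypothetical routing rule: sum of the key's bytes modulo server count.
        return sum(key.encode("utf-8")) % len(self.servers)

    def write(self, key: str, value: str) -> int:
        self.servers[self.owner(key)][key] = value
        return 1  # number of servers that had to be updated

    def read(self, key: str) -> str:
        return self.servers[self.owner(key)][key]


class ReplicatedStore:
    """Every server holds a full copy; any server can answer a read."""

    def __init__(self, num_servers: int) -> None:
        self.servers: List[Dict[str, str]] = [{} for _ in range(num_servers)]

    def write(self, key: str, value: str) -> int:
        # Every change must be propagated to all replicas.
        for server in self.servers:
            server[key] = value
        return len(self.servers)  # number of servers that had to be updated

    def read(self, key: str, server_index: int = 0) -> str:
        return self.servers[server_index][key]
```

With four servers, a single update costs one write in the partitioned store and four in the replicated store, and this per-update cost grows with the number of replicas.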
Many interesting and emerging distributed computing applications are characterized by rapidly varying or dynamically changing components. These components may include content stored on the servers or content generated by the clients. Such applications include, but are not limited to, multi-player online games, directory services for pervasive data sources, and distributed storage of vehicular information (telematics). For such applications, replication is generally not a feasible strategy, since the overhead of maintaining identical copies of rapidly changing components at multiple servers is unacceptable. Accordingly, such distributed applications employ partitioning to spread out the processing load. Most current implementations of load spreading, however, have two basic limitations:
1. The number of servers over which the application is distributed is decided a priori. Normally, some off-line technique, such as a forecast of client loads, is used to calculate the number of servers needed. Once a set of servers is dedicated to a particular application, the servers typically remain committed to that application, even though the load levels may unexpectedly increase or decrease from their predicted profile.
2. The partitioning of responsibility among the set of constituent servers is also decided a priori. From an abstract point of view, this partitioning is usually based on dividing up some namespace. For example, a set of 26 servers can carve up responsibility for storing files by assigning each server responsibility for a unique letter of the alphabet (e.g., Server 1 manages all files having names beginning with “A”, Server 2 all files having names beginning with “B”, etc.), or different servers can be assigned to handle different Internet domains (e.g., Server 1 handles .com, while Server 2 handles .org, etc.).
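The 26-server example above can be written as a short Python sketch of a static, namespace-based partition. The function names are hypothetical, and the handling of names that do not begin with a letter is an assumption, since the example does not address that case.

```python
import string
from typing import Dict


def build_static_partition() -> Dict[str, int]:
    """Map each letter to a fixed server number: A -> 1, B -> 2, ..., Z -> 26."""
    return {letter: i + 1 for i, letter in enumerate(string.ascii_uppercase)}


def server_for(filename: str, table: Dict[str, int]) -> int:
    """Return the server responsible for a file, based on its first letter."""
    first = filename[:1].upper()
    if first not in table:
        # Assumption: names outside A-Z are rejected rather than routed.
        raise ValueError(f"no server assigned for file name {filename!r}")
    return table[first]


table = build_static_partition()
print(server_for("apple.txt", table))   # Server 1
print(server_for("Budget.xls", table))  # Server 2
```

Because the table is fixed in advance, a burst of requests for files beginning with one letter lands entirely on one server, regardless of how idle the other 25 are.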
One drawback of conventional distributed computing applications is that they typically lack on-demand allocation of resources. Often, the running cost of an application is based on the quantity of computing resources (e.g., the number of servers) that it consumes. If this number is fixed and decided a priori, then it is generally based on a peak-load (worst-case) estimate. Such peak-based provisioning is often unacceptable, since the application incurs expense even when the loads are much lower and could be serviced by a much smaller set of servers. By making the deployment of additional servers a function of the current load, the distributed application can considerably reduce the average number of computing devices it requires. An effective solution would increase the number of servers during spikes in the workload, and similarly reduce the number of participating servers when the workload intensity diminishes.
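The cost argument can be made concrete with a small Python sketch that compares peak-based provisioning against a server count recomputed from the current load. The per-server capacity, the load figures, and the simple ceiling rule are illustrative assumptions only; they are not measurements, and a real system would also account for start-up delay and safety margins.

```python
import math
from typing import Sequence, Tuple


def servers_needed(load: float, capacity_per_server: float, minimum: int = 1) -> int:
    """Smallest number of servers whose combined capacity covers the load."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(minimum, math.ceil(load / capacity_per_server))


def compare_provisioning(loads: Sequence[float], capacity: float) -> Tuple[int, float]:
    """Return (servers held under peak provisioning, average held on demand)."""
    on_demand = [servers_needed(load, capacity) for load in loads]
    return max(on_demand), sum(on_demand) / len(on_demand)


# Hypothetical load samples (requests per second) with one short spike.
peak, average = compare_provisioning([100, 150, 900, 200], capacity=100)
print(peak, average)  # 9 3.5
```

In this made-up trace, peak-based provisioning holds nine servers at all times, while load-driven provisioning holds 3.5 on average.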
Another drawback of current distributed application schemes is the lack of adaptation in workload partitioning. To effectively utilize a given set of servers, the application must be capable of dynamically modifying the partitions (portions of the total workspace) assigned to individual servers. The basic idea of adaptive partitioning is to allow a server facing an unexpectedly high workload (or some other problem) to seamlessly migrate some of its responsibility to one or more alternative servers that currently have spare capacity. With this form of adaptive partitioning, the distributed system can adjust the allocation of responsibility in response to changing skews in the actual workload. Under this approach, the capacity of the application is defined by the cumulative capacity of all the servers rather than being bottlenecked by the capacity of an individual server, since the application can dynamically relocate load from an overloaded server to an under-loaded one.
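One way to picture adaptive partitioning is the greedy Python sketch below, in which an overloaded server hands its lightest partitions to whichever server currently carries the least load. The function names, the uniform per-server capacity, and the greedy policy are assumptions chosen for brevity; the text above does not prescribe a particular migration algorithm.

```python
from typing import Dict, List


def server_loads(assignment: Dict[str, str],
                 partition_load: Dict[str, float],
                 servers: List[str]) -> Dict[str, float]:
    """Total load per server for a partition -> server assignment."""
    totals = {server: 0.0 for server in servers}
    for partition, server in assignment.items():
        totals[server] += partition_load[partition]
    return totals


def rebalance(assignment: Dict[str, str],
              partition_load: Dict[str, float],
              servers: List[str],
              capacity: float) -> Dict[str, str]:
    """Move partitions off overloaded servers onto the least-loaded server."""
    result = dict(assignment)
    totals = server_loads(result, partition_load, servers)
    for src in servers:
        # Consider this server's partitions from lightest to heaviest.
        owned = sorted((p for p, s in result.items() if s == src),
                       key=lambda p: partition_load[p])
        for partition in owned:
            if totals[src] <= capacity:
                break  # src is no longer overloaded
            dst = min(servers, key=lambda s: totals[s])
            load = partition_load[partition]
            if dst == src or totals[dst] + load > capacity:
                continue  # no server can absorb this partition
            result[partition] = dst
            totals[src] -= load
            totals[dst] += load
    return result
```

For example, with a capacity of 100 per server, partitions A (60), B (30), and C (40) on server s1, and partition D (20) on server s2, the sketch moves B to s2, leaving s1 at 100 and s2 at 50. The total load is unchanged; only the allocation of responsibility shifts.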