Technical Field
This application relates generally to distributed data processing systems and to distributed storage systems and services.
Brief Description of the Related Art
Distributed computing systems are known in the art. One such distributed system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure.
Other examples of distributed computer systems include distributed storage systems and services, including distributed databases. A distributed storage system can be used to provide a cloud storage solution. A content delivery network may utilize distributed storage to provide a network storage subsystem, which may be located in a network datacenter accessible to CDN proxy cache servers and which may act as a source/origin of content, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference. In this regard, a network storage system may be indexed by distributed databases that map input keys to data that points to storage locations in the manner of a file lookup service. In this way, the storage system may be used for storage of Internet content, such as images, HTML, streaming media files, software, and other digital objects, and as part of a CDN infrastructure.
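By way of illustration only, the key-to-location indexing described above may be sketched as follows. The sketch assumes a single in-memory dictionary standing in for the distributed database. The names StorageLocation, LookupIndex, put, and lookup, along with the example node identifier and paths, are hypothetical and are not drawn from the referenced patent or from any particular implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical pointer to where an object's bytes are stored."""
    node: str  # identifier of the storage node (assumed naming scheme)
    path: str  # location of the object on that node


class LookupIndex:
    """In-memory stand-in for a distributed index database (an assumption)."""

    def __init__(self) -> None:
        self._entries: Dict[str, StorageLocation] = {}

    def put(self, key: str, location: StorageLocation) -> None:
        # Record (or overwrite) the storage location for an input key.
        self._entries[key] = location

    def lookup(self, key: str) -> Optional[StorageLocation]:
        # Return the storage location for the key, or None if unknown.
        return self._entries.get(key)


index = LookupIndex()
index.put(
    "/images/logo.png",
    StorageLocation(node="node-17", path="/vol3/ab/cd/logo.png"),
)
location = index.lookup("/images/logo.png")
missing = index.lookup("/images/absent.png")
```

In a deployed system the index would itself be partitioned and replicated across nodes; the single dictionary here merely shows the mapping from an input key to data that points to a storage location.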
Distributed storage systems (including database systems and services) typically rely on a variety of system services to keep the system operating well. Such services might include, without limitation, monitoring for nodes that are down, migrating or replicating data, resolving conflicts amongst replicas, compacting data, deleting data based on age, and the like. Some services are common to many kinds of storage systems; others are particular to the nature and architecture of the system. For example, consider the variety of existing distributed databases: a SQL database may need different services than a no-SQL database, and a document-based no-SQL database may need different services than a column-based no-SQL database.
A distributed storage system typically has many nodes, and so it typically has many workers potentially available to perform the necessary work. However, it is challenging to distribute tasks to the workers (and by extension to the nodes on which the workers are running) efficiently, given dynamically changing loads, various service types and potential node faults. The teachings hereof address the need to coordinate allocation of work and tasks in distributed computing systems, the need to dynamically adjust this allocation, and the need to minimize the overhead incurred in doing so. The teachings hereof relate to technical improvements in operation and management of distributed computing platforms, and in analogous technologies, and can be used to improve the operation and efficiency of a distributed computing platform, including distributed storage platforms. Many benefits and advantages will become apparent from the teachings hereof.