The enterprise computing landscape has recently undergone a fundamental shift in storage architectures in which the central-service architecture has given way to distributed storage systems. Distributed storage systems built from commodity computer systems can deliver high performance, availability, and scalability for new data-intensive applications at a fraction of the cost of monolithic disk arrays. To unlock the full potential of distributed storage systems, data is replicated across multiple instances of the distributed storage system at different geographical locations, thereby increasing availability and reducing network distance from clients.
In a distributed storage system, objects are dynamically created and deleted in different instances of the distributed storage system. Moreover, different replication requests may have different priorities. It is important to execute replication requests in priority order so as to replicate the more important objects first. For example, a newly uploaded object has just one replica. Thus, in order to minimize the probability of data loss in the new object, it is more important to create replicas of the new object than to create replicas of existing objects that already have a plurality of replicas. Another example is a video that becomes a hit overnight. In this case, the number of replicas of the video needs to be increased as soon as possible in order to handle the increased demand. Therefore, it is desirable to properly prioritize replication requests and execute them in a timely fashion while sustaining very high loads.
One technique for prioritizing replication requests is to place the replication requests in a priority queue. Typically, a priority queue is implemented as an in-memory sorting data structure that returns the element from the queue that has the highest priority. This technique works reasonably well for small-scale systems. However, for large-scale systems such as distributed storage systems, the elements of the priority queue cannot all fit into main memory. Another technique is to use external memory sorting algorithms. However, external memory sorting algorithms can impose long delays and typically require centralized coordination. Furthermore, de-queuing and processing of elements can become a bottleneck as well.
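By way of illustration only, the in-memory priority queue technique described above can be sketched as follows. The sketch assumes that priority is derived solely from the number of existing replicas, so that an object with fewer replicas is de-queued first. The class name `ReplicationQueue`, its methods, and the object identifiers are hypothetical and are not taken from any particular system.

```python
import heapq
import itertools


class ReplicationQueue:
    """Illustrative in-memory priority queue of replication requests.

    Assumption: fewer existing replicas means higher priority.
    """

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among requests of equal priority.
        self._counter = itertools.count()

    def enqueue(self, object_id, replica_count):
        # heapq is a min-heap, so the replica count serves directly as the key:
        # the request with the fewest replicas is returned first.
        heapq.heappush(self._heap, (replica_count, next(self._counter), object_id))

    def dequeue(self):
        # Remove and return the object with the highest priority.
        _, _, object_id = heapq.heappop(self._heap)
        return object_id

    def __len__(self):
        return len(self._heap)


# Hypothetical requests: a new upload with one replica and two existing objects.
queue = ReplicationQueue()
queue.enqueue("existing-object", replica_count=3)
queue.enqueue("new-upload", replica_count=1)
queue.enqueue("another-object", replica_count=2)

order = [queue.dequeue() for _ in range(len(queue))]
print(order)
```

Every pending request in this sketch resides in a single process's main memory and is de-queued through a single point, which is the limitation noted above for large-scale systems.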
Thus, it is desirable to provide a system and method for replicating objects in a distributed storage system without the aforementioned problems.