The enterprise computing landscape has undergone a fundamental shift in storage architectures, in which the central-service architecture has given way to distributed storage systems. Distributed storage systems built from commodity computer systems can deliver high performance, availability, and scalability for new data-intensive applications at a fraction of the cost of monolithic disk arrays. To unlock the full potential of distributed storage systems, data is replicated across multiple instances of the distributed storage system at different geographical locations, thereby increasing availability and reducing network distance from clients.
In a distributed storage system, objects are dynamically placed in (i.e., created in, deleted from, and/or moved to) various instances of the distributed storage system based on constraints. Existing techniques such as linear programming may be used to determine the placement of objects subject to these constraints for small-scale distributed storage systems. However, there are few existing techniques for efficiently placing objects that are subject to constraints in a planet-wide distributed storage system that stores trillions of objects and petabytes of data and includes dozens of data centers.
One approach is to scan all object metadata, decide on the action for each individual object, and execute that action right away. However, this approach does not ensure timely satisfaction of placement constraints. For example, scanning trillions of objects could require weeks, so a constraint violation may go uncorrected until the scan reaches the affected object. In addition, this approach makes it difficult to achieve good utilization of resources, because the density of objects that require action may vary widely across the whole set of objects.
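The scan-and-act approach described above can be illustrated with a minimal Python sketch. The names used here (`ObjectMetadata`, `min_replicas`, `decide_action`, `scan_and_act`) and the single minimum-replica constraint are hypothetical and chosen only for illustration; they are not taken from any particular system. The sketch records the scan position at which each action is executed, which shows both shortcomings: an object is corrected only when the scan reaches it, and the actions may be clustered unevenly across the scan.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

Action = Tuple[str, str, Tuple[str, ...]]


@dataclass
class ObjectMetadata:
    object_id: str
    replica_locations: frozenset  # instances currently holding a replica
    min_replicas: int             # hypothetical placement constraint


def decide_action(meta: ObjectMetadata, all_instances: Iterable[str]) -> Optional[Action]:
    """Return a replicate action if the object has too few replicas, else None."""
    missing = meta.min_replicas - len(meta.replica_locations)
    if missing <= 0:
        return None
    # Pick target instances deterministically (assumption: any instance is eligible).
    candidates = sorted(set(all_instances) - meta.replica_locations)
    return ("replicate", meta.object_id, tuple(candidates[:missing]))


def scan_and_act(
    metadata: Iterable[ObjectMetadata],
    all_instances: Iterable[str],
    execute: Callable[[Action], None],
) -> List[int]:
    """Scan every object, act immediately, and return the scan positions that needed action."""
    instances = list(all_instances)
    positions = []
    for position, meta in enumerate(metadata):
        action = decide_action(meta, instances)
        if action is not None:
            execute(action)  # executed right away, in scan order
            positions.append(position)
    return positions


instances = ["dc-a", "dc-b", "dc-c"]
# Eight objects already satisfy their constraint; the two that do not sit at the end.
objects = [ObjectMetadata(f"obj-{i}", frozenset({"dc-a"}), 1) for i in range(8)]
objects += [ObjectMetadata(f"hot-{i}", frozenset({"dc-a"}), 2) for i in range(2)]

executed: List[Action] = []
positions = scan_and_act(objects, instances, executed.append)
print(positions)  # [8, 9]
```

In this toy run, the only two objects that violate the constraint are handled last, after eight objects that required no action. At the scale of trillions of objects, the same pattern means that the delay before a violation is corrected depends on where the object falls in the scan, and that the executor alternates between idle stretches and bursts of work.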