1. Field
The disclosed embodiments generally relate to data storage systems that maintain replicated copies of data items for fault-tolerance purposes. More specifically, the disclosed embodiments relate to the design of a data storage system that automatically moves replicated copies of data items to various locations in the data storage system to improve fault tolerance.
2. Related Art
Organizations such as airlines and banks typically store large volumes of data in large storage systems containing hundreds (or even thousands) of computer systems and disk drives. Some of these storage systems include multiple data centers situated at different geographic locations to facilitate communication with geographically distributed client systems and to provide a measure of fault tolerance. Such data storage systems are typically organized hierarchically. For example, a storage system can include multiple data centers, wherein the machines within each data center are organized into rows, wherein each row includes a number of racks, wherein each rack contains multiple servers, and wherein each server is attached to multiple disk drives that store the data.
To store the data reliably, such data storage systems often create multiple copies of each data item and then store each copy at a different location. In this way, a failure at any one location will not result in the loss of a data item. Moreover, the farther apart the system locates copies of a data item, the more reliable the system becomes because failures become less correlated. For example, if the system locates all copies of a data item on the same disk drive, a failure of the disk drive will cause a loss of all copies of the data item. If the copies are instead located on different disk drives that are attached to the same server, a kernel bug or a power supply problem can take out the entire server, making all of the copies unavailable. Similarly, a failure in a switch can take out an entire rack, a failure in a power distribution unit can cause an entire row to go down, or a networking problem can cause an entire data center to go offline.
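The relationship between copy placement and correlated failure can be illustrated with a minimal sketch. It assumes, purely for illustration, that each copy's location is written as a tuple of identifiers following the hierarchy described above (data center, row, rack, server, disk drive). The function name, the tuple layout, and the identifiers are hypothetical and are not taken from any particular storage system.

```python
from typing import Optional, Sequence, Tuple

# Hierarchy levels named in the text, outermost first.
LEVELS = ("data_center", "row", "rack", "server", "disk")

# Hypothetical location format: one identifier per level, outermost first.
Location = Tuple[str, str, str, str, str]


def shared_failure_domain(copies: Sequence[Location]) -> Optional[str]:
    """Return the deepest level at which all copies sit under one component.

    A failure of that component takes out every copy at once. Returns None
    when the copies do not even share a data center (or when no copies exist).
    """
    deepest = None
    for depth, level in enumerate(LEVELS):
        # Compare the path prefixes of all copies down to this level.
        prefixes = {loc[: depth + 1] for loc in copies}
        if len(prefixes) != 1:
            break
        deepest = level
    return deepest


# Two copies on different disk drives attached to the same server.
same_server = [
    ("dc1", "row1", "rack1", "srv1", "disk1"),
    ("dc1", "row1", "rack1", "srv1", "disk2"),
]
# Two copies in different data centers.
different_dcs = [
    ("dc1", "row1", "rack1", "srv1", "disk1"),
    ("dc2", "row1", "rack1", "srv1", "disk1"),
]
```

Under these assumptions, `shared_failure_domain(same_server)` reports the server as the smallest component whose failure loses every copy, while `shared_failure_domain(different_dcs)` reports no shared component at all.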
However, the advantages of locating copies of a data item farther apart need to be balanced against the fact that doing so can be more expensive in terms of bandwidth. For example, bandwidth between data centers is typically more expensive than bandwidth within a data center, and bandwidth between racks is typically more expensive than bandwidth within a rack. At present, a designer of a storage system typically analyzes the requirements of the storage system and makes judgment calls about how this tradeoff should be made. Unfortunately, this approach does not work well as the load on the storage system and the storage system's structure evolve over time. For example, bandwidth can become more expensive as the system becomes more heavily loaded, which can make it more advantageous to locate copies of a data item closer to each other. Also, copies of a data item may need to be relocated when a system component fails.
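The effect of load on this tradeoff can be illustrated with a toy model. The relative costs, the load factor, and the budget below are hypothetical numbers chosen only to show the direction of the effect. They are not measurements, and the selection rule is an illustrative assumption rather than a method described here.

```python
# Separation levels from closest to farthest. Copies separated at a farther
# level share fewer components, but cost more bandwidth to place and maintain.
SEPARATION_LEVELS = ("disk", "server", "rack", "row", "data_center")

# Hypothetical relative bandwidth cost of separating copies at each level.
BASE_COST = {
    "disk": 1.0,
    "server": 2.0,
    "rack": 4.0,
    "row": 8.0,
    "data_center": 32.0,
}


def farthest_affordable_level(load_factor: float, budget: float) -> str:
    """Pick the farthest separation whose load-scaled cost fits the budget.

    load_factor scales every cost to model bandwidth becoming more expensive
    under load. Falls back to the closest level if nothing fits the budget.
    """
    chosen = SEPARATION_LEVELS[0]
    for level in SEPARATION_LEVELS:
        if BASE_COST[level] * load_factor <= budget:
            chosen = level
    return chosen
```

With a fixed budget of 32 in this model, a load factor of 1 permits separation across data centers, a load factor of 2 permits separation only across rows, and a load factor of 8 permits separation only across racks. A placement chosen once at design time cannot follow this kind of shift.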
Hence, what is needed is a system that manages the locations of copies of data items in a manner that can adapt to changing loads and system configurations.