Scalability is a requirement in many data storage systems. Different types of storage systems provide diverse methods of seamless scalability through capacity expansion. In some storage systems, such as systems utilizing redundant array of inexpensive disks (RAID) controllers, it is often possible to add disk drives (or other types of mass storage devices) to a storage system while the system is in operation. In such a system, the RAID controller re-stripes existing data onto the new disk and makes the added capacity available for new input/output (I/O) operations. This methodology, known as “vertical capacity expansion,” is common. However, this methodology has at least one drawback in that it scales only data storage capacity, without improving performance factors such as the processing power, main memory, or bandwidth of the system.
In other data storage systems, it is possible to add capacity by “virtualization.” In this type of system, multiple storage servers are utilized to field I/O operations independently, but are exposed to the initiator of the I/O operation as a single device, called a “storage cluster.” Each storage server in a cluster is called a “storage node” or just a “node.” When data storage capacity becomes low, a new server may be added as a new node in the data storage system. In addition to contributing increased storage capacity, the new storage node contributes other computing resources to the system, so that processing power, memory, and bandwidth scale along with capacity. This methodology is known as “horizontal capacity expansion.”
In a horizontally federated storage system with multiple storage nodes, a volume is spread across several storage nodes. The volume is distributed such that each node owns a particular region of the volume. For example, data is striped across multiple storage nodes in conventional horizontally federated storage systems in much the same way as data is striped across disks in RAID arrays. The granularity of striping across storage nodes is at the territory level. Territory-level striping, however, may not be able to provide network utilization scaling for sequential I/O operations, i.e., it may not be possible to ensure that different I/O operations are fielded by different nodes. This is because a group of sequential I/O operations may fall within a single territory and therefore be served by a single node, which results in the other nodes remaining passive. While striping at a finer granularity (e.g., at the chunk level) may ensure that a sequential burst of I/O operations is served by different nodes, chunk-level striping would result in a greater and possibly unmanageable amount of metadata.
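The trade-off described above can be illustrated with a minimal sketch. The sizes used here (an 8 MiB territory, a 64 KiB chunk, four nodes, a 1 GiB volume) and the simple round-robin placement are assumptions chosen only for illustration; they are not taken from any particular system, and the function names are hypothetical.

```python
from collections import Counter

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Illustrative assumptions, not values from any specific storage system.
TERRITORY = 8 * MIB   # assumed territory size
CHUNK = 64 * KIB      # assumed chunk size
NODES = 4             # assumed number of storage nodes


def owner_node(offset, stripe_unit, num_nodes):
    """Round-robin striping: stripe unit i is owned by node i % num_nodes."""
    return (offset // stripe_unit) % num_nodes


def nodes_serving(start, io_size, io_count, stripe_unit, num_nodes):
    """Count how many I/Os of a sequential burst each node fields."""
    return Counter(
        owner_node(start + i * io_size, stripe_unit, num_nodes)
        for i in range(io_count)
    )


def map_entries(volume_size, stripe_unit):
    """Number of stripe units a per-unit ownership map would have to track."""
    return -(-volume_size // stripe_unit)  # ceiling division


# A burst of sixteen sequential 64 KiB I/Os starting at offset 0.
territory_burst = nodes_serving(0, 64 * KIB, 16, TERRITORY, NODES)
chunk_burst = nodes_serving(0, 64 * KIB, 16, CHUNK, NODES)

print("territory-level:", dict(territory_burst))
print("chunk-level:    ", dict(chunk_burst))
print("map entries (territory):", map_entries(GIB, TERRITORY))
print("map entries (chunk):    ", map_entries(GIB, CHUNK))
```

Under these assumptions the whole burst lands on one node with territory-level striping, whereas chunk-level striping spreads it evenly over all four nodes. The price is a per-unit map with 128 times as many entries for the same volume.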