In a storage system with a storage array, data is stored across a plurality of data storage devices. Such data storage devices may be solid-state drives (SSDs) and/or magnetic disk drives, as in a Nimble Storage array manufactured by Nimble Storage™ of San Jose, Calif.
One technique that is employed in a storage array is data striping. Using a simplified example to illustrate data striping, suppose a document is to be stored on three data storage devices (A, B, C) in a storage array. In one data striping routine, the first word of the document may be written to device A; the second word of the document may be written to device B; the third word of the document may be written to device C; the fourth word of the document may be written to device A; the fifth word of the document may be written to device B; the sixth word of the document may be written to device C; and so on. Since there are three separate devices, three write operations may occur at the same time. Stated differently, the first, second and third words may be written in parallel to devices A, B and C, respectively; the fourth, fifth and sixth words may be written in parallel to devices A, B and C, respectively; and so on. Likewise, when the document is read from the storage devices, three words can be read at once: the first, second and third words may be read in parallel from devices A, B and C, respectively; the fourth, fifth and sixth words may be read in parallel from devices A, B and C, respectively; and so on. This example helps illustrate the increased read and write throughput (i.e., I/O throughput) of a storage array that uses data striping, as compared to a storage array that does not use data striping.
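The word-by-word striping example above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `stripe_words` and `read_striped` are hypothetical, each device is modeled as a plain in-memory list, and the parallelism of real device I/O is not simulated.

```python
def stripe_words(words, num_devices):
    """Distribute words round-robin across num_devices devices.

    Each device is modeled as a list (an assumption made for illustration);
    word i is placed on device i mod num_devices.
    """
    devices = [[] for _ in range(num_devices)]
    for index, word in enumerate(words):
        devices[index % num_devices].append(word)
    return devices


def read_striped(devices):
    """Reassemble the original order by taking one word per device per pass.

    Each pass over the devices corresponds to one set of reads that a real
    array could issue in parallel.
    """
    words = []
    longest = max((len(device) for device in devices), default=0)
    for row in range(longest):
        for device in devices:
            if row < len(device):
                words.append(device[row])
    return words


document = "the quick brown fox jumps over the lazy dog".split()
device_a, device_b, device_c = stripe_words(document, 3)
restored = read_striped([device_a, device_b, device_c])
```

In this sketch, device A holds the first, fourth and seventh words, device B the second, fifth and eighth, and device C the third, sixth and ninth, matching the pattern described above.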
For a storage array to fully take advantage of the increased throughput available through data striping, each of the storage devices must have room to write new data. Otherwise, the data may only be written to the remaining storage devices (i.e., those that still have room), reducing the I/O throughput. In practice, storage devices within a storage array may reach (or approach) their respective capacities at different times. For instance, a storage device having a smaller capacity may reach its capacity sooner than a storage device having a larger capacity. Even if storage devices were to fill up at similar rates, a storage device that has been in use for a longer time would be expected to fill up before a storage device that has been in use for a shorter time. Such examples illustrate that, in general, some storage devices in a storage array may be more occupied (e.g., in terms of a percent of total capacity of a storage device) than other storage devices. To prevent one or more of the storage devices from completely filling up, data is typically migrated from storage devices that are more occupied to storage devices that are less occupied. While data migration techniques have been deployed in the field and exist in the literature, such data migration techniques are often computationally intensive and/or fail to preserve properties of the data distribution (i.e., how data is distributed among the storage devices) that are needed to fully take advantage of the potential gains (e.g., increased throughput) from data striping.
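The effect of uneven occupancy on striping can be illustrated with a small Python sketch. The device names, capacities, and the helper functions `occupancy_percent` and `writable_devices` are all hypothetical and chosen only for illustration; the sketch does not represent any particular migration technique.

```python
def occupancy_percent(used, capacity):
    """Return occupancy as a percentage of the device's total capacity."""
    return 100.0 * used / capacity


def writable_devices(devices, write_size=1):
    """Return the names of devices that still have room for a write.

    `devices` maps a device name to a (used, capacity) pair; the units are
    arbitrary and assumed to be the same for every device.
    """
    return [
        name
        for name, (used, capacity) in devices.items()
        if used + write_size <= capacity
    ]


# Hypothetical array: device A is full, B and C still have room.
array = {"A": (100, 100), "B": (40, 100), "C": (150, 200)}

occupancies = {name: occupancy_percent(u, c) for name, (u, c) in array.items()}
available = writable_devices(array)

# Only the devices with room can take part in a parallel write, so the
# number of simultaneous writes drops from 3 to len(available).
parallel_writes = len(available)
```

Here device A is 100% occupied, so only two of the three devices can accept new data, and at most two writes can proceed in parallel instead of three.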