Disk arrays provide storage for computer applications that need increased reliability in the face of component failures, as well as high performance in normal use. The disks in a disk array are often arranged as a redundant array of independent disks (RAID). RAID arrays provide larger capacity, higher performance and, typically, higher availability for stored data than individual disks. They do so by distributing the data across multiple disks and storing back-up information along with it. The back-up information may be a copy of the data or enough parity information to regenerate the data if a disk or related component fails. Storing a copy of the data usually provides higher performance for read operations; however, write operations can be slower, because both copies of the data must be updated in the RAID.
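The way parity information can regenerate lost data may be illustrated with a minimal sketch. It assumes simple bytewise XOR parity over three equal-sized data blocks; the block contents and the helper name xor_blocks are hypothetical and chosen only for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per (hypothetical) data disk.
data_blocks = [b"disk0dat", b"disk1dat", b"disk2dat"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_blocks)

# Suppose the disk holding data_blocks[1] fails. XOR-ing the surviving
# data blocks with the parity block regenerates the missing block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Here rebuilt equals the block that was on the failed disk, because XOR-ing a value with itself cancels it out and leaves only the missing block.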
One problem with RAID arrays is that the disks are relatively inefficient in accessing small amounts of data that are not sequentially stored on a disk. In a typical 4 KB read, a conventional disk might require between 5 and 20 ms to position the disk head before beginning to transfer data, but less than 0.5 ms to transfer the data. When copies of the data are stored in a disk array, small writes are typically even more inefficient, because both the original data and the copy must be written. Accordingly, the disk heads of the disks storing the original data and the copy each spend time positioning themselves before writing the small amount of data.
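The figures above imply that only a small share of each small read is spent moving data. The following sketch works through that arithmetic, taking 0.5 ms as an upper bound on the transfer time; the function name transfer_fraction is hypothetical.

```python
def transfer_fraction(positioning_ms, transfer_ms):
    """Fraction of the total access time spent actually transferring data."""
    return transfer_ms / (positioning_ms + transfer_ms)

# Shortest positioning time quoted above (5 ms) with a 0.5 ms transfer.
best_case = transfer_fraction(5.0, 0.5)

# Longest positioning time quoted above (20 ms) with a 0.5 ms transfer.
worst_case = transfer_fraction(20.0, 0.5)

print(f"best case:  {best_case:.1%} of the access is data transfer")
print(f"worst case: {worst_case:.1%} of the access is data transfer")
```

Under these assumptions, data transfer accounts for at most roughly 9% of the access time and, with 20 ms of positioning, roughly 2%. The remainder is head positioning.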
Another problem with RAID disk arrays is that when a disk fails, the resulting extra load is not spread uniformly over the remaining disks, and rebuilding the data onto a replacement disk takes a long time.
Several techniques have been proposed for ameliorating these problems, but each has its own disadvantages. In one technique, two copies of the data are maintained, each using a different stripe size. Both copies are stored on disk, so that a disk has both a “large-striped” copy and a “small-striped” copy. Having one copy that is large-striped improves performance for large, sequential input/output (I/O) accesses. However, there is no provision for spare space to accommodate disk failures, and this technique generally does not improve rebuild time after a disk fails.
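The effect of stripe size on a sequential access can be sketched as follows. The sketch assumes a simple round-robin striping layout, which the technique described above may or may not use; the functions locate and disks_touched and all parameter values are hypothetical.

```python
def locate(block, stripe_unit_blocks, num_disks):
    """Map a logical block number to (disk index, block offset on that disk),
    assuming stripe units are assigned to disks in round-robin order."""
    stripe_unit = block // stripe_unit_blocks
    disk = stripe_unit % num_disks
    offset = (stripe_unit // num_disks) * stripe_unit_blocks + block % stripe_unit_blocks
    return disk, offset

def disks_touched(start, length, stripe_unit_blocks, num_disks):
    """Set of disks involved in reading `length` consecutive logical blocks."""
    return {locate(b, stripe_unit_blocks, num_disks)[0]
            for b in range(start, start + length)}

# An 8-block sequential read on a 4-disk array.
small_striped = disks_touched(0, 8, stripe_unit_blocks=1, num_disks=4)
large_striped = disks_touched(0, 8, stripe_unit_blocks=16, num_disks=4)
```

With a stripe unit of one block, the read involves all four disks, each of which must position its head. With a stripe unit of 16 blocks, the same read stays on a single disk and needs only one positioning operation.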
A second technique incorporates distributed sparing. Spare space is distributed over a pair of disk arrays. If a disk fails, the data on that disk is reconstructed and temporarily stored in the spare space on the other array. When the failed disk is replaced, the data is then copied back to the replacement disk. Because the data to be copied is distributed over the disk array, a significant amount of disk head movement is typically needed to perform the copy-back operation, which results in poor performance.