A storage array or disk array is a data storage device that includes multiple disk drives or similar persistent storage units. A storage array can allow large amounts of data to be stored in an efficient manner. A storage array also can provide redundancy to promote reliability, as in the case of a RAID system. In general, RAID systems simultaneously use two or more hard disk drives, referred to herein as physical disk drives (PDs), to achieve greater performance, greater reliability, and/or larger data volume sizes. The term “RAID” is generally used to describe computer data storage schemes that divide and replicate data among multiple PDs. In RAID systems, one or more PDs are set up as a RAID virtual disk drive (VD). In a RAID VD, data might be distributed across multiple PDs, but the VD is seen by the user and by the operating system of the computer as a single disk. The VD is “virtual” in that storage space in the VD maps to the physical storage space in the PDs, but the VD usually does not itself represent a single physical storage device.
Although a variety of different RAID system designs exist, all have two key design goals, namely: (1) to increase data reliability and (2) to increase input/output (I/O) performance. RAID has seven basic levels corresponding to different system designs. The seven basic RAID levels are typically referred to as RAID levels 0-6. RAID level 5 uses striping in combination with distributed parity. The term “striping” means that logically sequential data, such as a single data file, is fragmented and assigned to multiple PDs in a round-robin fashion. Thus, the data is said to be “striped” over multiple PDs when the data is written. The term “distributed parity” means that the parity bits that are calculated for each stripe of data are distributed over all of the PDs rather than being stored on one or more dedicated parity PDs. Striping improves performance because the data fragments that make up each data stripe are written in parallel to different PDs and read in parallel from the different PDs. Distributing the parity bits also improves performance in that the parity bits associated with different data stripes can be written in parallel to different PDs using parallel write operations as opposed to having to use sequential write operations to a dedicated parity PD.
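By way of illustration only, striping with distributed parity can be sketched as follows. The function names, the strip size, and the particular parity rotation (parity moving one PD to the left on each successive stripe) are assumptions made for this sketch and are not taken from any particular RAID implementation.

```python
from functools import reduce

def xor_strips(strips):
    """Byte-wise XOR of equal-length strips; used here as the parity function."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def parity_pd(stripe_index, num_pds):
    """Hypothetical rotation: parity moves one PD to the left on each stripe."""
    return (num_pds - 1 - stripe_index) % num_pds

def write_striped(data, num_pds, strip_size):
    """Return one list of strips per PD, holding striped data plus rotated parity."""
    pds = [[] for _ in range(num_pds)]
    bytes_per_stripe = (num_pds - 1) * strip_size
    # Zero-pad so that the data fills a whole number of stripes.
    padded = data + bytes(-len(data) % bytes_per_stripe)
    for s in range(len(padded) // bytes_per_stripe):
        chunk = padded[s * bytes_per_stripe:(s + 1) * bytes_per_stripe]
        strips = [chunk[i:i + strip_size]
                  for i in range(0, bytes_per_stripe, strip_size)]
        parity_index = parity_pd(s, num_pds)
        remaining = iter(strips)
        # Data strips are assigned round-robin, skipping the PD that holds parity.
        for pd in range(num_pds):
            if pd == parity_index:
                pds[pd].append(xor_strips(strips))
            else:
                pds[pd].append(next(remaining))
    return pds
```

With three PDs and a two-byte strip, `write_striped(b"ABCDEFGH", 3, 2)` places the parity for the first stripe on the last PD and the parity for the second stripe on the middle PD, so that no single PD holds all of the parity.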
With distributed parity, the system can continue to operate as long as all but one of the PDs are present. Failure of any one of the PDs necessitates replacement of the PD, but does not cause the system to fail. Upon failure of one of the PDs, the data and parity that were on the failed PD can be reconstructed from the data and parity stored on the other PDs.
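The reconstruction described above relies on a property of XOR parity: the XOR of all strips in a stripe, parity included, is zero, so any one missing strip equals the XOR of the strips that remain. A minimal sketch follows, in which the function name `rebuild_strip` is hypothetical.

```python
from functools import reduce

def rebuild_strip(surviving_strips):
    """Recover the one missing strip of a stripe by XOR-ing all surviving strips."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*surviving_strips))

# One stripe over three PDs: two data strips and their parity.
d0 = b"\x01\x02"
d1 = b"\x10\x20"
parity = rebuild_strip([d0, d1])   # parity is itself the XOR of the data strips

# If the PD holding d1 fails, d1 is recovered from d0 and the parity strip.
recovered = rebuild_strip([d0, parity])
```

The same call recovers a lost parity strip, since parity is simply another member of the XOR relation.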
In order to demonstrate the manner in which a rebuild process is typically performed, the manner in which a known RAID system typically operates will be described with reference to FIG. 1. FIG. 1 illustrates a block diagram of a known RAID system 2 comprising a computer 3, a RAID controller 4, and an array 5 of PDs 6. When the computer 3 has data to write, an OS 7 of the computer 3 generates a write command, which is received by a file system (FS) 8 of the OS 7. The FS 8 then issues an input/output (IO) command to the RAID controller 4. The IO command contains the data to be written and virtual memory addresses where the data is currently located in a virtual memory 9. A RAID processor 4a of the RAID controller 4 receives the IO command and then maps the virtual memory addresses to physical addresses in one or more of the PDs 6. The RAID processor 4a maintains a table of the virtual-to-physical address mapping in a local memory device 4b of the RAID controller 4. The RAID controller 4 then causes the data to be written to the physical addresses in one or more of the PDs 6.
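As a rough illustration of a virtual-to-physical mapping table of the kind described above, the following sketch maps virtual block numbers to a PD index and a stripe index. The block granularity, the function names, and the left-rotating parity placement are all assumptions made for this example; the mapping used by an actual RAID controller may differ.

```python
def map_virtual_block(vblock, num_pds):
    """Map a virtual block number to a (pd_index, stripe_index) pair.

    Assumes one block per strip and parity that moves one PD to the left
    on each successive stripe (a hypothetical layout).
    """
    data_per_stripe = num_pds - 1
    stripe = vblock // data_per_stripe
    position = vblock % data_per_stripe
    parity_index = (num_pds - 1 - stripe) % num_pds
    # Data blocks fill the PDs in order, skipping the PD that holds parity.
    pd_index = position if position < parity_index else position + 1
    return pd_index, stripe

def build_mapping_table(num_vblocks, num_pds):
    """Build the table a controller could keep in its local memory."""
    return {v: map_virtual_block(v, num_pds) for v in range(num_vblocks)}

table = build_mapping_table(4, 3)
```

For three PDs, virtual blocks 0 and 1 land on the first two PDs in stripe 0, while blocks 2 and 3 land on the first and third PDs in stripe 1, because the middle PD holds the parity for that stripe.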
If one of the PDs 6 fails, the failed PD 6 is rebuilt by reading all of the stripes from the PDs 6 other than the failed PD 6, computing the data and parity of the failed PD 6 from all of the stripes read from the other PDs 6, and writing the computed data and parity to a replacement PD. The main issues associated with this rebuild technique are that it (1) takes a very long time to perform, (2) consumes a large amount of resources, and (3) detrimentally impacts system performance during the rebuild process. In addition, while the rebuild process is ongoing, the RAID system 2 is at a lower level of protection or is without protection from data integrity risks in the event that another of the PDs 6 fails. Rebuilds can take days or weeks, and the performance of the RAID system 2 is detrimentally impacted during that time period.
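The full rebuild described above can be sketched as a loop over every stripe in the array. In this sketch, each PD is modeled as a list of equal-length strips, and the function name `rebuild_pd` and the `strips_read` counter are hypothetical; the counter is included only to show that the amount of data read grows with the number of stripes, regardless of whether the stripes hold live data.

```python
from functools import reduce

def xor_strips(strips):
    """Byte-wise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild_pd(pds, failed):
    """Rebuild every strip of the failed PD from the surviving PDs.

    pds is a list of per-PD strip lists; the entry at index `failed` is ignored.
    Returns the rebuilt strips and the number of strips that had to be read.
    """
    survivors = [pd for i, pd in enumerate(pds) if i != failed]
    rebuilt, strips_read = [], 0
    for stripe in zip(*survivors):       # one pass per stripe, used or not
        rebuilt.append(xor_strips(stripe))
        strips_read += len(stripe)
    return rebuilt, strips_read
```

For an array of N PDs holding S stripes, this loop reads (N - 1) × S strips and writes S strips to the replacement PD.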
In addition, as technological improvements in storage devices are made, their storage capacity greatly increases over time. For example, for some types of storage devices, storage capacity doubles every eighteen months or so. These increases in storage capacity mean that, in the event that one of the PDs fails, an even larger number of stripes are used to compute the new data and parity, which results in an even larger number of computations. Consequently, the amount of time that is required to perform the rebuild is further increased. Interestingly, a large part of the failed PD 6 is typically unused, but because this is not known to the RAID controller 4, it has no other option but to rebuild the failed PD 6 in its entirety.
One technique that has been used to reduce the amount of data and parity that has to be computed during a rebuild involves only rebuilding “used” portions of the failed PD 6. A portion of a PD 6 is considered “used” if it has been written with data. With this technique, the RAID controller 4 of the RAID system 2 marks zones on the PDs 6 that have been written so that it is able to distinguish between zones that have been written and zones that have not been written. If a PD 6 subsequently fails, new data and parity are only computed for zones in the failed PD 6 that were marked as written at the time of the failure.
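A minimal sketch of such zone marking is given below. The class name `ZoneTracker`, the zone size, and the byte-offset interface are assumptions made for illustration and do not describe any particular controller.

```python
class ZoneTracker:
    """Tracks which fixed-size zones of a PD have ever been written."""

    def __init__(self, num_zones, zone_size):
        self.zone_size = zone_size
        self.written = [False] * num_zones

    def record_write(self, offset, length):
        """Mark every zone overlapped by a write of `length` > 0 bytes at `offset`."""
        first = offset // self.zone_size
        last = (offset + length - 1) // self.zone_size
        for zone in range(first, last + 1):
            self.written[zone] = True

    def zones_to_rebuild(self):
        """Only zones marked as written need data and parity recomputed."""
        return [zone for zone, used in enumerate(self.written) if used]

tracker = ZoneTracker(num_zones=8, zone_size=1024)
tracker.record_write(offset=0, length=100)      # falls entirely within zone 0
tracker.record_write(offset=3000, length=2000)  # spans zones 2 through 4
```

After these two writes, a rebuild would cover zones 0, 2, 3, and 4 and skip the remaining four zones. Note that the tracker offers no way to clear a mark, which reflects the fact that the controller is not told when data is freed.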
This technique has several disadvantages. One drawback is that the FS 8 often moves data around, which causes the same data to be stored in different zones of the PDs 6 at different times. The OS 7 may subsequently free data, but although the FS 8 is aware that the data has been freed, the RAID controller 4 is not made aware that the data has been freed. Therefore, the RAID controller 4 continues to consider the zone in the PD 6 in which the freed data is stored as “used”. Consequently, any zone in the failed PD 6 that was “touched” (i.e., written) at any point in time will be rebuilt. This results in more data being rebuilt than is necessary, and the process tends to be degenerative over time. Another disadvantage of this technique is that services and applications exist that by their nature use inordinate amounts of space on PDs 6 temporarily and then free the data. Again, while the FS 8 is aware that the data has been freed, the RAID controller 4 is not, and so any zones in the failed PD 6 that were “touched” are considered “used” and therefore will be rebuilt. Consequently, much more data and parity are rebuilt than is necessary.
Yet another drawback of this technique results from the manner in which FSs typically operate. FSs are typically designed such that when making a choice between writing data to space that has never been written and writing data to space that has been written and subsequently freed, they choose to write data to space that has never been written. This results in “data sprawl” in that data gets written to more areas in the PDs than is necessary. Even if the data is subsequently freed, the RAID controller is unaware that the data has been freed and considers the corresponding zones in the PDs as used. Consequently, if a PD fails, any zones that were previously written, even if subsequently freed, will be rebuilt, which results in more data being rebuilt than is necessary. In addition, data sprawl can also result in only a small portion of a zone actually being used while other portions of the same zone are unused. When the zone is rebuilt, both the used and unused portions of the zone are rebuilt. Again, this results in more data being rebuilt than is necessary.
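The effect of data sprawl on the number of zones marked as used can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the block and zone counts, the burst size, and the two allocation policies. The reuse-first policy is included only as a point of contrast and is not attributed to any actual FS.

```python
def simulate(prefer_never_written, num_blocks=16, blocks_per_zone=4,
             rounds=3, burst=4):
    """Write and then free `burst` blocks per round.

    Returns (zones the controller considers used, blocks still live).
    """
    free = set(range(num_blocks))
    ever_written = set()
    touched_zones = set()

    def allocation_order(block):
        # False sorts before True, so the preferred class of blocks comes first.
        if prefer_never_written:
            return (block in ever_written, block)
        return (block not in ever_written, block)

    for _ in range(rounds):
        chosen = sorted(free, key=allocation_order)[:burst]
        for block in chosen:
            free.discard(block)
            ever_written.add(block)
            touched_zones.add(block // blocks_per_zone)  # controller marks zone
        free.update(chosen)  # the FS frees the data; the controller is not told

    return len(touched_zones), num_blocks - len(free)

sprawl = simulate(prefer_never_written=True)
compact = simulate(prefer_never_written=False)
```

In both cases no block is live at the end, yet under the never-written-first policy three zones remain marked as used, compared with one zone under the reuse-first policy.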
Accordingly, a need exists for a way to reduce the amount of time that is required to perform a rebuild process in a RAID system. A need also exists for a way to reduce the amount of data that needs to be rebuilt when performing a rebuild in a RAID system. A need also exists for a way to prevent data sprawl in a RAID system.