As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, an information handling system may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
The data storage systems of at least some information handling systems employ redundant array of independent drives (RAID) technology to enable the widespread use of low cost persistent mass storage devices without a corresponding decrease in reliability. RAID technology may employ a plurality of physical storage devices in combination with data redundancy, parity information and/or other form(s) of error checking information, or a combination thereof, to provide a “virtual drive.” User data and error checking information may be distributed among the plurality of physical storage devices of a virtual drive in different configurations and granularities.
As a common example, for each stripe, a RAID 5 virtual drive spanning N physical storage devices writes user data to N−1 of the physical storage devices and parity information to the remaining physical storage device. The physical storage device on which the parity information is stored varies, or rotates, depending upon the applicable storage address. A block of user data may be "striped" across N−1 of the N physical storage devices, with each of those devices storing 1/(N−1) of the user data block. Other RAID configurations employ different combinations of redundancy, striping, and error checking information, as is well known in the field of data storage and data storage systems.
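The following Python sketch illustrates the RAID 5 layout described above. It is not drawn from any particular RAID controller. The rotation rule (parity on device (N−1−s) mod N for stripe s) and the function names are assumptions chosen for illustration, and real controllers use several different rotation conventions. Parity is computed as the byte-wise XOR of the N−1 data chunks.

```python
from functools import reduce
from typing import List


def parity_device(stripe: int, n_devices: int) -> int:
    """Index of the device holding parity for a stripe.

    Assumption: a simple rotation that starts on the last device and moves left.
    """
    return (n_devices - 1 - stripe) % n_devices


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks (single-parity RAID 5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def stripe_layout(block: bytes, stripe: int, n_devices: int) -> List[bytes]:
    """Split a user-data block across N-1 devices and place parity on the remaining one.

    Returns one chunk per device, in device order.
    """
    data_devices = n_devices - 1
    if len(block) % data_devices != 0:
        raise ValueError("block length must be a multiple of N-1")
    size = len(block) // data_devices
    # Each data device stores 1/(N-1) of the user data block.
    data = [block[i * size:(i + 1) * size] for i in range(data_devices)]
    parity = xor_chunks(data)
    p = parity_device(stripe, n_devices)
    # Insert the parity chunk at its rotated position for this stripe.
    return data[:p] + [parity] + data[p:]
```

For example, `stripe_layout(b"ABCDEF", stripe=0, n_devices=4)` places `b"AB"`, `b"CD"`, and `b"EF"` on devices 0 to 2 and their XOR on device 3. For stripe 1 the same rule moves parity to device 2, so no single device carries all parity writes.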
RAID-based storage systems may employ one or more redundant physical storage devices that are available to store data from a physical storage device that has exhibited one or more failures. Because these redundant physical storage devices are generally configured such that they can be swapped into a given virtual drive while maintaining power, they are often referred to as hot spare drives, hot spares, or the like.
Historically, RAID controllers have used hot spare drives in conjunction with a "rebuild" process to restore a virtual drive to a normal state with data integrity following one or more errors. Rebuilding data for a RAID virtual drive can be time-consuming, and the time required increases as the number of physical storage devices increases and as the size or capacity of each physical storage device increases. A rebuild operation may occupy a significant percentage of a RAID controller's available computing and/or storage bandwidth. In addition, conventional rebuild processes may not be able to withstand a physical storage device failure that occurs during rebuild and, as a result, user data may be permanently lost.
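To make the cost of a conventional single-parity rebuild concrete, the following Python sketch reconstructs a failed device stripe by stripe by XORing the surviving chunks, producing the chunks that would be written to a hot spare. The function `rebuild_device` and its input format (a list of stripes, each a list of per-device chunks with `None` marking an unreadable chunk) are hypothetical. The loop visits every stripe and reads every surviving device, which is why rebuild time grows with both device count and capacity, and it aborts when a second chunk in the same stripe is unreadable.

```python
from functools import reduce
from typing import List, Optional


def xor_chunks(chunks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_device(stripes: List[List[Optional[bytes]]], failed: int) -> List[bytes]:
    """Reconstruct every chunk of the failed device for writing to a hot spare.

    Hypothetical single-parity (RAID 5 style) rebuild: the missing chunk of
    each stripe equals the XOR of all surviving chunks, data and parity alike.
    """
    spare: List[bytes] = []
    for index, stripe in enumerate(stripes):
        survivors = [chunk for dev, chunk in enumerate(stripe) if dev != failed]
        present = [chunk for chunk in survivors if chunk is not None]
        if len(present) != len(survivors):
            # A second unreadable chunk in the same stripe exceeds
            # single-parity redundancy, so this stripe's data is lost.
            raise RuntimeError(f"stripe {index}: second failure, cannot reconstruct")
        spare.append(xor_chunks(present))
    return spare
```

Because the work is one pass over all stripes, a controller performing this rebuild in the foreground competes with host I/O for the same computing and storage bandwidth.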
While some RAID levels, e.g., RAID 6, can recover from two physical storage device failures, the rebuild happens sequentially: the first failed device is rebuilt before the second failed device. However, conventional rebuild implementations cannot recover data if multiple physical storage device failures occur, either simultaneously or serially, during rebuild.