1. Field of the Invention
This invention relates to computer data storage systems, and more particularly, to Redundant Array of Inexpensive Disks (RAID) systems and to initialization procedures in storage arrays.
2. Description of the Related Art
A continuing desire exists in the computer industry to improve the performance and reliability of computer systems. For the most part, the desire for improved performance has been achieved for the processing or microprocessor components of computer systems. Microprocessor performance has steadily improved over the years. However, the performance of the processors in a computer system is only one consideration associated with the overall performance of the computer system. For example, the computer memory system must be able to keep up with the demands of the processor or the processor will become stalled waiting for data from the memory system. Generally, computer memory systems have been able to keep up with processor performance through increased capacities, lower access times, new memory architectures, caching, interleaving and other techniques.
Another critical aspect associated with the overall performance of a computer system is the I/O system performance. For many applications, the performance of the mass storage system or disk storage system serves a significant role in the I/O system performance. For example, when an application requires access to more data or information than it has room for in allocated system memory, the data may be paged in/out of disk storage to/from the system memory. Typically the computer system's operating system copies a certain number of pages from the disk storage system to main memory. When a program needs a page that is not in main memory, the operating system copies another page back to the disk system and copies the required page into main memory. Processing may be stalled while the program is waiting for the page to be copied. If storage system performance does not keep pace with performance gains in other components of a computer system, then delays in storage system accesses may overshadow performance gains elsewhere. Computer storage systems must also reliably store data. Many computer applications cannot tolerate data storage errors. Even if data errors are recoverable, data recovery operations may have a negative impact on performance.
One technique for increasing the capacity, performance and reliability of disk storage systems is to employ an array of storage devices. An example of such an array of storage devices is a Redundant Array of Independent (or Inexpensive) Disks (RAID). A RAID system improves storage performance by providing parallel data paths to read and write information over an array of disks. By reading and writing multiple disks simultaneously, the storage system performance may be greatly improved. For example, an array of four disks that can be read and written simultaneously may provide a data rate almost four times that of a single disk. However, using arrays of multiple disks comes with the disadvantage of increasing failure rates. In the example of a four disk array above, the mean time between failures (MTBF) for the array will be approximately one-fourth that of a single disk, assuming the disks fail independently. It is not uncommon for storage device arrays to include many more than four disks, shortening the mean time between failures from years to months or even weeks. Some RAID systems address this reliability issue by employing parity or redundancy so that data lost from a device failure may be recovered.
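The MTBF relationship described above may be illustrated with a short sketch. It assumes that the disks are identical and fail independently at a constant rate, which is a simplification; the function name array_mtbf and the 400,000 hour figure are hypothetical and chosen only for illustration.

```python
def array_mtbf(disk_mtbf_hours: float, num_disks: int) -> float:
    """Approximate MTBF of an array with no redundancy.

    Assumption (not from the text above): identical disks that fail
    independently at a constant rate, so the failure rates simply add.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mtbf_hours / num_disks


# Illustrative figure only: a single disk rated at 400,000 hours.
single_disk = 400_000.0
four_disk_array = array_mtbf(single_disk, 4)  # one-fourth of the single-disk value
```

Under the same assumption, larger arrays shorten the expected time to the first failure in direct proportion to the number of disks.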
One common RAID technique or algorithm is referred to as RAID 0. RAID 0 is an example of a RAID algorithm used to improve performance by attempting to balance the storage system load over as many of the disks as possible. RAID 0 implements a striped disk array in which data is broken down into blocks and each block is written to a separate disk drive. Thus, this technique may be referred to as striping. Typically, I/O performance is improved by spreading the I/O load across multiple drives since blocks of data will not be concentrated on any one particular drive. However, a disadvantage of RAID 0 systems is that they do not provide for any data redundancy and are thus not fault tolerant.
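One possible way to express the striping described above is the following sketch, which maps a logical block number to a disk and to a block position on that disk. A simple round-robin layout is assumed; actual implementations may use a different layout, and the name locate_block is hypothetical.

```python
def locate_block(block_index: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk number, block offset on that disk).

    Assumes a round-robin layout: consecutive logical blocks are
    placed on consecutive disks, wrapping around after the last disk.
    """
    disk = block_index % num_disks
    offset = block_index // num_disks
    return disk, offset


# With four disks, logical blocks 0..3 land on disks 0..3 at offset 0,
# and logical block 4 wraps back to disk 0 at offset 1.
layout = [locate_block(i, 4) for i in range(6)]
```

Because consecutive blocks fall on different disks, a large sequential access may be serviced by several disks in parallel. No block is stored more than once, so the loss of any one disk loses data.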
RAID 5 is an example of a RAID algorithm that provides some fault tolerance and load balancing. FIG. 1 illustrates a RAID 5 system in which both data and parity information (or “parity data”) are striped across a plurality of storage devices forming an array. A data volume is divided into segments or blocks called stripe units. Stripe units are mapped consecutively on a set of physical devices for parallel access purposes. Generally speaking, in order to recover from physical device failures, functions (redundancies) of a group of stripe units are generated and mapped to distinct physical devices. In the illustrated system, this redundancy is in the form of the parity data. Each member of the group is mapped to a different physical device in order to make the recovery possible. The set of functions typically form a set of equations with a unique solution. Most common implementations use a single even parity function which can recover from any single device failure in the group. Some implementations use two functions, generally referred to as P and Q parities, to recover from any two device failures in the group. This extension to RAID 5 is sometimes referred to as RAID 6.
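The single even parity function mentioned above may be sketched as a bytewise exclusive-OR (XOR) over the stripe units of a group. The following is a minimal illustration, not a description of any particular implementation; the names xor_blocks and recover_missing are hypothetical, and all stripe units are assumed to have the same length.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks (even parity per bit)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_units: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing stripe unit from the survivors and parity."""
    return xor_blocks(surviving_units + [parity])


# Three data stripe units on three devices, parity on a fourth device.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Suppose the device holding data[1] fails; its contents can be recomputed.
rebuilt = recover_missing([data[0], data[2]], parity)
```

Since each member of the group resides on a different physical device, any single device failure leaves enough information to solve for the missing unit. Recovering from two failures would require a second, independent function such as the Q parity, which this sketch does not cover.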
In RAID 5 systems (and similar systems that employ data striping with redundancy), if a write operation covers all of the data in a redundancy group (i.e., all of the stripe units of the given stripe), then the parity data can be readily generated. However, normally a write operation involves only part of the data involved in the group. In this case, typically depending on the size of the data to be updated, the parity data may be updated in either of two ways. The parity data may be updated by reading the remaining unchanged data blocks and computing new parity data in conjunction with the new data to be written. This scheme is referred to as a “reconstruct write” scheme. The parity data may alternatively be updated by reading the old data corresponding to the data to be written along with the old parity data and using this information in conjunction with the data to be written to generate the new parity data. This scheme is referred to as a “read-modify-write” scheme. This scheme is based on the fact that the functions used (e.g., parity) are generally self-inverse binary functions, so that combining the old data with the old parity cancels the old data's contribution. In either case, the additional read and write operations can limit performance. This limitation is known as the small-write penalty problem.
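The two parity update schemes may be compared with the following sketch, which assumes a single XOR parity function and equally sized stripe units. The function names are hypothetical and the example is illustrative only.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def reconstruct_write_parity(new_data: bytes, unchanged_units: list[bytes]) -> bytes:
    """Reconstruct write: read the unchanged units and recompute parity."""
    parity = new_data
    for unit in unchanged_units:
        parity = xor_bytes(parity, unit)
    return parity


def read_modify_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: remove the old data from parity, then add the new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# A stripe with three data units and a parity unit consistent with them.
units = [b"\x0f\x01", b"\xf0\x02", b"\x55\x04"]
old_parity = reconstruct_write_parity(units[0], units[1:])

# Overwrite only the first unit; both schemes yield the same new parity.
new_unit = b"\xaa\x08"
via_reconstruct = reconstruct_write_parity(new_unit, units[1:])
via_rmw = read_modify_write_parity(units[0], old_parity, new_unit)
```

In this sketch the reconstruct write reads every unchanged unit in the stripe, while the read-modify-write reads only the old data and the old parity regardless of the stripe width. This is why the latter tends to be preferred for small writes. The read-modify-write result is correct only if the old parity was consistent with the old data.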
The read-modify-write scheme is efficient for “small writes” and is commonly used. However, it requires that the redundancies be initially consistent with the data in the group. To achieve this initial consistency, after the definition of a RAID 5 device, the device is typically initialized. This involves writing the entire set with a consistent pattern (usually with zeros if even parity is used for redundancy). This is a time-consuming operation, and the storage array typically cannot be utilized for normal accesses during such initialization procedures.
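The need for initial consistency may be illustrated as follows. The sketch assumes a single XOR (even) parity function. It shows that a stripe written entirely with zeros is consistent, and that a read-modify-write applied to an uninitialized stripe leaves the parity inconsistent. The names used are hypothetical, and the "stale" contents are arbitrary values chosen for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def is_consistent(data_units: list[bytes], parity: bytes) -> bool:
    """True if parity equals the XOR of all data units in the stripe."""
    computed = bytes(len(parity))
    for unit in data_units:
        computed = xor_bytes(computed, unit)
    return computed == parity


def initialize_stripe(num_data_units: int, unit_size: int) -> tuple[list[bytes], bytes]:
    """Write zeros to every data unit and to the parity unit."""
    zero = bytes(unit_size)
    return [zero] * num_data_units, zero


def small_write(data_units, parity, index, new_data):
    """Read-modify-write of one unit; returns updated units and parity."""
    new_parity = xor_bytes(xor_bytes(parity, data_units[index]), new_data)
    updated = list(data_units)
    updated[index] = new_data
    return updated, new_parity


# Initialized stripe: consistent before and after a small write.
data, parity = initialize_stripe(3, 2)
data, parity = small_write(data, parity, 0, b"\x12\x34")
ok_after_init = is_consistent(data, parity)

# Uninitialized stripe (arbitrary stale contents): the small write
# preserves the existing inconsistency instead of repairing it.
stale_data, stale_parity = [b"\x01\x01", b"\x02\x02", b"\x04\x04"], b"\xff\xff"
stale_data, stale_parity = small_write(stale_data, stale_parity, 0, b"\x12\x34")
ok_without_init = is_consistent(stale_data, stale_parity)
```

In the second case a later device failure could not be recovered correctly, which is why the array is conventionally initialized before read-modify-write updates are relied upon.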