1. Field of the Invention
Embodiments of the present invention relate to the field of data storage systems. More particularly, embodiments of the present invention relate to the distribution of data and spare storage in a data storage system.
2. Related Art
Secondary data storage is an integral part of large data processing systems. In the past, a typical data storage system used a single, expensive magnetic disk to store large amounts of data. The Central Processing Unit (CPU) generally accessed this disk through a separate Direct Memory Access (DMA) controller, which translated and executed the CPU's Input/Output (I/O) requests. In such single-disk storage systems, data transfer to and from the large disk is much slower than the processing speed of the CPU, so the disk becomes a data processing bottleneck.
In response, redundant arrays of independent disks (RAIDs) evolved from single-disk storage systems in order to match the speed of secondary storage access to the increasingly fast processing speeds of CPUs. To increase system throughput, the RAID architecture of secondary storage allows data to be accessed concurrently from multiple disk drives.
The concept for the RAID architecture was formalized in an article written by some members of the Department of Electrical Engineering and Computer Sciences at the University of California at Berkeley, entitled: “A Case for Redundant Arrays of Inexpensive Disks (RAID),” by D. A. Patterson, G. Gibson, and R. H. Katz, ACM SIGMOD Conference, Chicago, Ill., June 1988, hereinafter referred to as “Patterson et al.”
Typically, RAID architectures consist of one or more host interface controllers connected to several peripheral interface controllers via a high speed data bus. Each peripheral interface controller is, in turn, connected to several individual disk drives which provide the secondary storage for the connected hosts. Peripheral interface controllers, also referred to as array controllers herein, can be connected to the disk drives via common communication interfaces (e.g., SCSI). Generally, the speed of the data bus is greater than the speed of the interface between the disk drives and the peripheral interface controllers.
To reconstruct data lost from a redundancy group because of a failed disk, the system must define a reversible mapping between the data and its redundancy data within the group containing the lost data. Patterson et al. describe several such mappings. One is the RAID level 4 (RAID-4) mapping, which defines a group as an arbitrary number of disk drives containing data plus a single redundancy disk that is separate from the data disks.
Another mapping, RAID level 5 (RAID-5), distributes the redundancy data across all the disks in the redundancy group, so there is no single, dedicated parity disk. In a RAID-4 configuration, the physical drive(s) containing the redundancy become(s) a bottleneck for small random write operations, because every write must also update the redundancy on those drive(s). RAID-5 configurations alleviate this problem by distributing the redundancy across all drives, and as the number of disks in a RAID-5 array increases, the potential number of overlapped operations also increases. Hence, the RAID-5 configuration provides better small-write performance than the RAID-4 configuration.
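The difference in parity placement can be made concrete with a small model. The sketch below is illustrative only: the function names `parity_disk` and `parity_write_load` are hypothetical, the RAID-4 parity disk is assumed to be the last disk, and the RAID-5 layout uses one simple rotation (real arrays use several different rotations). It counts only the parity update that accompanies each small write.

```python
from collections import Counter
from typing import Iterable


def parity_disk(stripe: int, num_disks: int, layout: str) -> int:
    """Return the index of the disk that holds parity for `stripe`."""
    if layout == "raid4":
        # RAID-4: a single dedicated redundancy disk (assumed to be the last disk).
        return num_disks - 1
    if layout == "raid5":
        # RAID-5 (one simple rotation, assumed here): parity moves one disk to
        # the left on each successive stripe, so every disk holds parity for
        # 1/num_disks of the stripes.
        return (num_disks - 1 - stripe) % num_disks
    raise ValueError(f"unknown layout: {layout!r}")


def parity_write_load(stripes: Iterable[int], num_disks: int, layout: str) -> Counter:
    """Count parity updates per disk for one small write to each listed stripe.

    Every small write must also update its stripe's parity unit, so this is
    the extra write traffic each disk receives.
    """
    return Counter(parity_disk(s, num_disks, layout) for s in stripes)


if __name__ == "__main__":
    # Eight small writes, one per stripe, on a hypothetical four-disk array.
    print(parity_write_load(range(8), 4, "raid4"))  # all 8 updates land on disk 3
    print(parity_write_load(range(8), 4, "raid5"))  # 2 updates on each disk
```

Under RAID-4 every parity update serializes on one drive, while under RAID-5 the same updates are spread evenly, which is what allows small writes to different stripes to overlap.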
To recover from physical device failures (e.g., the failure of a disk), functions such as XOR are used to generate redundancy for a group of stripe units. The redundancy units, which are used to regenerate data lost in a device failure, are then mapped to distinct physical devices. Normally, each member of the group is mapped to a different physical device so that recovery is possible. Together, the functions form a set of equations with a unique solution for the lost data. A single even-parity function is commonly used and can recover from any single device failure in the group. Some implementations use two functions, generally referred to as P and Q parities, to recover from any two device failures in the group.
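A single even-parity (XOR) function and the corresponding single-failure recovery can be sketched as follows. The helper names (`xor_blocks`, `compute_parity`, `reconstruct_missing`) and the two-byte stripe units are hypothetical; the second (Q) function of a P+Q scheme requires Galois-field arithmetic and is not shown.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (an even-parity function)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_units: Sequence[bytes]) -> bytes:
    """Parity unit for a group of stripe units: P = D0 ^ D1 ^ ... ^ Dn-1."""
    return xor_blocks(data_units)


def reconstruct_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the one lost unit: XOR of the parity with every surviving unit."""
    return xor_blocks(list(surviving) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # hypothetical stripe units
    parity = compute_parity(data)                     # b"\xee\x22"
    # Simulate losing the device that held data[1], then rebuild it.
    rebuilt = reconstruct_missing([data[0], data[2]], parity)
    assert rebuilt == data[1]
```

Because XOR is its own inverse, the parity equation has exactly one unknown after a single device failure, which is why one even-parity function suffices for any single failure but not for two.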
Moreover, to reduce the Mean Time to Repair (MTTR), one or more spare devices are included in the array so that reconstruction of a failed device's data can begin as soon as the failure is detected. Storage systems with additional spare disks are designed to operate continuously over a specified period of time without requiring any repair of the system due to failed disks. This is accomplished by carefully identifying and quantifying the components that are expected to fail during a given time period and incorporating sufficient hot-spare parts or disks within the system. This internal spare disk architecture can automatically switch to a spare disk when a failure is encountered, so that a compatible disk device is always at hand upon a disk failure.
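One way to quantify how many spares a given period requires is a Poisson estimate. This is a sketch under stated assumptions, not the method of any particular system: disks are assumed independent with a constant failure rate of 1/MTBF (mean time between failures), each failed disk is assumed to be replaced by a spare so the population stays constant, and the names `poisson_cdf` and `spares_needed`, the 99% target, and the example figures are all hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) derived from P(X = i - 1)
        total += term
    return total


def spares_needed(num_disks: int, period_hours: float, mtbf_hours: float,
                  target: float = 0.99) -> int:
    """Smallest spare count S with P(failures during period <= S) >= target.

    Assumes failures form a Poisson process of rate num_disks / mtbf_hours
    (independent disks, constant failure rate, failed disks replaced).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    mean = num_disks * period_hours / mtbf_hours  # expected failures in the period
    spares = 0
    while poisson_cdf(spares, mean) < target:
        spares += 1
    return spares


if __name__ == "__main__":
    # Hypothetical figures: 100 disks, 5 years of operation, 1,000,000-hour MTBF.
    print(spares_needed(100, 5 * 8760, 1_000_000))  # 10 for these inputs
```

The constant-rate assumption is optimistic for disks of the same age and model, whose failures tend to cluster; a real sizing exercise would use field failure data rather than a single MTBF figure.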
Previously, data was stored on disks separate from the disks providing spare storage. Dedicating each disk exclusively to either data or spare storage leads to both performance and reliability problems. For example, the mean time between failures (MTBF) of each disk containing data remains the same, even though the MTTR of the system as a whole is reduced by the additional spare disks. Because the disks containing data have equivalent MTBFs, they tend to fail within approximately the same time period. Once all available spare disks have been used to replace failed disks, any further failed disks cannot be replaced, and the system will eventually lose data.
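The automatic switch-over to dedicated hot spares, and the point at which it runs out, can be modeled in a few lines. The `HotSparePool` class, its method names, and the disk identifiers are hypothetical; the model records only which spare takes over for which failed disk and does not perform the rebuild itself.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


class HotSparePool:
    """Minimal model of automatic failover to a fixed set of dedicated hot spares."""

    def __init__(self, spare_ids: Iterable[str]) -> None:
        self._spares = deque(spare_ids)
        self.replacements: Dict[str, str] = {}  # failed disk -> spare that took over
        self.unreplaced: List[str] = []         # failed disks left without a spare

    def on_failure(self, disk_id: str) -> Optional[str]:
        """Assign the next free spare to a failed disk, or None if none remain."""
        if self._spares:
            spare = self._spares.popleft()
            # In a real array, reconstruction onto `spare` would start here.
            self.replacements[disk_id] = spare
            return spare
        self.unreplaced.append(disk_id)
        return None


if __name__ == "__main__":
    pool = HotSparePool(["spare0", "spare1"])
    # Hypothetical data disks of the same MTBF failing close together.
    for disk in ["d3", "d7", "d1"]:
        print(disk, "->", pool.on_failure(disk))  # d1 -> None: spares exhausted
```

Once `unreplaced` is non-empty the group runs degraded; with a single even-parity function, one more failure in that group before repair loses data.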
Additionally, accessing data on disks that contain only data requires the read/write head(s) to move across all tracks and sectors of the disk. Under any access scheme, the time to failure of the read/write head mechanisms is shortest when all sectors and tracks of the disk are accessed. Also, the average seek time for data remains unchanged when the spare storage is located on disks separate from the data storage.
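The seek-time observation can be illustrated with a simplified model. It assumes accesses are uniformly random over the cylinders that hold data and measures seek *distance* in cylinders rather than seek time, which is not linear in distance. The function name and cylinder counts are hypothetical. Under this model the mean distance is (n² − 1)/(3n), roughly one third of the band width, so it depends only on how many cylinders the data spans.

```python
def mean_seek_distance(num_cylinders: int) -> float:
    """Average |i - j| in cylinders over all ordered pairs i, j in [0, n).

    Models uniformly random accesses confined to `num_cylinders` adjacent
    cylinders; a distance d >= 1 occurs for 2 * (n - d) of the n * n pairs.
    """
    n = num_cylinders
    if n < 1:
        raise ValueError("num_cylinders must be at least 1")
    total = sum(2 * (n - d) * d for d in range(1, n))
    return total / (n * n)


if __name__ == "__main__":
    full = mean_seek_distance(30_000)  # data spread over a hypothetical full stroke
    band = mean_seek_distance(24_000)  # data confined to 80% of the cylinders
    print(f"{band / full:.3f}")        # about 0.8: mean distance scales with band width
```

When spare storage sits on separate disks, each data disk still spreads its data over the full stroke, so under this model its average seek distance does not change.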