The present invention relates generally to the field of storage systems. More particularly, the present invention relates to the use of spare disk drives in storage systems comprising a Redundant Array of Independent Disks (RAID).
Storage systems are being increasingly used to store large amounts of data and decrease the processing undertaken by data processing systems in the storage of data. Storage systems comprise one or more storage devices, such as magnetic hard disk drives, tape drives, and the like. Storage systems also use special hardware to control these storage devices and reduce the processing undertaken by data processing systems in storing data. Storage systems either are connected to a data processing system or are used in a network environment, in which they are connected to a plurality of data processing systems by means of a network interface.
Traditionally, storage systems use high-performance storage devices. These storage devices are very expensive and, therefore, the overall cost of a storage system employing them is very high. This makes the use of such storage systems prohibitively expensive, especially where cost is a key factor in deciding the deployment of storage systems. In contrast, a Redundant Array of Independent Disks (RAID), a technique that is used in storage systems, combines a number of inexpensive disk drives to improve the performance and reliability of a storage system at a lower cost.
Storage systems that utilize RAID techniques, or RAID systems, use a number of disk drives that are used to emulate one or more high-capacity, high-performance storage devices. RAID systems are based on various levels of RAID. A RAID engine is used in hardware or software form to carry out the processing required for the implementation of RAID techniques in a storage system. RAID systems also improve reliability of data by providing data striping and data parity protection. In order to store data reliably, RAID systems use spare disk drives that replace failed disk drives. This maintains the reliability of the system by ensuring that a drive is available, in the case of failure of an existing drive.
Initially, failed drives in RAID systems were manually replaced with spare drives. Failed drives can be replaced either by powering off the entire RAID system or by ‘hot-swapping’ the failed drive. ‘Hot-swapping’ is a technique that enables the removal or addition of a disk drive to a storage system without powering off the system. ‘Hot-swapping’ reduces the downtime of the RAID system by enabling the RAID system to run even when a failed disk drive is being replaced. However, ‘hot-swapping’ in RAID systems is a tedious process and there might not always be a person available to replace the failed disk drive. This might lead to a decrease in the reliability of the RAID system because there can be a time gap between the failure of a disk drive and its replacement.
In order to overcome the dependence of ‘hot-swapping’ on manual intervention, RAID systems employ spare disk drives that are always available in the RAID system. For example, a spare disk drive can be maintained in a power-on or ‘hot’ condition. When a disk drive fails, the ‘hot’ spare disk drive is used in place of the failed disk drive. Data on the failed disk drive is reconstructed on the spare disk drive by using RAID parity techniques.
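As an illustration of reconstruction by parity, the following minimal sketch assumes a single-parity scheme in which the parity block of a stripe is the byte-wise XOR of its data blocks. The function names and the block values are hypothetical and chosen only for illustration; the text above does not specify a RAID level or block layout.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct_block(surviving_blocks):
    """Rebuild the lost block from the surviving data blocks plus the parity block."""
    return xor_blocks(surviving_blocks)

# One stripe across three data drives and one parity drive (illustrative values).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)

# Assume the drive holding data[1] fails; its block is rebuilt for the spare drive.
rebuilt = reconstruct_block([data[0], data[2], parity])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block, which is why one spare drive suffices to restore a stripe after a single drive failure under this assumption.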
However, the above-mentioned system suffers from one or more drawbacks or limitations. It keeps the spare disk drives always ‘hot’, or in a power-on state. Disk drives have a limited life, in terms of hours, before they fail. Since the spare disk drives are always ‘hot’, even when they are not in use, the life of the spare disk drives is reduced. The spare disk drives also consume electrical power, which might, over the long run, become an unnecessary expenditure. Hence, such systems do not attain the required level of reliability and involve increased power consumption.
To avoid the above-mentioned drawbacks or limitations, spare disk drives can be maintained in a power-off state in a storage system. The spare disk drive is powered on, or made ‘hot’, when a disk drive failure is detected and made to replace the failed disk drive. Such a system selectively powers on spare disk drives when it receives an indication of failure of a disk drive. Data is reconstructed on the spare disk drive to restore the original fault tolerance of the system.
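The behavior described above can be sketched as a small state model. This is only a sketch under stated assumptions: the `Drive` and `State` types, the `handle_failure` function, and the drive names are hypothetical, and the first powered-off spare found is simply taken without regard to any other criterion.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    ACTIVE = "active"        # powered on and part of the array
    FAILED = "failed"        # reported as failed
    SPARE_OFF = "spare_off"  # spare kept in a power-off state

@dataclass
class Drive:
    name: str
    state: State

def handle_failure(drives, failed_name):
    """Mark the failed drive, then power on the first powered-off spare, if any."""
    by_name = {d.name: d for d in drives}
    by_name[failed_name].state = State.FAILED
    for d in drives:
        if d.state is State.SPARE_OFF:
            d.state = State.ACTIVE  # spare is powered on; data reconstruction would follow
            return d
    return None  # no spare left to power on

drives = [
    Drive("d0", State.ACTIVE),
    Drive("d1", State.ACTIVE),
    Drive("s0", State.SPARE_OFF),
]
replacement = handle_failure(drives, "d1")
```

The spare accumulates no power-on hours and draws no power until a failure indication arrives, which is the benefit this arrangement targets.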
However, the spare disk drive selected to replace a failed disk drive might not be the optimum spare disk drive in terms of its effect on data bus loads, power bus loads, and environmental conditions. In addition, the failure of a drive is not intercepted and the RAID engine needs to intervene to respond to the failure. This causes an increased overhead on the RAID engine to perform the processing required to respond to the failed disk drive.