1. Field of the Invention
This invention relates to data storage subsystems and, more particularly, to a software method operable within controllers of the data storage subsystem for "hot-swap" detection and processing of disk drive units without special-purpose circuits to indicate a disk drive insertion or removal event.
2. Discussion of Related Art
Modern mass storage systems continue to grow in capacity to meet the increasing demands of host computer system applications. Because users rely so heavily on large-capacity mass storage, demands for enhanced reliability are also high. A popular solution to the need for increased reliability is redundancy of component-level subsystems. Redundancy is typically applied at many or all levels of the components involved in overall subsystem operation. For example, in storage subsystems, redundant host systems may each be connected via redundant I/O paths to each of several redundant storage controllers, which in turn may each be connected through redundant I/O paths to redundant storage devices (e.g., disk drives).
In managing redundant storage devices such as disk drives, it is common to use Redundant Array of Independent Disks (commonly referred to as RAID) storage management techniques. RAID techniques generally distribute data over a plurality of smaller disk drives. RAID controllers within a RAID storage subsystem hide this data distribution from the attached host systems, so that the collection of storage (often referred to as a logical unit or LUN) appears to the host as a single large storage device.
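As a purely illustrative sketch of how a controller might hide data distribution from the host, the following Python function maps a host-visible logical block address to a member disk and a block offset on that disk using simple round-robin striping. The names (`map_logical_block`, `num_disks`, `stripe_blocks`, `PhysicalAddress`) are hypothetical, and the layout deliberately ignores redundancy information; actual RAID levels and controllers use a variety of layouts.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    disk: int   # index of the member disk drive
    block: int  # block offset on that disk


def map_logical_block(lba: int, num_disks: int, stripe_blocks: int) -> PhysicalAddress:
    """Map a host-visible logical block address to a member disk and block.

    Hypothetical simple striping with no redundancy: consecutive runs of
    `stripe_blocks` blocks are placed round-robin across `num_disks` drives.
    """
    if lba < 0 or num_disks <= 0 or stripe_blocks <= 0:
        raise ValueError("invalid arguments")
    stripe_unit = lba // stripe_blocks  # which stripe unit the block falls in
    offset = lba % stripe_blocks        # position of the block within that unit
    disk = stripe_unit % num_disks      # round-robin choice of member disk
    row = stripe_unit // num_disks      # stripe row, identical on every disk
    return PhysicalAddress(disk, row * stripe_blocks + offset)


if __name__ == "__main__":
    print(map_logical_block(35, num_disks=4, stripe_blocks=8))
    # PhysicalAddress(disk=0, block=11)
```

With four disks and eight-block stripe units, logical block 35 falls in the fifth stripe unit and therefore lands on block 11 of disk 0; the host sees only the single logical address.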
To enhance (restore) the reliability of a subsystem whose data is distributed over a plurality of disk drives, RAID techniques generate redundancy information (e.g., XOR parity computed over a portion of the stored data) and store it on the disk drives. A failure of a single disk drive in such a RAID array therefore does not halt operation of the RAID subsystem: the remaining data and/or redundancy information are used to recreate the data lost with the failed disk drive.
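The parity-based recovery described above can be illustrated with a minimal sketch, assuming a single parity block computed as the byte-wise XOR of the data blocks in one stripe. The function names are hypothetical. Because XOR is its own inverse, XORing the surviving data blocks with the parity block yields the missing block.

```python
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    result = bytearray(size)
    for block in blocks:
        if len(block) != size:
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def compute_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity block stored alongside the data blocks of one stripe."""
    return xor_blocks(data_blocks)


def reconstruct_missing(surviving: Sequence[bytes], parity: bytes) -> bytes:
    """Recreate the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = compute_parity(data)
    # Simulate losing the second drive's block and rebuilding it.
    rebuilt = reconstruct_missing([data[0], data[2]], parity)
    print(parity.hex(), rebuilt == data[1])  # ee22 True
```

Only a single lost block per stripe can be recovered this way, which matches the single-drive-failure tolerance described above.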
The 1987 publication by David A. Patterson et al. of the University of California at Berkeley, entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)," reviews the fundamental concepts of RAID technology.
RAID techniques are therefore useful to sustain storage subsystem operation despite the loss of a disk drive. However, the failed disk drive must eventually be replaced to restore the highest level of reliability in the subsystem, and replacing it can affect data availability. Such a physical replacement is often referred to as a swap. A swap that requires power to the subsystem to be shut off to effect the replacement is commonly called a cold swap. A warm swap may not require disconnection of power, but nonetheless requires the subsystem to completely reinitialize before it can continue operating with the replacement drive. Cold and warm swaps both disrupt data availability and user service within the storage subsystem, because the system is not operational during the swap. In a cold swap, system electrical power is turned off before the failed disk drive is replaced. In a warm swap, power remains on, but data is unavailable because inserting the disk drive requires shutdown and re-initialization of the disk array data storage system.
To avoid such disruption of service, hot swaps of failed disk drives are preferred. During a hot swap of a failed disk drive, the subsystem remains operational and no system shutdown is required. The RAID management software within the controller(s) of the subsystem compensates for the failed disk drive, using the remaining data and/or redundancy information to keep data available to the user. Some RAID management software may even substitute a pre-installed spare disk drive for the failed disk drive to restore normal operation of the LUN (i.e., a logical operation that achieves an automatic swap). The failed disk drive, however, must eventually be physically replaced, preferably via hot swap, to maintain the highest levels of security and performance within the storage subsystem.
A problem encountered by RAID systems that support hot swapping is correctly recognizing a disk drive insertion or removal event. As in many electronic subsystems, inserting or removing disk drives while power is on can generate undesirable transient signals (e.g., noise or "glitches"). Such glitches may confuse the storage subsystem's controllers and the control software operating within them, thereby causing erroneous or unexpected conditions within the storage subsystem. Presently known RAID system designs use specialized hardware and electronic circuitry to detect disk drive insertion or removal while simultaneously filtering or otherwise processing such transient signals.
For example, present RAID systems often use a SCSI bus internally to interconnect the plurality of disk drives (the disk array) with the RAID controller(s). Such systems typically use additional hardware and electronic circuitry to eliminate erroneous transient bus signals. In some commercially available storage subsystems, for instance, each disk drive is mounted in a "canister" that buffers the SCSI bus signals from the disk drive's SCSI bus interface connections. In other words, the canister attaches to the SCSI bus, and the disk drive physically mounts within the canister and connects electronically to the SCSI bus signals through the canister's buffers. The canister includes circuits (passive and/or active) that buffer the signals between the disk drive's SCSI bus interface connections and the SCSI bus signal paths. These circuits help reduce or prevent such transient (noise or glitch) signals. In addition, the canister includes circuits that automatically reset the SCSI bus upon detecting the transient signals generated by inserting the canister into a hot (powered-on) SCSI bus. By asynchronously applying a reset to the SCSI bus, the canister in effect notifies the higher-level control functions of the RAID controller that a drive may have been inserted or removed. Those higher-level control functions respond to the bus reset by polling the devices on the SCSI bus to determine which previously present devices (if any) have been removed and which new devices have been added (inserted).
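The polling step performed after a bus reset amounts to comparing the set of devices known before the reset with the set that responds afterward. The sketch below assumes a hypothetical `probe_target` callback that reports whether a given SCSI target ID answers (in practice a controller might issue a command such as INQUIRY or TEST UNIT READY). The function name, the controller ID of 7, and the 16-ID range are illustrative assumptions, not a description of any particular controller.

```python
from typing import Callable, Iterable, Set, Tuple


def rescan_after_reset(
    known: Set[int],
    targets: Iterable[int],
    probe_target: Callable[[int], bool],
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Poll every candidate target ID after a bus reset.

    `probe_target` is a hypothetical callback standing in for a real bus
    command; it returns True if a device answers at that ID.
    Returns (present, removed, inserted).
    """
    # Devices that answered the post-reset poll.
    present = {t for t in targets if probe_target(t)}
    # Previously known devices that no longer answer were removed.
    removed = known - present
    # Devices answering now that were not known before were inserted.
    inserted = present - known
    return present, removed, inserted


if __name__ == "__main__":
    CONTROLLER_ID = 7          # assumed initiator ID, skipped in the scan
    responding = {0, 2, 5}     # simulated devices answering after the reset
    present, removed, inserted = rescan_after_reset(
        known={0, 1, 2},
        targets=(t for t in range(16) if t != CONTROLLER_ID),
        probe_target=lambda t: t in responding,
    )
    print(sorted(removed), sorted(inserted))  # [1] [5]
```

Returning the removed and inserted sets separately lets higher-level RAID logic handle the two events differently, for example by marking a removed drive as failed or by treating an inserted drive as a candidate replacement.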
Canisters and other similar circuits for enabling hot swaps add complexity, and therefore cost, to the storage subsystem. It therefore remains a problem to properly recognize hot-swapped disk drives without complex buffer circuits between the disk drives and their interconnecting communication medium (e.g., without canisters for attaching disk drives to a SCSI bus).