1. Field of the Invention
The invention relates generally to control methods operable within a disk array subsystem (RAID) and in particular to methods operable within a disk array subsystem to simplify host computer RAID management and control software integration.
2. Background of the Invention
Modern mass storage subsystems are continuing to provide increasing storage capacities to fulfill user demands from host computer system applications. Due to this critical reliance on large capacity mass storage, demands for enhanced reliability are also high. Various storage device configurations and geometries are commonly applied to meet the demands for higher storage capacity while maintaining or enhancing reliability of the mass storage subsystems.
A popular solution to these mass storage demands for increased capacity and reliability is the use of multiple smaller storage modules configured in geometries that permit redundancy of stored data to assure data integrity in case of various failures. In many such redundant subsystems, recovery from many common failures is automated within the storage subsystem itself due to the use of data redundancy, error codes, and so-called "hot spares" (extra storage modules which may be activated to replace a failed, previously active storage module). These subsystems are typically referred to as redundant arrays of inexpensive (or independent) disks (or more commonly by the acronym RAID). The 1987 publication by David A. Patterson, et al., from the University of California at Berkeley, entitled A Case for Redundant Arrays of Inexpensive Disks (RAID), reviews the fundamental concepts of RAID technology.
There are five "levels" of standard geometries defined in the Patterson publication. The simplest array, a RAID level 1 system, comprises one or more disks for storing data and an equal number of additional "mirror" disks for storing copies of the information written to the data disks. The remaining RAID levels, identified as RAID level 2, 3, 4, and 5 systems, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information.
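The segmenting of data with parity described above can be illustrated with a minimal sketch. It assumes byte-wise XOR parity, as commonly used in RAID levels 3, 4, and 5, and a single dedicated parity segment; the function names (xor_blocks, stripe_with_parity, rebuild_segment) are hypothetical and chosen only for illustration, not taken from the Patterson publication or from any particular subsystem.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, num_data_disks):
    """Split data into one equal-length segment per data disk, plus parity.

    The data is zero-padded so that every segment has the same length.
    This layout is an assumption made for illustration only.
    """
    seg_len = -(-len(data) // num_data_disks)  # ceiling division
    padded = data.ljust(seg_len * num_data_disks, b"\0")
    segments = [
        padded[i * seg_len:(i + 1) * seg_len] for i in range(num_data_disks)
    ]
    return segments, xor_blocks(segments)


def rebuild_segment(surviving_segments, parity):
    """Recover the one missing segment from the survivors and the parity."""
    return xor_blocks(list(surviving_segments) + [parity])


# Example: four data disks, then simulate the loss of disk 2.
segments, parity = stripe_with_parity(b"RAID parity demo", 4)
survivors = segments[:2] + segments[3:]
recovered = rebuild_segment(survivors, parity)
```

Because XOR is its own inverse, combining the surviving segments with the parity segment yields the contents of the single failed disk, which is why one additional disk suffices to tolerate one disk failure in this arrangement.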
RAID storage subsystems typically utilize a control module that shields the user or host system from the details of managing the redundant array. The controller makes the subsystem appear to the host computer as a single, highly reliable, high capacity disk drive. In fact, the RAID controller may distribute the host computer system supplied data across a plurality of the small independent drives with redundancy and error checking information so as to improve subsystem reliability. Frequently RAID subsystems provide large cache memory structures to further improve the performance of the RAID subsystem. The cache memory is associated with the control module such that the storage blocks on the disk array are mapped to blocks in the cache. This mapping is also transparent to the host system. The host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required.
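The way a controller can present several drives as one logical disk can be sketched as a simple address translation. The sketch below assumes plain round-robin striping across the data disks with no parity rotation and no cache; the function name locate_block and its parameters are hypothetical, and actual controllers may use quite different layouts.

```python
def locate_block(logical_block, num_data_disks, blocks_per_stripe_unit=1):
    """Translate a host logical block number to (disk index, block on disk).

    Assumes round-robin striping: consecutive stripe units are placed on
    consecutive disks, wrapping around after the last data disk.
    """
    # Which stripe unit the block falls in, and its offset inside that unit.
    stripe_unit, offset = divmod(logical_block, blocks_per_stripe_unit)
    # Which stripe (row) and which disk (column) hold that stripe unit.
    stripe, disk = divmod(stripe_unit, num_data_disks)
    physical_block = stripe * blocks_per_stripe_unit + offset
    return disk, physical_block


# The host addresses blocks 0..N-1 of a single "disk"; the translation
# to individual drives stays hidden inside the controller.
layout = [locate_block(lba, num_data_disks=4) for lba in range(8)]
```

The host sees only the logical block numbers; the disk index and on-disk block number are internal details of the kind that the control module shields from the host.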
To further improve reliability, it is known in the art to provide redundant control modules to reduce the failure rate of the subsystem due to control electronics failures. In some redundant architectures, pairs of control modules are configured such that they control the same physical array of disk drives. A cache memory module is associated with each of the redundant pair of control modules. The redundant control modules communicate with one another to assure that the cache modules are synchronized. In prior designs, the redundant pair of control modules would communicate at their power-on initialization (or after a reset operation). While the redundant control modules completed their communications to assure synchronization of the cache modules, the RAID storage subsystem would be unavailable with respect to completing host computer requests. If the cache modules were found to be "out of sync," the time required to restore synchronization could be significant. In addition, a failure of one of the redundant pair of control modules would further extend the time during which the RAID storage subsystem would be unavailable. Manual (operator) intervention could be required to replace a defective redundant control module in order for the RAID subsystem to begin processing of host computer requests.
Control and administrative functions to manage the various geometries and configuration options of such RAID subsystems are often embodied in programs operable in host computer systems attached to the RAID subsystem. Such host computer programs communicate with the RAID subsystem via standard I/O functions provided by the underlying subsystem. Standard I/O read and write operations are typically used to exchange data with the storage array subsystem. I/O control functions are most frequently used to control and administer the subsystem geometries and configuration parameters. I/O control function calls provide an "out of band" communication channel to the storage array subsystem to clearly distinguish the data exchange functions from the I/O control administration functions.
Use of "standard" I/O control functions raises problems in the portability of the administrative programs operable on attached host computer systems. I/O control functions in operating systems are not well standardized. There exists significant variability between operating systems as to the features available in, and the restrictions imposed on, the I/O control functions. As applied, for example, to the administration of SCSI storage array devices, I/O control functions (e.g., ioctl) in some systems are incapable of returning SCSI sense data from the device while other systems can return such sense data. Some systems impose restrictions on I/O control function calls such as the inability to utilize the function call in a multi-threaded (multi-tasking) manner in conjunction with a single storage array device. As another example, other systems may preclude use of I/O control functions in conjunction with other standard file oriented I/O function calls. Additionally, perhaps due in part to the lack of standardization, I/O control functions tend to be less thoroughly tested by systems vendors than other I/O related library functions.
In view of the above, it is clear that a need exists for an improved method for communicating administrative information and configuration parameters to a storage array subsystem. In particular, a need exists for methods operable within a RAID storage system which serve to simplify and standardize the host based computer programs which manage attached RAID subsystems.