1. Field of the Invention
The present invention relates, in general, to data storage networking technology and, more particularly, to a system and method for communicatively linking redundant data array controllers using the host bus, and more specifically the host PCI bus, as the inter-controller link, and for controlling redundant messaging or communications at the array controllers rather than at the host.
2. Relevant Background
Modern mass storage subsystems are continuing to provide increasing storage capacities to fulfill user demands from host computer system applications. This reliance on large capacity mass storage has led to increased demands for enhanced reliability. Various storage device configurations are used to meet the demands for higher storage capacity while maintaining or enhancing reliability of the mass storage subsystems.
One solution to these mass storage demands for increased capacity and reliability is the use of multiple smaller storage modules configured in geometries that permit redundancy of stored data to assure data integrity in case of various failures. In many such redundant subsystems, recovery from many common failures can be automated within the storage subsystem itself due to the use of data redundancy, error correction codes, and the like. These subsystems are typically referred to as redundant arrays of inexpensive (or independent) disks (or more commonly by the acronym RAID). There are five “levels” of standard geometries defined for RAID. The simplest array, a RAID level 1 system, comprises one or more disks for storing data and an equal number of additional “mirror” disks for storing copies of the information written to the data disks. The remaining RAID levels, identified as RAID level 2, 3, 4 and 5 systems, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information.
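The parity redundancy described above can be illustrated with a minimal sketch. It assumes single XOR parity computed byte-wise over equal-sized blocks, in the style of RAID levels 3 through 5; RAID level 2 uses a different error-correcting code that is not shown. The function names xor_parity and rebuild_missing and the sample data are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) groups the i-th byte of every block into one column.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild one lost data block from the survivors and the parity block."""
    # XOR-ing the parity with every surviving block cancels them out,
    # leaving only the contents of the missing block.
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical stripe: three data blocks, each on its own disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_parity(data)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild_missing([data[0], data[2]], parity)
```

In this sketch a single parity block is enough to recover any one lost data block, which is why such geometries tolerate the failure of one disk per stripe.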
RAID storage subsystems typically utilize a control module or array controller that at least partially shields the user or host system or server from the details of managing the redundant array. The array controller makes the subsystem appear to the host computer as a set of highly reliable, high capacity disk drives independent of the physical drive size and characteristics. In fact, the array controller may distribute the host-supplied data across a plurality of the small independent drives with redundancy and error checking information so as to improve subsystem reliability. Frequently, RAID subsystems provide large cache memory structures to further improve the performance of the RAID subsystem. The cache memory is associated with the array controller such that the storage blocks on the disk array are mapped to blocks in the cache. This mapping is also transparent to the host. The host system simply requests blocks of data to be read or written, and the RAID controller manipulates the disk array and cache memory as required.
To further improve reliability, redundant array controllers are sometimes provided to reduce the failure rate of the subsystem due to control electronics failures. In some redundant architectures, pairs of control modules are configured such that they control the same physical array of disk drives. A cache memory module is associated with each of the redundant pair of control modules. When one of the redundant pair of control modules fails, the other stands ready to assume control to carry on operations on behalf of I/O requests. Typically, one controller, often referred to as a master or the active controller, essentially processes all I/O requests for the RAID subsystem. The other redundant controller, often referred to as a slave or passive controller, is simply operable to maintain a consistent mirrored status by communicating with the active controller. The caches are mirrored on each controller, and it is desirable that writes posted to the active controller are mirrored on the standby or passive controller. In the case of dual active controller arrangements, the passive controller may manipulate data on separate logical units and this may occur even on the same drive set. For any particular RAID logical unit (a group of disk drives configured to be managed as a RAID array), there is a single active controller responsible for processing of all I/O requests directed thereto. The passive controller does not concurrently manipulate data on the same RAID logical unit.
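The active/passive behavior described above can be sketched as a small model. This is an illustrative simplification under stated assumptions, not an implementation of any particular controller: the class names Controller and RedundantPair, the dictionary used as a write cache, and the failed flag are all hypothetical.

```python
class Controller:
    """Hypothetical array controller holding a write cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}      # block number -> data posted but not yet flushed
        self.failed = False


class RedundantPair:
    """Hypothetical active/passive pair with a mirrored write cache."""

    def __init__(self):
        self.active = Controller("A")
        self.passive = Controller("B")

    def write(self, block, data):
        # If the active controller has failed, the passive one takes over.
        if self.active.failed:
            self.failover()
        # Post the write to the active controller's cache.
        self.active.cache[block] = data
        # Mirror the posted write so the passive cache stays consistent.
        if not self.passive.failed:
            self.passive.cache[block] = data

    def failover(self):
        if self.passive.failed:
            raise RuntimeError("both controllers have failed")
        # Swap roles: the surviving controller becomes the active one.
        self.active, self.passive = self.passive, self.active


pair = RedundantPair()
pair.write(10, b"first")
pair.active.failed = True    # simulate a failure of controller A
pair.write(11, b"second")    # controller B assumes control
survivor = pair.active
```

Because every posted write was mirrored before the failure, the surviving controller in this sketch already holds block 10 when it takes over and can continue servicing I/O requests.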
The data storage industry continues to struggle with how to facilitate controller redundancy most efficiently. A key design issue arises because the redundant control modules must communicate with one another to assure that the cache modules are synchronized and to provide proper redundancy. It is common in the art to require host intervention to coordinate failover operations among the controllers and to facilitate communications, i.e., to provide a host-driven redundancy scheme. Host interaction and control over communications may be undesirable for two reasons: it reduces host processing efficiency, because processing time is used to control and monitor controller communications, and it limits host interoperability, because each array controller platform may vary significantly. Further, host interaction and control for redundancy is not readily available and is often expensive. Host involvement to maintain mirrored caches requires data to be moved from the host (e.g., the host processor, host memory, host North Bridge, and Peripheral Component Interconnect (PCI) bus) twice, once to each controller, and is therefore undesirable because it reduces available bandwidth by half.
A number of arrangements currently exist for allowing the active and passive controllers to communicate. In one arrangement, the host system must provide two special, dedicated extended bus slots (such as PCI extended slots) for host-to-controller communications, while a channel on each controller (such as a Small Computer System Interface (SCSI) channel) and a shared bus provide the link between the controllers. Extended bus slots are not common in host devices such as servers, and this arrangement requires the host to include controller control software and uses up a communication channel on each controller. Similarly, some arrangements call for a separate bus (such as a SCSI bus) or storage communications network (such as Fibre Channel (FC), Gigabit Ethernet, and the like) that provides a communication path between the host and each of the redundant controllers. Again, this requires controller command or communication mechanisms to be run by the host or in peripheral host devices and often requires additional hardware, such as host bus adapters (HBAs), to provide a link between the host bus and the communication link to the controllers. Further, the controllers still require a dedicated or shared bus and communication channel to provide inter-controller communications.
Hence, there remains a need for an improved system and method for providing array controller redundancy and communicatively linking the pair of redundant array controllers. Preferably, such a system and method would reduce processing demands on host devices, reduce the need for dedicated or specialized communication busses and communication ports, and increase interchangeability of hosts.