Redundant Array of Inexpensive Disks (RAID) systems have become the predominant form of mass storage in computer systems used for applications that require high performance, large amounts of storage, and/or high data availability. Such applications include transaction processing, banking, medical applications, database servers, internet servers, mail servers, scientific computing, and a host of others. A RAID controller controls a group of physical disk drives so as to present a single logical disk drive (or multiple logical disk drives) to a computer operating system. RAID controllers employ data striping and data redundancy to increase performance and data availability.
One technique for providing high data availability in RAID systems is to include redundant fault-tolerant RAID controllers in the system. Providing redundant fault-tolerant RAID controllers means providing two or more controllers such that if one controller fails, another of the redundant controllers continues to perform the function of the failed controller. For example, some RAID controllers include redundant hot-pluggable field-replaceable units (FRUs), so that when a controller fails, the failed FRU can in many cases be replaced quickly to restore the system to its original level of data availability.
Redundant fault-tolerant RAID controllers communicate with one another by passing messages in order to accomplish their fault-tolerant operation. Historically, the controllers have communicated via a common communication channel such as Fibre Channel or SCSI. Typically, these channels are also the I/O channels by which the RAID controllers communicate with their attached storage devices or with the host computers for which they provide data. Consequently, these channels may be subject to service interruptions if a storage device fails or if the physical channel medium is damaged or removed. Furthermore, these channels typically incur relatively high latency when passing messages between the RAID controllers. Finally, processing the messages consumes a significant portion of the RAID controller CPUs' processing bandwidth.
Therefore, what is needed is a more reliable and efficient communication channel between redundant RAID controllers.