This invention relates generally to input/output processing structures and methods for computer systems having a plurality of processing resources. More particularly, the present invention pertains to cache data mirroring by a controller to an alternate controller in a data storage system that the controllers manage in a dual active configuration.
Modern computers, particularly computers operating in a server environment, require a large, fault-tolerant data storage system. Hard drives in all computer systems are susceptible to failures caused by temperature variations, head crashes, motor failure, controller failure, and changing voltage conditions. To improve reliability and protect the data in data storage systems, many data storage systems use a redundant array of independent disks (RAID) operated by a disk array controller. Conventional RAID systems typically consist of several individual disk controllers combined with a rack of drives to provide a fault-tolerant data storage system that is directly attached to a host computer. The host computer is then connected to a network of client computers to provide a large, fault-tolerant pool of storage accessible to all network clients. Typically, the disk array controller provides the brains of the data storage system, servicing all host requests, storing data to multiple drives, such as, for example, RAID drives, caching data for fast access, and handling any drive failures without interrupting host requests.
Caching data by a disk array controller into a cache memory increases the performance of data storage and retrieval operations by maintaining a collection of the most recent references made by a host computer. Cache memory can typically be operated in a write-back or write-through mode. In a write-back mode, write data is temporarily stored in the cache and written out to disk at a subsequent time. An advantage of this mode is that it increases the controller's performance: the RAID controller notifies a host computer that the write operation succeeded (by sending the host computer a completion status) although the write data has not yet been stored on the disk.
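The difference between the two cache modes can be illustrated with a minimal sketch. It is a toy model for illustration only, not the controller firmware of any actual system; the names `Disk`, `CachingController`, `host_write`, and `flush` are hypothetical and do not come from this description.

```python
class Disk:
    """Hypothetical stand-in for a system drive: maps block numbers to data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data


class CachingController:
    """Toy controller with a cache that runs in write-back or write-through mode."""

    def __init__(self, disk, write_back=True):
        self.disk = disk
        self.write_back = write_back
        self.cache = {}      # block number -> most recently written data
        self.dirty = set()   # blocks cached but not yet written to disk

    def host_write(self, block, data):
        """Handle a host write request and return a completion status."""
        self.cache[block] = data
        if self.write_back:
            # Write-back: defer the disk write and report completion now.
            self.dirty.add(block)
        else:
            # Write-through: write to disk before reporting completion.
            self.disk.write(block, data)
        return "complete"

    def flush(self):
        """Write all dirty blocks out to disk at a subsequent time."""
        for block in sorted(self.dirty):
            self.disk.write(block, self.cache[block])
        self.dirty.clear()
```

In this sketch a write-back controller returns the completion status while the block exists only in its cache, so the data reaches the drive only when `flush` runs. That window between completion status and flush is the interval in which a controller failure can lose host data.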
It is desirable for a data storage system to reliably function with any type of failed component, including a failed disk array controller. Failure of a single disk array controller in a data storage system having a single controller, or multiple independent controllers, leaves uncompleted the tasks that the failed controller was performing and those tasks that it was scheduled to perform.
Worse yet, the failure of a single disk array controller in a data storage system having only one controller renders the entire RAID system inoperable. (Hereinafter, “disk array controller” is often referred to as “controller” to simplify the description, unless otherwise stated.) To circumvent the problem of a single point of failure that all single controller RAID systems exhibit, and to provide redundancy to a data storage system, dual active controllers were implemented.
A dual active controller configuration typically consists of a first controller and a second controller coupled to one another, so that in the event of a single controller failure, the surviving controller is able to take over the tasks that were being performed by the failed controller, and perform those tasks that were scheduled to be performed by the failed controller.
To take over the tasks of a failed controller, a surviving controller must keep track of both the tasks that its partner controller is working on, and the tasks that its partner controller is scheduled to work on before the failure occurs. To illustrate this, consider, for example, that a controller fails before data stored in its cache (in response to a write request from a host computer) is written onto a system drive. Data in the cache of a failed controller is lost unless a battery backup is used. In this situation, it is desirable for a surviving controller to complete the scheduled task of the failed controller by writing the data that was in the failed controller's cache onto the system drive.
To accomplish this, a surviving controller in active configuration would need to have a copy, or a mirror, of the failed controller's cache. State-of-the-art data storage systems are limited because there is no known structure or procedure for copying or mirroring a controller's cache to other controllers in active configuration. Therefore, what is needed is a cache mirroring system, apparatus, and method for multi-controller environments.
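The requirement stated above can be made concrete with a small toy model. This is a sketch under simplifying assumptions, not the structure or method of the invention: the class `Controller` and its methods `host_write`, `flush`, `fail`, and `take_over` are hypothetical names, the shared system drive is modeled as a plain dictionary, and the mirror copy is assumed to be updated synchronously before the host receives a completion status.

```python
class Controller:
    """Toy model of one controller in a dual active pair (hypothetical)."""

    def __init__(self, disk):
        self.disk = disk      # shared system drive: block number -> data
        self.cache = {}       # this controller's own unwritten (dirty) data
        self.mirror = {}      # copy of the partner's unwritten data
        self.partner = None
        self.failed = False

    def host_write(self, block, data):
        """Cache the write, mirror it to the partner, then report completion."""
        self.cache[block] = data
        if self.partner is not None and not self.partner.failed:
            self.partner.mirror[block] = data
        return "complete"

    def flush(self):
        """Write cached data to disk and drop the partner's mirror copies."""
        for block, data in self.cache.items():
            self.disk[block] = data
            if self.partner is not None:
                self.partner.mirror.pop(block, None)
        self.cache.clear()

    def fail(self):
        """Simulate a failure: cache contents are lost (no battery backup)."""
        self.failed = True
        self.cache.clear()

    def take_over(self):
        """Survivor completes the failed partner's pending writes from the mirror."""
        for block, data in self.mirror.items():
            self.disk[block] = data
        self.mirror.clear()


# Usage: controller A acknowledges a write, then fails before flushing.
disk = {}
a, b = Controller(disk), Controller(disk)
a.partner, b.partner = b, a
a.host_write(3, b"data")
a.fail()
b.take_over()   # the acknowledged write reaches the disk via B's mirror copy
```

Without the `mirror` dictionary, the write that A acknowledged would be lost when A fails, which is the scenario described above.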