The present invention relates to fault tolerant mass storage, and in particular to mechanisms for transferring control between two controllers.
In fault tolerant mass storage systems, two controllers are often each connected to the same group of disk drives for redundancy. Each controller has primary control over a different set of the disk drives in the group. If one of the controllers fails, the surviving controller can take over the controller function for the set of disk drives previously controlled by the failed controller. An example of such a system for redundant arrays of independent disks (RAID) is set forth in U.S. Pat. No. 5,140,592.
One of the challenges of a controller fail-over or other transfer is to reduce the time that elapses between the transfer and the point at which the data on the disk drives is available to the partner controller. The partner controller must detect the fail-over, and must then detect and bring up the devices that were owned by the failing or relinquishing controller.
Bring-up is the process of preparing a device (disk drive) for controller communication. This process consists of multiple SCSI commands such as Test Unit Ready, Inquiry, Read Capacity, Start, and Mode Sense. All of these commands must complete successfully before media access to the disk drives is allowed. On a fail-over recovery, this bring-up process can significantly delay media availability to the new controller (30 seconds to several minutes).
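The bring-up sequence described above can be sketched as follows. This is a minimal illustration rather than an actual controller implementation: the names `BRING_UP_COMMANDS`, `bring_up`, and `send_command` are hypothetical, and the command order simply follows the order in which the commands are listed above.

```python
from typing import Callable

# Command names as listed in the text; the order used here is an assumption.
BRING_UP_COMMANDS = (
    "Test Unit Ready",
    "Inquiry",
    "Read Capacity",
    "Start",
    "Mode Sense",
)


def bring_up(device_id: int, send_command: Callable[[int, str], bool]) -> bool:
    """Return True only if every bring-up command succeeds for the device.

    send_command is a hypothetical transport hook that issues one command
    to one device and reports success (True) or failure (False).
    """
    for command in BRING_UP_COMMANDS:
        if not send_command(device_id, command):
            # A single failed command means media access cannot be allowed.
            return False
    return True
```

Because each command is a separate round trip to the device, and some may wait on the drive itself, the total time of this loop is the delay that a partner controller would otherwise incur after a fail-over.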
It would be desirable to reduce the amount of time for accessing the disk drives after a failure or other transfer of control.
The present invention provides a method and apparatus for reducing the bring-up time upon a transfer of control. This is accomplished by establishing two levels of device availability: physical level and media level. Physical level access is performed on the devices during bring-up by all connected controllers, prior to any fail-over or other transfer. Media level access is performed only by the controller with primary control of the devices. Upon a failure of, or transfer by, the first controller, the second controller can immediately perform a media level access without first doing a physical level access, since the physical level access was already done prior to the fail-over or transfer.
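The effect of separating the two levels can be illustrated with a small model. This is a sketch under stated assumptions, not the claimed apparatus: the `Controller` class, its fields, and the `fail_over` function are hypothetical names, and the `physical_accesses` counter stands in for the slow bring-up work.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of one controller's view of the shared devices."""
    name: str
    physical_ready: set = field(default_factory=set)  # physical level done
    media_owned: set = field(default_factory=set)     # media level granted
    physical_accesses: int = 0                        # counts slow bring-ups

    def physical_access(self, device: int) -> None:
        # Performed by every connected controller before any transfer.
        self.physical_accesses += 1
        self.physical_ready.add(device)

    def media_access(self, device: int) -> None:
        # Media level access presupposes a completed physical level access.
        if device not in self.physical_ready:
            raise RuntimeError(f"{self.name}: device {device} not brought up")
        self.media_owned.add(device)


def fail_over(failed: Controller, survivor: Controller) -> None:
    """Move media level ownership to the survivor without a new bring-up."""
    for device in sorted(failed.media_owned):
        survivor.media_access(device)
    failed.media_owned.clear()
```

In this model, if both controllers perform physical level access on four devices and each takes media level ownership of two, a call to `fail_over` leaves the survivor owning all four devices while its `physical_accesses` count stays at four, so no bring-up is repeated at transfer time.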
After the initial access, all controllers have at a minimum physical level access to any given device at all times. Physical level access allows commands used for bring-up, non-destructive diagnostics, or device monitoring functions. It is at the physical level that device communication capability is validated. Media level access, which is exclusive, allows the use of all I/O commands. This level is made available to the media subsystem of only one controller at a time. The media level ownership of devices is governed by a continuing agreement process between the two controllers.
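The command restrictions at each level can be expressed as a simple check. The following is a minimal sketch, assuming the physical level command set is exactly the bring-up commands named earlier; the names `AccessLevel`, `PHYSICAL_LEVEL_COMMANDS`, and `is_allowed` are hypothetical, and diagnostic or monitoring commands are omitted because the text does not enumerate them.

```python
from enum import Enum


class AccessLevel(Enum):
    PHYSICAL = 1  # held by every controller after the initial access
    MEDIA = 2     # exclusive; held by one controller at a time


# Assumed physical level command set: the bring-up commands named in the text.
PHYSICAL_LEVEL_COMMANDS = frozenset(
    {"Test Unit Ready", "Inquiry", "Read Capacity", "Start", "Mode Sense"}
)


def is_allowed(command: str, level: AccessLevel) -> bool:
    """Return True if the command may be issued at the given access level."""
    if level is AccessLevel.MEDIA:
        # Media level access allows the use of all I/O commands.
        return True
    return command in PHYSICAL_LEVEL_COMMANDS
```

For example, `is_allowed("Inquiry", AccessLevel.PHYSICAL)` is true, while a media command such as a write would be refused at the physical level and permitted only for the controller that currently holds media level ownership.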
For a further understanding of the nature and advantages of the invention, reference should be made to the following description taken in conjunction with the accompanying drawings.