The invention relates generally to data storage arrays, and deals more particularly with a technique to isolate a faulty switch, storage device or expansion connector/cable in a daisy-chained configuration of switches that permit access to respective storage devices.
Computer systems may store data in external storage media such as magnetic disks or tape, or even semiconductor memory. Typically, the storage device has two components: the storage medium noted above and a disk or tape drive that physically accesses the storage medium. In the case of a disk drive, there is also a storage controller which instructs the disk drive where to store and access data. The storage controller may receive I/O commands from one or more host computer systems, which may be local or remote (connected via a network). The data storage arrangement must be reliable: if a storage medium, a disk drive, a storage controller or the communication path between the storage controller and the disk drive fails, the data must still be recoverable. There are several well-known “RAID” architectures for ensuring reliability and recovery. These architectures provide redundancy of data on the same or different disks, distribution of data across the same or different disks, parity bits on the same or different disks as the data, redundant controllers for each disk drive, redundant communication paths between the storage controllers and the disk drives, etc. Generally, the higher the RAID “level”, the greater the degree of redundancy, amount of parity bits, distribution of data and parity bits, etc., which results in greater reliability and recoverability. The basic RAID levels are RAID 0–5. Levels 1 through 5 were described by Patterson et al. in “A Case for Redundant Arrays of Inexpensive Disks”, Proceedings of ACM SIGMOD, June 1988.
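The parity-based recovery mentioned above can be illustrated with the byte-wise XOR parity used in parity RAID levels such as RAID 5. The following Python sketch is illustrative only: the function names are hypothetical, and it ignores striping, parity rotation across disks and on-disk layout.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same, non-zero count and length")
    # zip(*blocks) yields one tuple of byte values per column position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks: list[bytes]) -> bytes:
    """Parity block stored alongside the data blocks of one stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover one lost data block: XOR of the surviving blocks and the parity."""
    return xor_blocks(surviving_blocks + [parity])


stripe = [b"ABCD", b"EFGH", b"IJKL"]   # data blocks on three disks
parity = parity_block(stripe)          # stored on a fourth disk
# Simulate losing the second disk and rebuilding its block from the others.
assert rebuild_missing([stripe[0], stripe[2]], parity) == stripe[1]
```

Because XOR is its own inverse, any single missing block in the stripe can be rebuilt this way; losing two blocks of the same stripe cannot be recovered with one parity block.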
There are different, known protocols for communication between the host and the storage controller and between the storage controller and the disk drives. A “Small Computer System Interface” (SCSI) can be used between the host and the storage controller and between the storage controller and the disk drive. However, a SCSI bus limits both the number of devices that can be attached to it and its maximum physical length. The SCSI interface is also too slow for some high-speed applications. Therefore, SCSI is not well suited for a host communicating with storage devices located on a remote network. A Fibre Channel protocol is described in FC-PH-3 Rev-9.4, November 1997, and FC-AL-3 Rev-1.0, September 1999, which are hereby incorporated by reference as part of the present disclosure. Compared to SCSI, Fibre Channel can be used over greater distances, has greater speed and allows more devices to be connected to a single channel. The Fibre Channel protocol includes the “Fibre Channel Arbitrated Loop” (FC-AL) connection protocol. FC-AL is a connection topology in which multiple Fibre Channel devices share a loop, and access to the loop at any one time is arbitrated among the devices.
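As a rough illustration of arbitration on an FC-AL loop, the sketch below models only the priority rule: among the ports requesting access, the one with the lowest arbitrated loop physical address (AL_PA) wins. It is a simplification under stated assumptions: it ignores the fairness access window, the OPN/CLS exchange that follows a won arbitration, and loop initialization. The function name and the example AL_PA values are hypothetical.

```python
from typing import Iterable, Optional


def arbitrate(loop_order: list[int], requesting: Iterable[int]) -> Optional[int]:
    """Simplified FC-AL arbitration: return the AL_PA that wins the loop.

    loop_order lists each port's AL_PA in physical order around the loop.
    A requesting port replaces an idle fill word, or an ARB carrying a
    lower-priority (numerically higher) AL_PA, with its own ARB. A port
    wins when its own ARB travels the whole loop and returns unchanged.
    """
    requesting = set(requesting)
    if not requesting:
        return None
    if not requesting <= set(loop_order):
        raise ValueError("every requesting AL_PA must be on the loop")
    current: Optional[int] = None  # None models an idle fill word
    position = 0
    while True:
        port = loop_order[position % len(loop_order)]
        if current == port:
            return port  # its ARB circled the loop: arbitration won
        if port in requesting and (current is None or port < current):
            current = port  # substitute this port's higher-priority ARB
        position += 1


# Three ports on the loop; two of them want access at the same time.
winner = arbitrate([0x01, 0xE8, 0x23], {0xE8, 0x23})
```

Here `winner` is `0x23`: the ARB from port `0xE8` is overwritten by port `0x23` before it can return, while the ARB from `0x23` passes every other port unchanged.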
Each storage controller can be connected to multiple disk drives, both for greater reliability and recoverability as noted above and to increase storage capacity. The connection between the storage controller and the disk drives can be made in a variety of configurations: a single daisy-chain, redundant daisy-chain, single loop, redundant loop, simple parallel, redundant parallel, or other arrangement. In a daisy-chain arrangement, there is a series of switches, one switch per disk drive, accessible at one or both ends by the storage controller. A communication from the storage controller is supplied to the first switch. Each switch in succession passes the communication either to its respective disk drive or to the next switch in the sequence (i.e. “bypass” mode), depending on which disk drive the storage controller wants to access and on a respective control signal for the switch. The control signal is supplied by the storage controller and/or an enclosure services interface (“ESI”) processor associated with the daisy-chain.
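The routing behavior of the daisy-chain described above can be sketched as follows. This is a hypothetical model, not a description of any particular switch hardware: `ChainSwitch` and `route` are illustrative names, and setting the `bypass` flag stands in for the control signal supplied by the storage controller or ESI processor.

```python
from dataclasses import dataclass


@dataclass
class ChainSwitch:
    """One switch in the daisy-chain, paired with one disk drive (hypothetical model)."""
    drive_id: str
    bypass: bool = True  # True: pass the communication on to the next switch


def route(chain: list[ChainSwitch], target: str) -> list[str]:
    """Set the switch control signals for `target` and return the hops taken.

    Switches upstream of the target are put in bypass mode; the target's
    switch directs the communication to its own disk drive.
    """
    hops = []
    for switch in chain:
        hops.append(f"switch {switch.drive_id}")
        if switch.drive_id == target:
            switch.bypass = False  # control signal: connect this drive
            hops.append(f"drive {switch.drive_id}")
            return hops
        switch.bypass = True  # control signal: pass to the next switch
    raise LookupError(f"drive {target!r} is not on this chain")


chain = [ChainSwitch("A"), ChainSwitch("B"), ChainSwitch("C")]
print(route(chain, "B"))  # ['switch A', 'switch B', 'drive B']
```

The model only covers a chain entered from one end; a chain accessible from both ends would need a second entry point and a choice of direction.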
In a simple parallel arrangement, there is a point-to-point connection (i.e. a dedicated communication line) between the storage controller and each disk drive. This allows the storage controller to access each disk drive without passing through a series of intervening switches, but requires a separate communication line for each disk drive. In a redundant parallel arrangement, there are two or more point-to-point connections between the storage controller and each disk drive. The parallel arrangements provide the most direct and fastest connection between the storage controller and each disk drive, but require additional cabling.
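The cabling trade-off between the daisy-chain and the parallel arrangements can be put as a back-of-the-envelope count. The sketch assumes a single daisy-chain entered from one end and, for the redundant parallel case, a hypothetical default of two lines per drive; the function name and topology labels are illustrative.

```python
def compare_topology(n_drives: int, topology: str,
                     paths_per_drive: int = 2) -> tuple[int, int]:
    """Return (controller-side lines, worst-case intervening switches).

    'Intervening switches' counts switches passed in bypass mode before
    the communication reaches the target drive's own switch.
    """
    if n_drives < 1:
        raise ValueError("need at least one drive")
    if topology == "daisy_chain":
        # One line into the first switch; the last drive sits behind n-1 others.
        return 1, n_drives - 1
    if topology == "simple_parallel":
        return n_drives, 0  # one dedicated line per drive
    if topology == "redundant_parallel":
        return paths_per_drive * n_drives, 0  # several dedicated lines per drive
    raise ValueError(f"unknown topology: {topology}")


for name in ("daisy_chain", "simple_parallel", "redundant_parallel"):
    print(name, compare_topology(8, name))
```

For eight drives this gives one line but up to seven intervening switches for the daisy-chain, against eight (or sixteen) lines and no intervening switches for the parallel arrangements.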
Occasionally, there is a failure of a storage medium, a disk drive, one of the switches leading to the disk drives (in a daisy-chain arrangement), the communication medium between the storage controller and the switches, etc. In a daisy-chained arrangement, a failure of a single storage medium, disk drive, switch or communication medium could jeopardize communication between the storage controller and both the failed disk drive and the disk drives downstream of it. In many cases, if the failure is traced to a specific storage medium, disk drive or switch, the faulty component can be bypassed to restore communication between the storage controller and the downstream disk drives. It is known to attach a hardware detector to each disk drive to detect a failure in the disk drive or its storage medium and to signal the storage controller when such a failure occurs. This identifies the source of the failure to the storage controller, which then bypasses the associated switch and disk drive. One problem with such a hardware detector is the added cost to each disk drive. In addition, the detector may not be able to detect certain failures in the communication medium between the switch and the storage controller.
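The consequence described above, that a single failure cuts off the failed drive and everything downstream of it, suggests a simple way to localize a break: probe positions along the chain and find the first one that does not respond. The sketch below is a generic illustration under a single-fault assumption; it is not the technique claimed by the present invention, and `SimulatedChain`, `probe` and the other names are hypothetical. It only locates a position: a failed drive, its switch and the cable feeding that switch all look the same to the probe, which is why the sketch re-probes the downstream drives after bypassing.

```python
from typing import Callable, Optional


def first_unreachable(n: int, probe: Callable[[int], bool]) -> Optional[int]:
    """Binary-search for the first chain position whose drive does not respond.

    Assumes a single break: every position before it responds and no
    position from it onward does. Returns None if all positions respond.
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < n else None


class SimulatedChain:
    """Hypothetical daisy-chain with at most one failed component."""

    def __init__(self, n: int, fault_at: Optional[int], bypassable: bool = True):
        self.n = n
        self.fault_at = fault_at
        self.bypassable = bypassable  # False models, e.g., a broken cable
        self.bypassed: set[int] = set()

    def probe(self, position: int) -> bool:
        """Does the drive at `position` answer a command from the controller?"""
        if position in self.bypassed:
            return False  # a bypassed drive is no longer addressable
        if self.fault_at is None or self.fault_at > position:
            return True  # the fault, if any, lies downstream of this drive
        # Fault at or upstream of this drive: only a working bypass helps.
        return self.bypassable and self.fault_at in self.bypassed


def isolate_and_bypass(chain: SimulatedChain) -> Optional[tuple[int, bool]]:
    """Locate the break, bypass it, and report whether downstream drives recovered."""
    k = first_unreachable(chain.n, chain.probe)
    if k is None:
        return None
    chain.bypassed.add(k)
    restored = all(chain.probe(i) for i in range(k + 1, chain.n))
    return k, restored
```

The binary search needs only about log2(n) probes to find the break, but it relies on the single-fault assumption; with several independent failures the probe results are no longer monotone and a linear scan would be needed.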
Accordingly, an object of the present invention is to provide a system and method to isolate a failure of a disk drive, storage medium, daisy-chain switch or the communication medium between the switch and the storage controller, in a daisy-chained arrangement of disk drives.
Another object of the present invention is to provide a system and method of the foregoing type that is inexpensive and does not require additional hardware.