The present invention relates to a redundant peripheral device subsystem on a computer system.
Current computer systems provide for the attachment of peripheral devices. Peripheral devices are often attached to the computer system by being connected to peripheral device busses, such as known SCSI or ESDI busses. A peripheral device controller is coupled between the computer system and the peripheral device bus to form a peripheral device subsystem. The peripheral device controller controls the operation of the peripheral devices on the peripheral device bus in response to instructions from the computer system. For example, computer systems usually include mass storage subsystems for storing and retrieving data. These mass storage subsystems usually are disk drive subsystems. It is known to couple a SCSI controller to the computer system and attach one or more disk drives to a SCSI bus which operates under the control of the SCSI controller.
It is often important that data be stored to and retrieved from these disk drive systems reliably, meaning that the data be available, even if a disk drive in the subsystem fails. In order to provide reliable data storage and retrieval, various schemes involving error correction encoding the data, partitioning the codewords, and storing the partitioned codewords on a plurality of different disk drives in the disk drive subsystem have been developed. These schemes are subsumed under the term RAID (redundant array of inexpensive disks) and have provided a cost-effective means of protecting a computer system against the failure of a single disk drive.
Of the various RAID schemes, RAID 5 has been shown to be the most effective scheme to date. In the RAID 5 scheme, three or more disk drives are used in the disk drive subsystem, and the equivalent storage space of one drive is dedicated to storing odd parity based on the data stored on the other drives in the subsystem. This scheme allows reconstruction of the data, when a single disk drive fails, from the data stored on the other, still functioning, drives.
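The reconstruction property described above can be illustrated with a minimal sketch. It assumes bytewise parity computed over a single stripe, with the parity block chosen so that every bit position across the stripe holds an odd number of ones, matching the odd parity mentioned above. The function name `missing_block` and the sample data are hypothetical and do not appear in the original text; rotation of parity across drives is omitted.

```python
from functools import reduce


def missing_block(blocks):
    """Return the one block that completes an odd-parity stripe.

    Given every block of a stripe except one, the absent block is the
    complement of the bytewise XOR of the blocks that are present. The
    same operation computes the parity block from the data blocks and
    rebuilds a lost data block from the survivors.
    """
    xor = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
    return bytes(b ^ 0xFF for b in xor)


# Hypothetical stripe: three data drives plus one parity block.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x00"]
parity = missing_block(data)
stripe = data + [parity]

# Simulate the loss of drive 1 and rebuild its block from the others.
survivors = stripe[:1] + stripe[2:]
rebuilt = missing_block(survivors)
```

Because the operation is symmetric, the loss of any single block, data or parity, is recovered the same way.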
Because two or more disk drives are simultaneously involved in all data storage and retrieval operations in the RAID 5 scheme, RAID 5 subsystems have been designed which have two or more SCSI busses respectively controlled by as many SCSI controllers. The disk drives are distributed as equally as possible among the available SCSI busses. The parallel operation of multiple SCSI controllers improves the performance of such a subsystem. While the disk drive subsystem is protected against the failure of a single disk drive, it is possible for a SCSI controller to fail too. If a SCSI controller fails, the disk drives coupled to that controller are made unavailable, and the disk drive subsystem will fail.
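The exposure described above can be sketched as follows, assuming a simple round-robin assignment of drives to busses (the text says only that drives are distributed "as equally as possible"). The names `distribute` and `array_survives` are hypothetical, and the survival test encodes only the single-drive tolerance of RAID 5.

```python
def distribute(drives, n_buses):
    """Assign drives to busses round-robin, one list of drives per bus."""
    return [drives[i::n_buses] for i in range(n_buses)]


def array_survives(layout, failed_bus):
    """A RAID 5 array tolerates the loss of at most one drive.

    When the controller of `failed_bus` fails, every drive on that bus
    becomes unavailable, so the array survives only if that bus carries
    no more than one drive.
    """
    return len(layout[failed_bus]) <= 1


# Five drives on two busses: a controller failure removes several drives.
two_bus_layout = distribute(list(range(5)), 2)

# One bus (and controller) per drive: a controller failure removes one drive.
per_drive_layout = distribute(list(range(5)), 5)
```

With five drives on two busses, either controller failure takes out at least two drives, which exceeds what RAID 5 can reconstruct; with one controller per drive, a controller failure is equivalent to a single drive failure.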
Several RAID 5 approaches have been proposed to deal with the possibility of a SCSI controller failure. A first approach is to ignore the chance of SCSI controller failure by assuming that a SCSI controller failure is unlikely. This approach adds no hardware, and thus no cost to the RAID 5 disk drive system. However, such an approach results in a subsystem which has a single point of failure: the failure of a SCSI controller.
Another RAID 5 approach is to provide a second, redundant, SCSI controller for each SCSI bus. That is, two SCSI controllers will be attached to the first SCSI bus, two SCSI controllers will be attached to the second SCSI bus, and so on. This approach allows for recovery from the failure of a single SCSI controller. However, it effectively doubles the cost of the SCSI interface by requiring two SCSI controllers for each SCSI bus instead of only one. There are two possible modes of operation of such a subsystem. In a first operating mode, both SCSI controllers cooperate to control the disk drives attached to their SCSI bus. The simultaneous control of a single SCSI bus by two SCSI controllers is a complex task. Software for controlling the simultaneous operation of both SCSI controllers is much more complex than that for a system in which a single SCSI controller controls a single SCSI bus. Such software is expensive to design and implement, and is less reliable in operation. In a second operating mode, one of the SCSI controllers remains dormant. This is a waste of its capability and cost, however.
Another RAID 5 approach is to provide a second, redundant, disk subsystem controller, that is, two complete disk drive controller boards. In this approach, each controller board includes multiple SCSI controllers. The first SCSI controller on each controller board is coupled in common to the first SCSI bus, the second SCSI controller on each controller board is coupled in common to the second SCSI bus, and so on. Each controller board must further include circuitry for establishing a communications path with the other controller board. Such an approach allows for a very high level of protection because everything in the disk drive subsystem has a duplicate capable of taking over operation in the event of a failure. However, this approach is very expensive. In addition, the overhead data processing and transfer between controller boards is much greater in this approach than in other approaches. Furthermore, should one controller board fail, a portion of the failed controller board must remain sufficiently operative to communicate with the other controller board to transfer operation to the other controller board. Again, as described above, either complex control software must be developed for controlling simultaneous operation of both SCSI controllers, complicated further by the intercontroller communications which must take place in this approach, or one of the controllers must remain dormant during normal operations.
Another RAID 5 approach is to provide a separate SCSI controller for each of the disk drives in the disk drive subsystem, all coupled to the computer system. In this approach, the method of recovery from the failure of a single SCSI controller is identical to the method of recovery from the failure of a single disk drive. This approach also provides a substantial performance advantage because all disk drives may be controlled simultaneously in normal operations, and all of the SCSI controllers are involved in normal operations. However, this approach is expensive because of the number of SCSI controllers required.
A peripheral device subsystem is desirable which provides for recovery from the failure of a single peripheral device controller, which does not substantially increase the cost or operational complexity of the peripheral device subsystem, and which does not include expensive elements which remain dormant during normal operations.
In accordance with principles of the present invention, a redundant peripheral device subsystem in a computer system comprises first and second peripheral device controllers. First and second peripheral device busses are coupled to the first and second peripheral device controllers, respectively. A controllable switch is coupled between the first and second peripheral device busses. The controllable switch either isolates the first and second peripheral device busses from each other, or joins them into a single peripheral device bus.
A peripheral device subsystem according to the present invention allows for recovery from the failure of a single peripheral device controller by joining the two peripheral device busses into a single bus which may then be controlled by the remaining operative peripheral device controller. During normal operations, both peripheral device controllers are operating, each controlling its own peripheral device bus. The only added hardware is the controllable switch, which may be constructed inexpensively, as will be described in more detail below. A SCSI RAID 5 mass storage system according to the present invention provides for reliable recovery of the stored data in the event of the failure of a SCSI controller.
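The behavior of the controllable switch can be modeled with a minimal sketch, given here purely as an illustration and not as the claimed implementation. The class names `Controller` and `Subsystem`, the method names, and the drive labels are all hypothetical; how a controller failure is detected and how the switch is actually driven are outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    operative: bool = True


@dataclass
class Subsystem:
    controllers: tuple          # two Controller objects
    buses: tuple                # two lists of drive labels, one per bus
    switch_closed: bool = False # open switch: the busses are isolated

    def reachable(self):
        """Map each operative controller to the drives it can address."""
        result = {}
        for ctrl, bus in zip(self.controllers, self.buses):
            if not ctrl.operative:
                continue
            if self.switch_closed:
                # Closed switch: both busses act as a single bus.
                result[ctrl.name] = [d for b in self.buses for d in b]
            else:
                result[ctrl.name] = list(bus)
        return result

    def fail(self, index):
        """Mark one controller as failed and join the two busses."""
        self.controllers[index].operative = False
        self.switch_closed = True


sub = Subsystem(
    controllers=(Controller("A"), Controller("B")),
    buses=(["d0", "d2"], ["d1", "d3"]),
)
normal = sub.reachable()      # each controller sees only its own bus
sub.fail(0)                   # controller A fails; the switch is closed
after_failure = sub.reachable()
```

In normal operation each controller addresses only its own drives; after the failure of controller A, the remaining controller B addresses every drive over the joined bus, so no drive is lost.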