Highly available computing is the practice of designing a computer network so that system operations can continue despite the malfunction or other unexpected interruption of a component of the network. Such systems are utilized in situations that demand a high degree of reliability. The goal of highly available networks is to provide duplicate network components to reduce the risk of a single point of failure. In the event of a component failure, duplicate or backup components can take over the role of the failed component, where a component is a general term which can include devices such as a networking switch, a storage device, a computer, or any additional device that may connect to a computer network. There are a variety of possible configurations for highly available systems, and typically the more effective configurations provide for duplication of components.
One example of a prior system 100 is shown in FIG. 1. This system includes two distinct computer systems, computer 102 and computer 104, each with an independent CPU, RAM, I/O bus(es) for user input devices (e.g. keyboard and mouse), and I/O bus(es) for external system devices (e.g. network switches, printers, storage devices, etc.). The system 100 includes an external switch 106 that connects input/output (I/O) ports of each computer system to two separate docking bases, 112 and 114. Each docking base in turn connects to external removable storage modules 108 and 110. These storage modules are typically individual hard disk drives.
The actual implementation of these systems can be accomplished in a number of different ways. For example, some systems are configured in a system rack device which holds a number of separate boxes, and each box could correspond to a different computer or associated computer device such as storage systems or networking components. Each computer would be able to operate independently and would have its own CPU, power supply, RAM, etc. Storage devices in the rack could take different forms. One such device is a storage module chassis which is designed to hold removable storage disk drives. In environments where security is of concern, removable storage modules can be provided, where the storage modules are designed to be easily removed from and inserted into receptacles of the storage module chassis. Issued U.S. Pat. No. 5,126,890 and issued U.S. Pat. No. 5,280,398 discuss different aspects of removable disk drive storage modules, and each of these references is incorporated herein by reference in its entirety. Both of these patents are assigned to the same assignee as the present patent application.
In addition to the above patents describing aspects of removable disk drives, U.S. Pat. No. 5,552,776 also discusses aspects of removable disk drives, and describes systems and methods for providing security by controlling access between different computers and storage modules. U.S. Pat. No. 5,552,776 is also assigned to the assignee of the present patent application, and is incorporated herein by reference in its entirety.
System 100 of FIG. 1 contains removable storage modules 108 and 110. The system 100 allows for each removable storage module to be inserted into a receptacle in a storage module chassis. Each receptacle in the storage module chassis provides a docking base with a connector for receiving a connector from the removable storage module. For example, storage module 108 is shown as being coupled with docking base 112 of a storage module chassis, and storage module 110 is coupled with docking base 114 of a storage module chassis. The I/O channel of the storage module 108 is coupled through the docking base 112 to the switch 106, which is external to the docking base and the storage module. The I/O channel of the storage module 110 is likewise coupled through the docking base 114 to the switch 106. The switch 106 is controlled to provide computers 102 and 104 access to the different storage modules.
This approach of providing an external switch 106 adds potential compatibility and interoperability problems. Such issues increase the complexity and cost of the system, reduce the reliability or uptime of the system, and introduce control issues. For example, if a particular storage module is not working with a computer, then the failure could be in either the storage module or the switch. Further, the central switch acts as a single point of failure. If the switch fails, it is likely that the computers will not have access to any of the storage modules.
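The single point of failure described above can be illustrated with a minimal sketch. The class `ExternalSwitch`, its `healthy` flag, and the names `computer_102`, `module_108`, and so on are hypothetical labels chosen to mirror the reference numerals of FIG. 1; they are assumptions made for illustration and do not describe any actual switch interface.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalSwitch:
    """Hypothetical model of external switch 106 (illustration only)."""
    healthy: bool = True
    # Maps a computer name to the set of storage modules routed to it.
    routes: dict = field(default_factory=dict)

    def connect(self, computer, module):
        self.routes.setdefault(computer, set()).add(module)

    def reachable(self, computer):
        # Every path runs through the one switch, so a failed switch
        # hides all storage modules from all computers.
        if not self.healthy:
            return set()
        return set(self.routes.get(computer, set()))


switch = ExternalSwitch()
computers = ("computer_102", "computer_104")
for computer in computers:
    for module in ("module_108", "module_110"):
        switch.connect(computer, module)

before = {c: switch.reachable(c) for c in computers}
switch.healthy = False  # simulate failure of the shared switch
after = {c: switch.reachable(c) for c in computers}
```

In this sketch both computers reach both modules while the switch is healthy, and neither computer reaches any module once it fails, even though the computers and storage modules themselves are duplicated.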
The above-described system is just one example of creating a highly available system utilizing an external switch. Another example of a prior system is one that utilizes disk storage modules which have two I/O channels and two I/O ports. One such existing storage system is the Fiber Channel (FC) interface drive provided by Seagate Technology LLC. In systems where these drives are configured for RAID operation, a single RAID controller or dual redundant RAID controllers can access either port of the drives by means of a hub or a switch inserted in the FC loop between the drives and the controllers. With a hub, there is no switching, as all drive ports are seen by the controllers. It is up to the RAID controller programming to arbitrate which computer owns each drive. If a drive port goes bad, the RAID controller and the computer can continue using the other port. In a dual RAID controller mode, if one of the RAID controllers fails, the other controller can take ownership of the drives. The hub or switch provides the connectivity, and the RAID controllers provide the switching, redundancy and failover intelligence. Hubs or switches on the computer channels are required to make failover transparent to the host.
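The port fallback and controller failover behavior described above can be sketched roughly as follows. The class `DualPortDrive`, the function `fail_over`, and the controller names `ctrl_1` and `ctrl_2` are hypothetical; the sketch assumes a simple ownership field per drive and does not represent the firmware of any actual RAID controller or drive.

```python
class DualPortDrive:
    """Hypothetical dual-ported drive; each port can fail independently."""

    def __init__(self, name):
        self.name = name
        self.port_ok = {"A": True, "B": True}
        self.owner = None  # controller that currently owns the drive

    def usable_port(self):
        # Prefer port A, fall back to port B, else report no usable port.
        for port in ("A", "B"):
            if self.port_ok[port]:
                return port
        return None


def fail_over(drives, failed, survivor):
    """Reassign every drive owned by the failed controller to the survivor."""
    for drive in drives:
        if drive.owner == failed:
            drive.owner = survivor


drives = [DualPortDrive("d0"), DualPortDrive("d1")]
drives[0].owner = "ctrl_1"
drives[1].owner = "ctrl_2"

# A bad drive port: the controller continues on the other port.
drives[0].port_ok["A"] = False
port_after_fault = drives[0].usable_port()

# A failed controller: the surviving controller takes ownership.
fail_over(drives, failed="ctrl_1", survivor="ctrl_2")
owners = [drive.owner for drive in drives]
```

The sketch keeps the division of labor noted in the text: the drive only exposes two ports, while the decision about which controller owns which drive lives entirely in the controller-side logic.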
At the network level, between the RAID box and the host computers, FC switches can provide switching and multi-path redundancy. There is usually some storage area network (SAN) control mechanism that involves firmware or software on the host computer, the switches and the RAID controllers. On the RAID controller there is a method called SAN masking, which controls which host computers can have access to each RAID set. Switches can be zoned to partition traffic and control access. SANs can be very complicated and often have interoperability problems between all of the pieces of the system. At the host level, multipath software can reroute traffic through a redundant connection to the RAID box.
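How the three layers of control described above interact can be sketched as follows. This is a simplified illustration under stated assumptions: zones are modeled as plain sets of member names, masking as a per-RAID-set list of permitted hosts, and multipathing as a first-healthy-path choice. The functions `can_access` and `pick_path` and all host, RAID set, and path names are hypothetical and do not correspond to any vendor's SAN management interface.

```python
def can_access(host, raid_set, zones, masks):
    """A host reaches a RAID set only if zoning AND masking both allow it."""
    # Switch level: host and RAID set must share at least one zone.
    zoned = any(host in zone and raid_set in zone for zone in zones)
    # Controller level: the RAID set's mask must list the host.
    masked_in = host in masks.get(raid_set, set())
    return zoned and masked_in


def pick_path(paths, healthy):
    """Host level: return the first healthy path, or None if none remain."""
    for path in paths:
        if healthy.get(path, False):
            return path
    return None


zones = [{"host_a", "raid_0"}, {"host_b", "raid_1"}]
masks = {"raid_0": {"host_a"}, "raid_1": {"host_a", "host_b"}}

a_to_raid_0 = can_access("host_a", "raid_0", zones, masks)  # zoned and masked in
a_to_raid_1 = can_access("host_a", "raid_1", zones, masks)  # masked in, not zoned
b_to_raid_0 = can_access("host_b", "raid_0", zones, masks)  # neither

# Multipath: the primary path is down, so traffic moves to the redundant one.
chosen = pick_path(["path_1", "path_2"], {"path_1": False, "path_2": True})
```

Because access depends on independent settings held in the switches, the RAID controllers, and the host software, a mismatch in any one layer blocks access, which is one way the complexity and interoperability problems noted above can arise.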