1. Field of the Invention
This invention relates in general to high availability computer systems, and more particularly to a mechanism that enables peripheral component interconnect ("PCI") bus switching for high availability computer systems.
2. Description of the Related Art
Computer systems are increasingly used for mission-critical applications that require a high level of availability over time. As such, two or more computer systems are coupled together in order to ensure that the coupled system can sustain failures of power systems, disks, processors, network components, software, and the like. In this manner, the operation of the mission-critical application is uninterrupted in the event of such failures.
One area of concern for a high availability system is the ability to maintain I/O traffic between the system processor and the peripheral devices when a system failure occurs. A common solution to this problem is to provide one computer system with the capability to control the other computer system's peripheral devices in the event of a failure. Typically, the peripheral devices are connected to a peripheral component interconnect ("PCI") bus. The task then becomes one of PCI bus switching, that is, allowing a non-failed computer system to take control of a failed computer system's PCI bus.
FIG. 1 illustrates one such PCI bus switching technique. As shown, there is a loosely-coupled computer system 5 having two processor boards 10a and 10b, each having a central processing unit ("CPU") 12a-b, a complex electronic complex ("CEC") 14a-b, a PCI controller 16a-b, and two PCI bridges 18a-d. Each processor 10a and 10b can be selected from a chipset that will communicate with a PCI bus at 33 MHz, such as an Intel Pentium processor and its associated components.
The PCI buses 20a and 20b connect the controllers 16a and 16b to their respective PCI bridges 18a-d, and PCI buses 22a and 22b cross-connect each set of I/O cards 24a-b to a primary PCI bridge 18a, 18c and a secondary bridge 18b, 18d. Each PCI bus 20a-b and 22a-b has a fixed bandwidth of 133 MB/s. The secondary bridges 18b and 18d are not operational unless a primary bridge 18a, 18c or a processor 10a-b fails. In order for the PCI bridges 18a-d to allow implementation of the PCI bus switching, they must have the necessary hardware and software mechanisms to relinquish and acquire the PCI bus 22a-b at the appropriate times. In particular, they must be able to manage the bus access arbitration, clock distribution, and reset logic.
There are several limitations with the architecture of the computer system shown in FIG. 1 that affect its PCI bus switching technique. For example, with the above system, the PCI bus 20a-b cannot span more than 2 or 3 inches between the controllers 16a-b and the PCI bridges 18a-d because of PCI operational parameters. Consequently, this system can only be implemented in a single casing structure having room for closely spaced processor boards 10a-b. That is, coupling multiple processor casing structures together to cross-connect their respective PCI buses would not be effective.
In addition, the use of a single conventional PCI bus 20a-b between the bridges 18a-d and the controllers 16a-b of each processor 10a, 10b delays the I/O traffic and reduces throughput on the PCI bus when a secondary bridge 18b, 18d has to take control for a failed primary bridge 18a, 18c or a processor 10a, 10b. For example, when one CPU 12b fails and the other CPU 12a acquires that failed CPU's PCI bus 22b, the data formerly transferred over two PCI buses 20a-b to their respective CPUs is now shared on a single bus 20a. Consequently, a single bus operating with a fixed bandwidth of 133 MB/s is forced to manage twice as much data with half of the original 266 MB/s bandwidth available.
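The bandwidth figures above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes a 32-bit bus and a nominal 33.33 MHz clock (the text states only "33 MHz" and "133 MB/s"), and the function names are hypothetical.

```python
# Illustrative arithmetic only. The 32-bit width and the 33.33 MHz nominal
# clock are assumptions; the text gives only "33 MHz" and "133 MB/s".
PCI_CLOCK_MHZ = 33.33      # nominal conventional PCI clock
PCI_BUS_WIDTH_BYTES = 4    # assumed 32-bit data path


def peak_bandwidth_mb_s(clock_mhz=PCI_CLOCK_MHZ, width_bytes=PCI_BUS_WIDTH_BYTES):
    """Peak transfer rate of one bus: one bus-width transfer per clock cycle."""
    return clock_mhz * width_bytes


def failover_bandwidth(per_bus_mb_s, buses_before, buses_after):
    """Return (aggregate before, aggregate after, fraction remaining)."""
    before = per_bus_mb_s * buses_before
    after = per_bus_mb_s * buses_after
    return before, after, after / before


if __name__ == "__main__":
    per_bus = round(peak_bandwidth_mb_s())
    before, after, fraction = failover_bandwidth(per_bus, buses_before=2, buses_after=1)
    print(f"per bus: {per_bus} MB/s, before: {before} MB/s, "
          f"after: {after} MB/s, remaining: {fraction:.0%}")
```

Under these assumptions, two buses 20a-b carry an aggregate of 266 MB/s, and after a failure the surviving bus 20a carries the same I/O load with 133 MB/s, or half of the original aggregate.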
Accordingly, there is a need for a PCI bus switching technique that can overcome the aforementioned limitations.
In one aspect of the present invention, a computer system is provided wherein a first processor managing a first PCI bus acquires control of a second PCI bus coupled to a second processor. Each PCI bus couples to a set of I/O devices. A first and a second master hot swap controller ("mHSC") respectively couple to the first and second processor. A first, second, third, and fourth PCI controller respectively couple between the first PCI bus and the first mHSC, the second PCI bus and the second mHSC, the first mHSC and the second PCI bus, and the second mHSC and the first PCI bus. A first, second, third, and fourth data link respectively couple between the first PCI controller and the first processor, the second PCI controller and the first processor, the third PCI controller and the second processor, and the fourth PCI controller and the second processor. A first, second, third, and fourth slave hot swap controller ("sHSC") respectively couple between the first PCI controller and the first mHSC, the second PCI controller and the first mHSC, the third PCI controller and the second mHSC, and the fourth PCI controller and the second mHSC.
In another aspect of the present invention, a method is provided for implementing the above system. In particular, the method allows a first processor that controls I/O traffic of a first PCI bus to acquire and relinquish control of a second PCI bus when a second processor for doing the same becomes inoperative and operative, respectively. Each processor includes at least one primary and one back-up I/O controller. The method comprises the steps of: recognizing an inoperative state of the second processor to control a second PCI bus; deactivating the primary I/O controller of the inoperative second processor; activating the back-up I/O controller of the first processor; and managing the second PCI bus with the active back-up I/O controller of the first processor while the primary I/O controller of the first processor continues to manage the first PCI bus.
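The method steps above can be sketched as a small state model. The Python below is a minimal illustration, not the claimed implementation: the class names, function names, and bus labels are hypothetical, and the relinquish path is an assumed mirror image of the acquire steps, since only the acquire steps are enumerated above.

```python
from dataclasses import dataclass


@dataclass
class IOController:
    bus: str              # label of the PCI bus this controller can manage
    active: bool = False


@dataclass
class Processor:
    name: str
    primary: IOController  # manages the processor's own PCI bus
    backup: IOController   # can manage the other processor's PCI bus
    operative: bool = True


def acquire(first, second):
    """First processor takes control of the second processor's PCI bus."""
    if second.operative:             # step 1: recognize the inoperative state
        return False
    second.primary.active = False    # step 2: deactivate the failed primary
    first.backup.active = True       # step 3: activate the survivor's back-up
    return True                      # step 4: first now manages both buses


def relinquish(first, second):
    """Assumed reverse path: hand the bus back once second is operative again."""
    if not second.operative:
        return False
    first.backup.active = False
    second.primary.active = True
    return True


def managed_buses(processor):
    """Buses currently managed by the processor's active controllers."""
    controllers = (processor.primary, processor.backup)
    return [c.bus for c in controllers if c.active]


if __name__ == "__main__":
    p1 = Processor("first", IOController("PCI-1", True), IOController("PCI-2"))
    p2 = Processor("second", IOController("PCI-2", True), IOController("PCI-1"))
    p2.operative = False
    acquire(p1, p2)
    print(managed_buses(p1), managed_buses(p2))
```

Throughout the takeover, the primary I/O controller of the first processor stays active, which models the requirement that the first PCI bus continues to be managed while the second is acquired.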
The above embodiments provide a computer system for effectively and efficiently switching a PCI bus from a failed or shut down processor to another processor already managing its own PCI bus(es). In particular, this invention provides a direct I/O traffic connection between each PCI bus and its controlling processor. This connection not only helps reduce the I/O traffic delays caused by the throughput and speed issues of conventional PCI bus switching systems, but also allows the PCI controllers to be located up to 48 inches away from their controlling processor if desired, and provides effective sharing between the systems.