1. Field of the Invention
The present invention relates to a fault tolerant (FT) computer system. In particular, the present invention relates to a technique for controlling an I/O device in an FT computer system.
2. Description of the Related Art
A server used in essential services such as traffic control, finance, and stock trading supports the foundation of social life. Thus, high reliability and fault tolerance are required for such a server. Also, in the business server of a corporation, a hosting service using the Internet, and the like, server downtime caused by a fault may result in a severe commercial loss. In this way, the demand for high reliability servers has increased in a wide range of fields.
As a computer system having high reliability, a “Fault Tolerant (FT) Computer System” is known. In the FT computer system, hardware modules such as the CPU and memory of the system are duplexed or multiplexed, and the respective modules are controlled to operate in synchronization with a same clock. When a fault has occurred in a certain portion of the system, i.e., one module, the fault module is logically separated from the system, and the normally operating modules continue the process. Thus, the fault tolerance is improved.
FIG. 1 is a conceptual view showing the configuration of a typical FT computer system. This FT computer system 100 has duplexed hardware modules and a fault tolerant controller (FT controller) 110 connected to the hardware modules. In FIG. 1, CPUs 120 (120a, 120b), main memories 130 (130a, 130b) and I/O devices 140 (140a, 140b) are duplexed. One CPU 120a (120b) and one main memory 130a (130b) constitute one CPU subsystem 150. In short, this FT computer system 100 is duplexed by the two CPU subsystems 150. The two CPU subsystems 150 are controlled to operate in synchronization with a same clock. Also, the duplexed I/O devices (groups) 140 constitute an IO subsystem 160. The FT controller 110 controls the CPU subsystems 150 and the IO subsystem 160. Specifically, the FT controller 110 carries out the maintenance of the synchronous operation (two-system synchronous operation) between the two CPU subsystems 150, the detection of a fault in a module, the separation control of the fault module, and the like.
Generally, the FT computer system is divided into a portion in which the dual control is carried out in hardware, and a portion in which the dual control is carried out in software. For example, the CPU subsystem 150 composed of the CPU 120 and the main memory 130 is the base on which the software itself operates. Thus, the CPU subsystem 150 is required to be dually controlled in hardware. When a fault has occurred in the CPU subsystem 150a, the FT controller 110 (hardware) instantly separates the fault CPU or memory from the system. Thus, without any stop of the system, the process is continued by the remaining CPU subsystem 150b and the IO subsystem 160. On the other hand, the IO subsystem 160 is dually controlled in software. For example, when a fault has occurred in an I/O device 140a, the FT controller 110 detects the fault and sends an error report to a software program (hereinafter, to be referred to as an “I/O Device Driver”) that controls the I/O device 140a. At this time, the I/O device driver stops the use of the fault I/O device 140a and uses the other duplexed I/O device 140b in its place. In this way, the switching between the I/O devices 140 in the IO subsystem 160 is carried out in software.
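The software-side switching described above can be illustrated by the following minimal sketch. It is not taken from any actual driver; the class names `IODevice` and `FailoverDriver` and the method `on_error_report` are hypothetical, and the error report from the FT controller 110 is assumed to arrive as a simple method call.

```python
from dataclasses import dataclass, field


@dataclass
class IODevice:
    """Hypothetical stand-in for one of the duplexed I/O devices (140a, 140b)."""
    name: str
    faulty: bool = False
    written: list = field(default_factory=list)

    def write(self, data):
        if self.faulty:
            raise IOError(f"{self.name} is faulty")
        self.written.append(data)


class FailoverDriver:
    """Hypothetical FT-aware driver: one active device, one standby device."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def on_error_report(self, device):
        """Assumed entry point for an error report from the FT controller."""
        device.faulty = True
        if device is self.active and self.standby is not None:
            # Stop using the fault device and switch to its duplexed partner.
            self.active, self.standby = self.standby, None
        elif device is self.standby:
            # The standby failed; keep running on the active device alone.
            self.standby = None

    def write(self, data):
        self.active.write(data)


dev_a = IODevice("140a")
dev_b = IODevice("140b")
driver = FailoverDriver(dev_a, dev_b)
driver.write("before fault")
driver.on_error_report(dev_a)   # FT controller reports a fault in 140a
driver.write("after fault")     # transparently served by 140b
```

The point of the sketch is that both the recognition of the error report and the switching logic live inside the driver, which is why a driver written without the FT computer system in mind cannot provide this behavior.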
In order to carry out the switching control for the I/O devices 140 as mentioned above, the I/O device driver is required to have a function of recognizing the error report from the FT controller 110 and a function of carrying out the switching process to a substitute I/O device. That is, the I/O device driver that drives the I/O devices 140 and the operating system (OS) that collectively controls the drivers are required to be adapted to the FT computer system.
Japanese Laid Open Patent Application (JP-A-Heisei 9-16426) discloses an I/O switching technique in an FT computer having a two-port console. This conventional technique aims to carry out monitoring and maintenance by a single console without any connection switching of cables. The FT computer based on this conventional technique has two systems of console outputs, and their input/output buses are switched when a fault has occurred. The switching between the input/output buses is carried out in response to a command from the OS. Thus, it is considered that a dedicated OS is required in this conventional technique.
In recent years, a so-called “Open System” using an Intel-compatible CPU (“Intel” is a registered trademark) has become a trend in the field of servers. As the main tendency, an I/O device produced by an independent hardware vendor is installed in an open PC server system, and an I/O device driver produced by the same vendor is used to control the I/O device. However, most of such I/O device drivers are not produced with the FT computer system in mind. In such I/O device drivers, the switching function between the I/O devices is not installed at all. Also, I/O devices typically installed in the open computer system, such as a video adaptor (VGA: Video Graphics Adaptor), are directly accessed from the OS in many cases. However, it is practically impossible to modify the OS mainly used in the open computer system so as to adapt it to the fault tolerant computer system.
A high reliability server compatible with open hardware and software systems is demanded. That is, a technique is demanded that can attain a fault tolerant computer system while using an open OS or I/O device driver. In particular, a technique that can carry out the dual control for the I/O devices is desirable in order to improve the fault tolerance and reliability in the open server system.
In conjunction with the above description, a portable computer is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 5-94277). The portable computer of this conventional example is provided with a display unit composed of a monochrome panel and a color panel, a monochrome panel display control circuit which controls display of the monochrome panel, and a color panel display control circuit which controls display of the color panel. A setting section sets selection data in a switching section, which switches between the monochrome panel display control circuit and the color panel display control circuit based on the selection data.
Also, a degrade system of a cluster connection multi-processor system is disclosed in Japanese Laid Open Patent Application (JP-A-Heisei 11-149457). The multi-processor system of this conventional example is provided with a plurality of CPUs, a plurality of CPU control sections to control the plurality of CPUs, and a memory and an I/O control section which are shared by the plurality of CPUs. The plurality of CPUs and the plurality of CPU control sections are connected by a cluster bus, and the plurality of CPU control sections are connected by a system bus. The CPU control section contains at least a control register (as a “freeze register”) to control disconnection of the CPU from the cluster bus and a control register (as a “CPU status register”) to indicate a connection situation of the CPU and the cluster bus. When each of the CPUs on the cluster bus starts an operation, a flag is written in the CPU status register corresponding to the CPU to indicate a cluster connection. Then, an initial diagnosis of the CPUs is started. When a fault is detected in one CPU, the fact is written in the freeze register, and the fault CPU is logically disconnected from the cluster bus. The CPU control section does not respond to any request from the fault CPU, and carries out control to separate the fault CPU from the system.
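The interplay of the freeze register and the CPU status register in this conventional example can be pictured with the following minimal sketch. The one-bit-per-CPU layout, the class name `CpuControlSection`, and the method names are assumptions made only for illustration; the cited application does not specify them. Clearing the status bit when a CPU is frozen is likewise an assumption.

```python
class CpuControlSection:
    """Hypothetical model of a CPU control section with two bit-mask registers."""

    def __init__(self):
        self.status_register = 0  # bit i set: CPU i is connected to the cluster bus
        self.freeze_register = 0  # bit i set: CPU i is logically disconnected

    def start_cpu(self, cpu):
        # A flag indicating cluster connection is written when the CPU starts.
        self.status_register |= 1 << cpu

    def report_fault(self, cpu):
        # The fault is recorded in the freeze register.
        self.freeze_register |= 1 << cpu
        # Assumption: the connection flag is cleared at the same time.
        self.status_register &= ~(1 << cpu)

    def handle_request(self, cpu, request):
        # Requests from a frozen (fault) CPU are never answered.
        if self.freeze_register & (1 << cpu):
            return None
        return f"ack:{request}"


ctl = CpuControlSection()
ctl.start_cpu(0)
ctl.start_cpu(1)
ctl.report_fault(1)                       # initial diagnosis finds a fault in CPU 1
reply_ok = ctl.handle_request(0, "read")  # healthy CPU is served
reply_frozen = ctl.handle_request(1, "read")  # fault CPU is ignored
```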
Also, a switching unit of a multiplexing apparatus is disclosed in Japanese Laid Open Patent Application (JP-P2002-77186A). In the multiplexing apparatus of this conventional example, the switching unit is provided between a connection origin apparatus and a plurality of connection destination apparatuses which are multiplexed, to select and connect one of the connection destination apparatuses and the connection origin apparatus. In the switching unit, a storage section stores connection priority levels of the connection destination apparatuses. A first signal input/output section is connected with the connection origin apparatus. A second signal input/output section is connected with the plurality of connection destination apparatuses through communication lines and inputs and outputs data from and to a specific one of the connection destination apparatuses. A routing section connects the first and second signal input/output sections directly and indirectly. A selecting section selects one of the connection destination apparatuses which has a high connection priority level as the specific connection destination apparatus. Also, the selecting section selects one of the connection destination apparatuses which has a lower connection priority level than that of the specific connection destination apparatus, when confirming generation of a connection fault in the specific connection destination apparatus based on a monitor signal of the second input section, and selects one of the connection destination apparatuses which has a higher connection priority level than that of the specific connection destination apparatus, when confirming elimination of a connection fault in the connection destination apparatus with the higher connection priority level based on the monitor signal of the second input section.
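The selection rule of this conventional switching unit can be approximated by the following minimal sketch. The function name `select_destination`, the use of larger numbers for higher priority, and the representation of the monitor signal as a set of faulty destination names are all assumptions for illustration. The sketch re-evaluates the choice from scratch on every call, which is a simplification that yields both the fall-back to a lower priority destination on a fault and the return to a higher priority destination once its fault is eliminated.

```python
def select_destination(priorities, faulty):
    """Return the fault-free destination with the highest priority, or None.

    priorities: dict mapping destination name -> priority level
                (assumption: a larger number means a higher priority)
    faulty:     set of destination names currently showing a connection fault
    """
    healthy = [name for name in priorities if name not in faulty]
    if not healthy:
        return None
    return max(healthy, key=lambda name: priorities[name])


priorities = {"A": 3, "B": 2, "C": 1}

initial = select_destination(priorities, set())          # highest priority
after_fault = select_destination(priorities, {"A"})      # A fails: fall back
after_second = select_destination(priorities, {"A", "B"})
after_recovery = select_destination(priorities, {"B"})   # A recovers: return
none_left = select_destination(priorities, {"A", "B", "C"})
```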
Also, a fault tolerant system is disclosed in Japanese Laid Open Patent Application (JP-P2004-280732A). In the fault tolerant system of this conventional example, first and second north bridges are duplexed and first and second input/output bus bridges are duplexed, and an asynchronous interface is used as an interface between the first and second north bridges and the first and second input/output bus bridges. A section is provided for each of the first and second north bridges to synchronize data transmission and reception for the asynchronous interface between the first and second north bridges.