Fault-tolerant systems are known in which components such as the CPU (Central Processing Unit), memory, PCI (Peripheral Component Interconnect) and the like are made redundant across the subsystems of a computer system performing data processing, so that operation can continue without stoppage even if one or more of the components malfunctions. The lockstep method, for example, is used in such fault-tolerant systems.
In a lockstep fault-tolerant system, the redundant subsystem components need to be synchronized with each other and to execute the same processes. Accordingly, an FT (Fault Tolerant) controller is installed in order to achieve synchronicity among components in such fault-tolerant systems. This controller compares the process details of the redundant components and detects any discrepancies among them, thereby detecting malfunctions in the system (that is, discrepancies in processes among redundant components).
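The comparison performed by such a controller can be illustrated with a minimal sketch. It is a simplified software model under stated assumptions, not the circuitry of any system described here: the `Transaction` type and the `first_discrepancy` function are hypothetical names introduced only for illustration, and each subsystem's process details are assumed to be observable as an ordered sequence of address/data pairs.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Optional


@dataclass(frozen=True)
class Transaction:
    """One observed transaction (hypothetical model of 'process details')."""
    address: int
    data: int


def first_discrepancy(
    stream_a: Iterable[Transaction],
    stream_b: Iterable[Transaction],
) -> Optional[int]:
    """Return the index of the first step at which the two redundant
    subsystems differ, or None if they stay in lockstep throughout.

    A stream that ends early is also treated as a discrepancy, because
    zip_longest pads the shorter stream with None.
    """
    for step, (a, b) in enumerate(zip_longest(stream_a, stream_b)):
        if a != b:
            return step
    return None


# Assumed example data: subsystem B returns different data at step 1.
subsystem_a = [Transaction(0x10, 1), Transaction(0x14, 2)]
subsystem_b = [Transaction(0x10, 1), Transaction(0x14, 3)]
print(first_discrepancy(subsystem_a, subsystem_b))
```

In this model a non-None result corresponds to a detected malfunction. An actual FT controller would perform the equivalent comparison in hardware on live bus traffic rather than on recorded sequences.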
FIG. 9 is a block diagram showing the composition of a related fault-tolerant system.
For example, in a related fault-tolerant system including two subsystems as shown in FIG. 9, FT controllers are placed between the IO bridges and the northbridges. In addition, each FT controller is linked to the other subsystem via cross-linking. The FT controllers compare, between the two subsystems, the data passing between the IO device side and the northbridge, and detect system malfunctions by detecting discrepancies in that data.
In addition, in the fault-tolerant system disclosed in Unexamined Japanese Patent Application KOKAI Publication No. 2006-178616 (hereinafter referred to as Patent Literature 1), an FT controller is positioned between the CPU and the IO devices, and in the northbridge (board controller) connecting the CPU and memory. In the fault-tolerant system of Patent Literature 1, the input/output bus of the CPU and memory goes through the northbridge, so process details between the CPU and memory can be compared between subsystems by the FT controller in the northbridge, making it possible to detect system malfunctions.
In the related fault-tolerant system shown in FIG. 9, only data processed between the IO device side and the northbridge are compared, so it is impossible to detect process malfunctions (synchronicity discrepancies) arising among other components (CPU, memory, northbridge and the like).
In addition, in the fault-tolerant system of Patent Literature 1, it is necessary to develop a complex, high-performance northbridge because the FT controller is inside the northbridge. Accordingly, system development time becomes lengthy and development costs tend to increase.
In addition, architectures that directly link the CPU and memory have become more prevalent in recent years as the bandwidth between CPU and memory has increased. In the fault-tolerant system disclosed in Patent Literature 1, it is necessary to link the memory and CPU via the northbridge, so that system cannot be implemented with this kind of architecture.
The present invention was made in view of the above circumstances, and an exemplary object of the present invention is to provide a fault-tolerant system that has a relatively simple composition and enables detection of malfunctions arising among various components, even with an architecture directly linking the CPU and memory.