1. Field of the Invention
The present invention relates to a fault tolerant system and to a controller, an operation method, and an operation program used in the fault tolerant system and, more particularly, to management of a state that specifies a system operation for realizing a fault tolerant function in a computer system, and to control using that state.
2. Description of the Related Art
Conventionally known is a fault tolerant computer system (hereinafter referred to as “fault tolerant system”) in which all components that constitute the computer hardware, such as a CPU (Central Processing Unit), memory, PCI (Peripheral Component Interconnect), disk, power source, and the like, are multiplexed (for example, duplicated or triplicated). In such a computer system, even if a failure occurs in any of the components, the system can continue operating without interruption.
In the fault tolerant system, a multiplexed plurality of CPUs (processors) executes the same operation at the same timing while constantly maintaining synchronization with one another (which is referred to as “lock-step synchronization”). Even if a failure occurs in one of the plurality of CPUs that execute the same operation in lock-step synchronization, the other CPUs continue normal operation. That is, even if a failure occurs, the fault tolerant system can continue operating without adversely affecting the operation of software, such as an operating system or application software, executed by the CPUs.
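The lock-step behavior described above can be illustrated by the following minimal sketch. It is offered only as an illustration under stated assumptions and is not the mechanism of any of the related art: the class `CpuModule`, the function `run_lockstep`, the injected fault, and the use of majority voting over three modules to identify the failed module are all hypothetical and do not appear in the description above.

```python
from collections import Counter
from typing import List, Optional


class CpuModule:
    """Hypothetical stand-in for one multiplexed CPU module."""

    def __init__(self, name: str, fault_at_step: Optional[int] = None):
        self.name = name
        self.fault_at_step = fault_at_step  # step from which output is corrupted
        self.online = True

    def execute(self, step: int, operand: int) -> int:
        # Every module performs the same operation on the same input.
        result = operand * 2 + 1
        if self.fault_at_step is not None and step >= self.fault_at_step:
            result ^= 0x1  # simulated failure: flip the lowest bit
        return result


def run_lockstep(modules: List[CpuModule], operands: List[int]) -> List[int]:
    """Run all online modules step by step and compare their outputs.

    Assumption: a module whose output differs from the majority is treated
    as failed and taken offline; the remaining modules keep running.
    """
    outputs = []
    for step, operand in enumerate(operands):
        active = [m for m in modules if m.online]
        results = {m.name: m.execute(step, operand) for m in active}
        majority_value, _ = Counter(results.values()).most_common(1)[0]
        for m in active:
            if results[m.name] != majority_value:
                m.online = False  # isolate the disagreeing module
        outputs.append(majority_value)
    return outputs


if __name__ == "__main__":
    cpus = [CpuModule("cpu0"), CpuModule("cpu1", fault_at_step=2), CpuModule("cpu2")]
    print(run_lockstep(cpus, [1, 2, 3, 4]))
    print([(m.name, m.online) for m in cpus])
```

In this sketch, the software-visible output sequence is unaffected by the failure of one module, which corresponds to the continued operation described above. With only two modules (duplication), a disagreement can be detected by comparison but majority voting alone cannot determine which module failed, so a real system would need additional error information.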
As related art concerning such a fault tolerant system, U.S. Patent Application Publication No. 2002/0152418 A1 discloses an apparatus and method for executing instructions in lock-step synchronization, U.S. Patent Application Publication No. 2002/0152419 A1 discloses an apparatus and method for accessing a mass storage device in a fault-tolerant server, and U.S. Pat. No. 5,953,742 discloses a technique of copying memory between a plurality of processing sets, each including a processor that operates in lock-step synchronization, to achieve high-speed resynchronization.
However, in the abovementioned fault tolerant systems of the related art, it has been difficult to adequately perform error processing, duplication (synchronization) processing, and resynchronization processing for realizing a fault tolerant function in accordance with the system state, such as the CPU operation state (agreement or disagreement between operations of CPU buses) or the access permission state (agreement or disagreement between IO accesses).