The present invention relates generally to a processor fault recovering method for an information processing system having a multiprocessor configuration, and more particularly to a processor fault recovering method for an information processing system which recovers the control program of a failed processor when a fixed fault occurs in that processor.
In recent years, advances in computer systems have expanded their use into a wide variety of fields, and the ways in which they are used have become increasingly complex. As a result, a system down has a greater impact on society, and computer systems are therefore required to be highly reliable.
In an information processing system having a multiprocessor configuration, in which a single operating system runs on a plurality of processors, known techniques for providing this high reliability include a recovery technique, implemented by retry or the like, for intermittent processor faults, and a relief technique that uses redundant components to eliminate fixed processor faults.
In addition, prior art related to a processing execution control method is disclosed, for example, in JP-A-2-266457. This method is applied to a multiprocessor-based information processing system. When a fixed fault occurs in a processor, a virtual processor is created using a normal processor, and the virtual processor takes over the processing so far executed by the failed processor. This avoids a system down and prevents the processing under execution from being interrupted.
Prior art related to a processing succession method for a computer system having a loosely coupled multiprocessor configuration is described, for example, in JP-A-60-54052. In this method, another, normal processor takes over management information from a failed processor through a shared memory, which allows the processing to continue.
Further, prior art related to relieving processing affected by a fixed processor fault is described, for example, in JP-A-5-108391. The disclosed method is applied to a computer system having a multiprocessor configuration. The instruction that was being executed by the processor affected by the fixed fault is executed by another, normal processor. The processing interrupted by the fault is thereby relieved without the use of embedded correction codes.
Furthermore, prior art related to a fault tolerant computer system having multiple processors is described, for example, in JP-A-2-202636-203638. In these techniques, a multiprocessor configuration is used to multiplex both processing and data, thereby achieving a fault tolerant system.
Additionally, further prior art is described, for example, in JP-A-4-213736. This reference discloses a data processing apparatus with a dual processor configuration composed of an active processor and a backup processor. When the active processor fails, the backup processor takes over its processing and resumes that processing from the latest reliable checkpoint.
As described above, a fault tolerant computer system generally relies on redundant processors or software, together with mutual fault diagnosis. This allows a backup processor to continue the processing when a fault occurs or when processing is switched from one processor to another. Such a system improves reliability by having a plurality of processors run the same operating system and execute the same process. Because of this redundancy, however, the fault tolerant computer system has an extremely complicated system configuration.