Semiconductor integrated circuit devices have maintained their reliability through testing technologies that address the new failure modes accompanying advances in miniaturization, and through high-speed testing technologies that keep pace with ever-higher integration. In recent years, however, testing costs have increased, so it is desirable to improve the reliability of semiconductor integrated circuit devices not only by improving testing methods but also by improving design on the system side, including the way in which the semiconductor integrated circuit devices are used.
FIG. 1 is a block diagram schematically showing a failure concealing method of a first related art.
The failure concealing method of the first related art is an example of concealing a failure by replacing a failed chip.
An information processing device shown in FIG. 1 comprises a plurality of CPUs 10P1-Pn (n is a positive integer) and is configured so that CPUs 10P1-Pn run OSs 20P1-Pn and execution environments 30P1-Pn for desired applications (AP) 40P1-Pm (m is a positive integer). An execution environment here refers to software (a program), other than the OS, that is required to execute applications 40P1-Pm.
In the failure concealing method of the first related art, upon detection of a failure in, for example, CPU 10P2, CPU 10P2 is replaced with normal CPU 10010, which then executes application 40P3 under OS 20P2 and execution environment 30P2, thereby concealing the failure of CPU 10P2 from the system software.
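The replacement can be pictured as moving the failed CPU's assignment (OS, execution environment, applications) to the spare CPU unchanged, which is why the system software does not notice the failure. The following is a minimal sketch under that assumption; the `Assignment` type and `replace_failed_cpu` function are hypothetical, and in the related art the swap is a physical chip replacement rather than a software update.

```python
from dataclasses import dataclass, field


@dataclass
class Assignment:
    """What a CPU runs: an OS, an execution environment and applications."""
    os: str
    env: str
    apps: list = field(default_factory=list)


def replace_failed_cpu(assignments: dict, failed: str, spare: str) -> dict:
    """Return a new CPU -> Assignment map in which `spare` takes over `failed`.

    Hypothetical model: the failed CPU's OS, execution environment and
    applications move to the spare unchanged, so the software above the
    CPU sees the same configuration as before the failure.
    """
    if failed not in assignments:
        raise KeyError(f"unknown CPU {failed}")
    if spare in assignments:
        raise ValueError(f"spare CPU {spare} is already in use")
    updated = dict(assignments)
    updated[spare] = updated.pop(failed)
    return updated


# Illustrative configuration; only the 10P2 entry follows the text above.
before = {
    "10P1": Assignment("20P1", "30P1", ["40P1"]),
    "10P2": Assignment("20P2", "30P2", ["40P3"]),
}
after = replace_failed_cpu(before, failed="10P2", spare="10010")
```

The sketch returns a new mapping rather than mutating the input, so the configuration before and after the replacement can be compared directly.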
FIG. 2 is a block diagram schematically showing a failure concealing method of a second related art.
The failure concealing method of the second related art is an example of concealing a failure by means of an OS that implements symmetric multiprocessing (SMP), hereinafter called an "SMP OS".
An information processing device shown in FIG. 2 comprises a plurality of CPUs 10P1-Pn, and is configured to cause CPUs 10P1-Pn to operate with SMP OS 10020 and execution environments 30P1-Pn for desired applications 40P1-Pm.
In the failure concealing method of the second related art, upon detection of a failure in, for example, CPU 10P2, SMP OS 10020 masks the execution queue of CPU 10P2 so that the CPU in which the failure was detected no longer executes applications.
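Masking an execution queue can be illustrated with a toy SMP scheduler that keeps one run queue per CPU and never dispatches work from a masked queue. This is a minimal sketch, not the scheduler of any real SMP OS; the class and method names are hypothetical, and re-enqueuing the failed CPU's pending applications on the remaining CPUs is an assumption the text above does not state.

```python
from collections import deque


class SmpScheduler:
    """Toy SMP scheduler with one run queue per CPU (hypothetical model)."""

    def __init__(self, cpu_ids):
        self.queues = {cpu: deque() for cpu in cpu_ids}
        self.masked = set()

    def enqueue(self, app):
        # Place the application on the shortest queue among unmasked CPUs.
        candidates = [c for c in self.queues if c not in self.masked]
        if not candidates:
            raise RuntimeError("no usable CPU")
        target = min(candidates, key=lambda c: len(self.queues[c]))
        self.queues[target].append(app)
        return target

    def mask(self, cpu):
        # Mask the failed CPU's queue. Moving its pending applications to
        # the remaining CPUs is an assumption made for this sketch.
        self.masked.add(cpu)
        pending = list(self.queues[cpu])
        self.queues[cpu].clear()
        for app in pending:
            self.enqueue(app)

    def next_app(self, cpu):
        # A masked CPU is never handed an application to execute.
        if cpu in self.masked or not self.queues[cpu]:
            return None
        return self.queues[cpu].popleft()
```

Because every CPU is managed by the one scheduler instance, this style of concealment only works when all CPUs run under the same SMP OS.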
FIG. 3 is a block diagram schematically showing a failure concealing method of a third related art.
An information processing device shown in FIG. 3 comprises a plurality of processing elements (only processing elements #0, #1 are shown in FIG. 3) and node switches 20000A, 20000B for shutting off processing elements #0, #1 from the system. Each of processing elements #0, #1 comprises a memory for storing programs for executing processing, and a logical/physical ID conversion table that holds the correspondence between programs and the processing elements that execute them. The configuration shown in FIG. 3 may be represented, for example, by the multiprocessor system described in Japanese Patent Laid-Open No. 2-123455.
In the failure concealing method of the third related art, when processing element #1, for example, fails, a program executed by processing element #1 is transferred to processing element #0, and processing element #1 is shut off from the system by node switch 20000B.
Then, the logical/physical ID conversion tables provided in all the processing elements are updated to record that processing element #0 (physical ID) executes the programs of processing element #1 (logical ID). Thereafter, when any processing element transmits data to processing element #1, it refers to logical/physical ID conversion table 20030 and transfers the data to processing element #0 instead.
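The takeover sequence (program transfer, shut-off by the node switch, table update in every processing element, then address translation on send) can be sketched as follows. This is an illustrative model only, not the implementation of the cited multiprocessor system; `MultiprocessorSystem`, `fail_over` and `send` are hypothetical names, and the node switch is reduced to removing the processing element from a set of connected elements.

```python
class MultiprocessorSystem:
    """Toy model of PEs with per-PE logical/physical ID conversion tables.

    All names are hypothetical; this is not the cited system's implementation.
    """

    def __init__(self, programs):
        # programs: physical PE ID -> list of programs that PE executes
        self.programs = {pe: list(p) for pe, p in programs.items()}
        # PEs currently attached to the system through their node switches.
        self.connected = set(programs)
        # Every PE holds its own table, initially logical ID == physical ID.
        self.tables = {pe: {lid: lid for lid in programs} for pe in programs}

    def fail_over(self, failed, takeover):
        # 1. Transfer the failed PE's programs to the takeover PE.
        self.programs[takeover].extend(self.programs.pop(failed))
        # 2. Shut the failed PE off from the system (node switch).
        self.connected.discard(failed)
        # 3. Record in every PE's table that `takeover` now serves `failed`.
        for table in self.tables.values():
            table[failed] = takeover

    def send(self, src, logical_dest, data):
        # The sender translates the logical destination to a physical PE.
        physical = self.tables[src][logical_dest]
        if physical not in self.connected:
            raise RuntimeError(f"PE {physical} is not connected")
        return physical, data


system = MultiprocessorSystem({0: ["A"], 1: ["B"]})
system.fail_over(failed=1, takeover=0)
```

Note that each processing element here owns private state only; the model has no shared timer, CPU ID or interrupt resources, which mirrors the limitation of this related art discussed below.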
Among the failure concealing methods described above, however, the method of the first related art requires manual intervention to conceal a failure, so system operation must be stopped while the intervention is carried out. Also, in a configuration in which a plurality of CPUs are incorporated in a single semiconductor integrated circuit device, the entire chip, including its normal CPUs, must be replaced.
On the other hand, the failure concealing method of the second related art cannot be applied to a system in which a plurality of OSs run, because it presupposes that all CPUs run under the SMP OS.
Also, the failure concealing method of the third related art cannot be applied to a system comprising shared resources (shared peripherals), that is, hardware and software for implementing a timer, CPU ID, interrupt processing and the like, because it assumes that the respective CPUs (processing elements) have resources independent of one another. Further, in the failure concealing method of the third related art, when each CPU is provided with a cache memory, data temporarily held in the cache memory may be lost.
As is apparent from the above, even when the aforementioned related-art failure concealing methods are applied to an information processing device comprising a plurality of CPUs and shared resources, if any CPU fails, the plurality of OSs cannot continue to operate unless the associated chip is replaced.