1. Field of the Invention
The present invention relates to a computer and an error recovery method for the same.
2. Description of the Related Art
In recent years, advances in semiconductor technology have steadily increased the speed of computers. These advances, however, have entailed shrinking device sizes, such as the size of the MOS transistors used for computation and storage. As a result, resistance to radiation and to internal or external noise has been decreasing, thus increasing the probability of error.
Various error correction schemes that correct errors as they occur are known in the art. For example, memories, which are more prone to error than other parts, use error correction codes (ECCs) to guarantee stable operation. For parts other than memories, however, ECCs are not often used because of cost and performance constraints.
For the correction of errors that occur in parts other than memories, mainframe computers, of which high reliability is demanded, have traditionally employed the following method: when the execution of an instruction is completed, if there is no error, the next instruction is executed, but if an error is detected, the previous instruction is re-executed. This method has had the problem that a high degree of instruction execution parallelism cannot be achieved, since the next instruction cannot be executed until after error checking has been done on the previous instruction.
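The execute, check, then proceed scheme described above can be sketched as follows. This is only an illustrative model, not the mainframe implementation: the function name `run_serial_with_retry`, the `fault_schedule` argument (a sequence of booleans standing in for the outcome of each error check), and the `max_retries` limit are all assumptions introduced for the example.

```python
def run_serial_with_retry(instructions, fault_schedule, max_retries=3):
    """Execute instructions strictly one at a time, re-executing on error.

    instructions   : list of zero-argument callables (hypothetical stand-ins
                     for machine instructions).
    fault_schedule : iterable of booleans; True means the error check for
                     that execution attempt reported an error (assumption).
    """
    faults = iter(fault_schedule)
    results = []
    retries = 0
    for instr in instructions:
        for _ in range(max_retries + 1):
            value = instr()
            # The error check must finish before anything else may start,
            # which is what limits parallelism in this scheme.
            if not next(faults, False):
                results.append(value)
                break
            retries += 1  # error detected: re-execute the same instruction
        else:
            raise RuntimeError("error persisted; not an intermittent failure")
    return results, retries
```

Because the inner loop cannot exit until the check passes, no two instructions are ever in flight at once, which illustrates the parallelism limitation noted above.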
On the other hand, in computers that perform instruction execution with a higher degree of parallelism, the state during operation is stored in a storage device and, if an error is detected, the previous state is restored from the storage device and the instruction execution is retried by returning to that state. This method has had the problem of increased cost because a storage device for storing the state during operation, which is not necessary for instruction execution itself, has to be provided.
Since the causes of errors, such as external noise or radiation, are often intermittent (transient) in nature, in either of the above methods the retried instruction(s) will, with high probability, be executed correctly without encountering an error. Accordingly, the issue here is how error correction can be carried out when an error is detected in a logic operation, without decreasing the degree of instruction execution parallelism and without increasing the amount of hardware involved.
Meanwhile, many recent high-performance computers employ an instruction execution technique called speculative execution. In executing instructions in a computer, the execution efficiency of conditional branch instructions greatly affects the performance of the computer. When a conditional branch instruction is executed in a conventional pipelined computer, first the instruction is identified as being a conditional branch instruction, then the branch condition is computed and the outcome checked, and finally the instruction at the branch destination is fetched for execution. A minimum of three steps is thus required before the next instruction can be executed, and execution of the next instruction is delayed correspondingly.
By contrast, in speculative execution, upon identifying the instruction as being a conditional branch instruction, the direction of the branch is predicted using a branch prediction mechanism, and the instruction at the predicted branch destination is fetched for execution, thus requiring only two steps before the next instruction can be executed. Since recent branch prediction mechanisms are able to predict the correct branch direction with an accuracy of 90% or greater, efficient execution is possible most of the time.
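The benefit of prediction can be illustrated with a rough expected-cost calculation. The two-step figure and the 90% accuracy come from the description above; the misprediction penalty of three extra steps is purely an assumed value chosen for illustration, as is the function name.

```python
def expected_branch_steps(accuracy, speculative_steps=2, mispredict_penalty=3):
    """Average steps per conditional branch under speculative execution.

    accuracy           : probability that the predicted direction is correct.
    speculative_steps  : steps needed when the prediction is correct (2 in
                         the description above).
    mispredict_penalty : extra steps spent on recovery after a wrong
                         prediction (hypothetical value, not from the text).
    """
    return speculative_steps + (1.0 - accuracy) * mispredict_penalty
```

Under these assumptions, `expected_branch_steps(0.9)` is about 2.3 steps, compared with the minimum of three steps for the conventional pipeline. Speculation stays ahead as long as the miss rate multiplied by the penalty remains below one step.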
However, there are cases where the prediction fails, and to prepare for such cases, it is common practice to store the state of the computer in a storage device. If it turns out that the prediction was wrong, the state that existed immediately before the branching occurred is restored from the storage device, and the instruction execution is retried with the direction of the branch corrected.
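A minimal sketch of this save-and-restore behavior is given below. The class `CheckpointedMachine`, its two-register state, and the function `speculate_branch` are hypothetical names for a toy model; real hardware would typically hold the saved state in dedicated structures rather than copying it.

```python
import copy


class CheckpointedMachine:
    """Toy machine state with a single checkpoint slot (illustrative only)."""

    def __init__(self):
        self.regs = {"r0": 0, "r1": 0}
        self._checkpoint = None

    def save_checkpoint(self):
        # Store the state that exists immediately before the branch.
        self._checkpoint = copy.deepcopy(self.regs)

    def restore_checkpoint(self):
        # Discard all speculative updates made since the checkpoint.
        self.regs = copy.deepcopy(self._checkpoint)


def speculate_branch(machine, predicted_taken, actual_taken,
                     taken_path, fallthrough_path):
    """Run the predicted path; on a misprediction, roll back and retry.

    taken_path / fallthrough_path are callables that mutate the machine.
    Returns True if a recovery was performed.
    """
    machine.save_checkpoint()
    (taken_path if predicted_taken else fallthrough_path)(machine)
    if predicted_taken != actual_taken:
        machine.restore_checkpoint()
        (taken_path if actual_taken else fallthrough_path)(machine)
        return True
    return False
```

After a misprediction, the machine state reflects only the correct path, as if the wrong path had never been executed.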
The present invention has been devised in view of the above situation, and it is an object of the invention to provide an economical computer, and an error recovery method for the same, wherein a state storage device provided as described above to prepare for a misprediction in speculative execution is also used when an error is detected in a logic operation of the computer, thereby making it possible to correct errors caused by electrical noise or radiation.
To achieve the above object, according to the present invention, there is provided a computer equipped with a misprediction recovery mechanism which performs recovery processing if, after having predicted a branch destination of a branch instruction and speculatively executed an instruction at the predicted branch destination, it turns out that the branch prediction was wrong, said computer comprising: an error detection mechanism for detecting an error in logic operation of the computer; and an instruction re-execution mechanism for correcting an error caused by an intermittent failure when an error is detected by the error detection mechanism, by restoring the computer, using the misprediction recovery mechanism, to a state that existed before the occurrence of the error, and by re-executing a sequence of instructions including the instruction where the error is detected.
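The idea of reusing the misprediction recovery mechanism for error recovery can be sketched as below. This is a simplified software model under stated assumptions, not the claimed hardware: the class `Core`, the function `run_block`, the `fault_schedule` of booleans representing the error detection mechanism, the single-bit corruption of `acc`, and the retry limit are all hypothetical.

```python
import copy


class Core:
    """Toy core whose one checkpoint store serves both recovery causes."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self._saved = None
        self.recoveries = {"mispredict": 0, "error": 0}

    def checkpoint(self):
        self._saved = copy.deepcopy(self.regs)

    def recover(self, cause):
        # Same rollback path whether the cause is a wrong branch
        # prediction or a detected logic error.
        self.regs = copy.deepcopy(self._saved)
        self.recoveries[cause] += 1


def run_block(core, block, fault_schedule, max_retries=3):
    """Run a sequence of instructions from a checkpoint, retrying on error.

    block          : list of callables that mutate the register dict.
    fault_schedule : iterable of booleans; True models an intermittent
                     fault that corrupts `acc` and is then detected.
    """
    faults = iter(fault_schedule)
    core.checkpoint()
    for _ in range(max_retries + 1):
        error_detected = False
        for instr in block:
            instr(core.regs)
            if next(faults, False):
                core.regs["acc"] ^= 1  # modelled bit flip from noise
                error_detected = True
                break
        if not error_detected:
            return core.regs
        # Restore the pre-error state and re-execute the whole sequence,
        # including the instruction where the error was detected.
        core.recover("error")
    raise RuntimeError("error persisted; not an intermittent failure")
```

In this model, a branch resolution unit would call `core.recover("mispredict")` on the same checkpoint, so no additional state storage is needed for error recovery.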
According to the present invention, there is also provided an error recovery method for a computer equipped with a misprediction recovery mechanism which performs recovery processing if, after having predicted a branch destination of a branch instruction and speculatively executed an instruction at the predicted branch destination, it turns out that the branch prediction was wrong, said method comprising the steps of: detecting an error in logic operation of the computer; and correcting an error caused by an intermittent failure when an error is detected, by restoring the computer, using the misprediction recovery mechanism, to a state that existed before the occurrence of the error, and by re-executing a sequence of instructions including the instruction where the error is detected.