1. Field of the Invention
The present invention generally relates to pipelined data processing systems, and more specifically to error detection and error recovery during pipelined execution of instructions.
2. Description of the Related Art
Modern computer systems typically contain several integrated circuits (ICs), including a processor which may be used to process information in the computer system. The data processed by a processor may include computer instructions which are executed by the processor as well as data which is manipulated by the processor using the computer instructions. The computer instructions and data are typically stored in a main memory in the computer system.
Processors typically process instructions by executing each instruction in a series of small steps. In some cases, to increase the number of instructions being processed by the processor (and therefore increase the speed of the processor), the processor may be pipelined. Pipelining refers to providing separate stages in a processor where each stage performs one or more of the small steps necessary to execute an instruction. In some cases, the pipeline (in addition to other circuitry) may be placed in a portion of the processor referred to as the processor core. Some processors may have multiple processor cores.
As an example of executing instructions in a pipeline, when a first instruction is received, a first pipeline stage may process a small part of the instruction. When the first pipeline stage has finished processing the small part of the instruction, a second pipeline stage may begin processing another small part of the first instruction while the first pipeline stage receives and begins processing a small part of a second instruction. Thus, the processor may process two or more instructions at the same time.
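The overlap described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and no stalls occur; the function name `simulate_pipeline` and the instruction labels are hypothetical and serve only as illustration.

```python
def simulate_pipeline(instructions, num_stages):
    """Return, for each cycle, the instruction held by each stage (None if empty).

    Idealized model (an assumption): one cycle per stage, no stalls or hazards.
    """
    timeline = []
    total_cycles = len(instructions) + num_stages - 1
    for cycle in range(total_cycles):
        stages = []
        for stage in range(num_stages):
            # The instruction that entered the pipeline `stage` cycles ago
            # is now in this stage.
            idx = cycle - stage
            in_flight = 0 <= idx < len(instructions)
            stages.append(instructions[idx] if in_flight else None)
        timeline.append(stages)
    return timeline


timeline = simulate_pipeline(["I1", "I2", "I3"], num_stages=2)
for cycle, stages in enumerate(timeline):
    print(cycle, stages)
# 0 ['I1', None]
# 1 ['I2', 'I1']
# 2 ['I3', 'I2']
# 3 [None, 'I3']
```

In cycle 1, the second stage works on the first instruction while the first stage has already accepted the second instruction, which is the simultaneous processing described above.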
Over the past few decades, the speed and density of transistors in integrated circuits have continued to increase in accordance with Moore's law, which predicts exponential growth in transistor density. However, continuously decreasing feature sizes, reductions in supply voltages, and increased clock rates in modern processors have resulted in processors becoming increasingly susceptible to errors. For example, the possibility of errors in interprocessor communications has greatly increased, thereby necessitating error detection and recovery mechanisms.
Furthermore, as feature sizes shrink, the probability of encountering soft errors has also increased. Soft errors may be caused by external elements such as, for example, a charged particle striking a memory or memory-type device and altering the contents of the memory. For example, a cosmic ray may strike a register and alter the contents of the register by flipping one or more bits.
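A bit flip of this kind can be modeled in a few lines. The sketch below pairs the flip with a single even-parity bit as one possible detection scheme; parity is an assumption chosen for illustration and is not named in the text above, and the helper names `parity` and `flip_bit` are hypothetical.

```python
def parity(value):
    """Return the even-parity bit of value: 1 if the count of set bits is odd."""
    return bin(value).count("1") & 1


def flip_bit(value, bit):
    """Model a soft error by inverting one bit of a register value."""
    return value ^ (1 << bit)


register = 0b10110010              # example 8-bit register contents
stored_parity = parity(register)   # parity saved when the register was written

# A single-bit upset changes the parity, so the error is detectable.
single_flip = flip_bit(register, 5)
single_detected = parity(single_flip) != stored_parity

# A double-bit upset leaves the parity unchanged and escapes this check.
double_flip = flip_bit(single_flip, 2)
double_detected = parity(double_flip) != stored_parity

print(single_detected, double_detected)  # True False
```

The second case shows why a lone parity bit is only a partial safeguard when more than one bit may flip; stronger codes would be needed to catch such errors.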
When instructions referring to vital registers altered by soft errors are executed, the execution of the instructions may result in potentially catastrophic effects. For example, executing instructions referring to registers altered by soft errors may result in an unintended effect on a computer system or may cause one or more other vital registers to be erroneously altered, thereby propagating the error and potentially resulting in system failure. Therefore, errors in registers must be detected, and propagation of errors to other registers must be avoided.
Furthermore, imprecise exceptions may occur when there are dependencies between instructions that execute out of order in different execution units. For example, out-of-order execution may result in the values of one or more registers being altered before another instruction is able to access the register, thereby producing unpredictable results and potentially catastrophic effects. This problem may be further exacerbated if the instruction altering the values of one or more registers is associated with a soft error condition, as described above. More importantly, out-of-order execution may result in an inability to determine the exact instruction and/or the exact cycle during which an instruction failed, thereby precluding system recovery.
One solution for guarding against erroneous changes to register values may be to implement a recovery unit. A recovery unit may be configured to preserve the state of the contents of registers accessed in a pipeline. Therefore, if an erroneous update to a register is detected, the recovery unit may revert the system state to a previously saved non-erroneous state. For example, the recovery unit may restore the contents of a register to a previous non-erroneous value.
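The save-and-revert behavior of a recovery unit can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any particular hardware design: the class name `RecoveryUnit`, its methods, and the dictionary used as a register file are all hypothetical.

```python
class RecoveryUnit:
    """Toy model: keeps a copy of the last known-good register state."""

    def __init__(self):
        self._saved = {}

    def checkpoint(self, registers):
        # Preserve a full copy of the current (assumed error-free) state.
        self._saved = dict(registers)

    def restore(self, registers):
        # Revert the register file in place to the saved state.
        registers.clear()
        registers.update(self._saved)


registers = {"r1": 10, "r2": 20}
unit = RecoveryUnit()
unit.checkpoint(registers)

registers["r1"] = 999      # erroneous update, assumed to be detected elsewhere
unit.restore(registers)

print(registers)  # {'r1': 10, 'r2': 20}
```

Note that the model stores a complete second copy of every register, which hints at why a hardware recovery unit can occupy substantial area.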
However, recovery units are very large and consume a significant amount of space in a processor. In some instances, a recovery unit may be as large as an execution unit, for example, a floating point unit. Such consumption of space is inefficient because the space could otherwise be used to increase the processing power of the processor.
Accordingly, there is a need for improved methods and systems for preserving the integrity of register contents and recovering from error conditions.