1. Technical Field
The present invention relates to microprocessors and, in particular, to microprocessors capable of operating in high-reliability modes.
2. Background Art
Soft errors arise when alpha particles or cosmic rays strike an integrated circuit and alter the charges stored on the voltage nodes of the circuit. If the charge alteration is sufficiently large, a voltage representing one logic state may be changed to a voltage representing a different logic state. For example, a voltage representing a logic true state may be altered to a voltage representing a logic false state, and any data that incorporates the logic state will be corrupted.
Soft error rates (SERs) for integrated circuits, such as microprocessors ("processors"), increase as semiconductor process technologies scale to smaller dimensions and lower operating voltages. Smaller process dimensions allow greater device densities to be achieved on the processor die. This increases the likelihood that an alpha particle or cosmic ray will strike one of the processor's voltage nodes. Lower operating voltages mean that smaller charge disruptions are sufficient to alter the logic state represented by the node voltages. Both trends point to higher SERs in the future. Soft errors may be corrected in a processor if they are detected before any corrupted results are used to update the processor's architectural state.
Processors frequently employ parity-based mechanisms to detect data corruption due to soft errors. A parity bit is associated with each block of data when it is stored. The bit is set to one or zero according to whether there is an odd or even number of ones in the data block. When the data block is read out of its storage location, the parity of the number of ones in the block is compared with the stored parity bit. A discrepancy between the values indicates that the data block has been corrupted. Agreement between the values indicates that either no corruption has occurred or an even number of bits (two, four, and so on) have been altered. Since the latter events have very low probabilities of occurrence, parity provides a reliable indication of whether data corruption has occurred. Error correcting codes (ECCs) are parity-based mechanisms that track additional information for each data block. The additional information allows the corrupted bit(s) to be identified and corrected.
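The parity check described above can be illustrated with a minimal sketch. The function names `parity_bit` and `is_corrupted` and the 8-bit sample value are assumptions chosen for illustration; they are not taken from any particular processor design.

```python
def parity_bit(block: int) -> int:
    """Return 1 if the block holds an odd number of one bits, else 0."""
    return bin(block).count("1") % 2


def is_corrupted(block: int, stored_parity: int) -> bool:
    """Recompute parity on read-out and compare it with the stored bit."""
    return parity_bit(block) != stored_parity


# Hypothetical 8-bit data block containing four ones, so its parity bit is 0.
data = 0b10110010
stored = parity_bit(data)

# A soft error that flips one bit changes the parity and is detected.
single_flip = data ^ 0b00000100

# A soft error that flips two bits leaves the parity unchanged and is missed.
double_flip = data ^ 0b00010100

print(is_corrupted(data, stored))         # uncorrupted block
print(is_corrupted(single_flip, stored))  # one altered bit
print(is_corrupted(double_flip, stored))  # two altered bits
```

The third case shows why agreement between the recomputed and stored parity does not strictly prove that the block is intact: an even number of flipped bits goes unnoticed.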
Parity/ECC mechanisms have been applied extensively to caches, memories, and similar data storage arrays. These structures have relatively high densities of data storing nodes and are susceptible to soft errors even at current device dimensions. Their localized array structures make it relatively easy to implement parity/ECC mechanisms. The remaining circuitry on a processor includes data paths, control logic, execution logic and registers ("execution core"). The varied structures of these circuits and their distribution over the processor chip make it more difficult to apply parity/ECC mechanisms.
One approach to detecting soft errors in an execution core is to process instructions on duplicate execution cores and compare results determined by each on an instruction-by-instruction basis ("redundant execution"). For example, one computer system includes two separate processors that may be booted to run in either a symmetric multi-processing ("SMP") mode or a Functional Redundant Check unit ("FRC") mode. In SMP mode, instruction execution is distributed between the processors to provide higher overall performance than single processor systems. In FRC mode, the processors execute identical code segments and compare their results on an instruction-by-instruction basis to determine whether an error has occurred. The operating mode can only be switched between SMP and FRC modes by resetting the computer system.
The dual processor approach is costly (in terms of silicon). In addition, the inter-processor signaling through which results are compared is too slow to detect corrupted data before it updates the processors' architectural states. Consequently, this approach is not suitable for correcting detected soft errors.
Another computer system provides execution redundancy using dual execution cores on a single processor chip. This approach eliminates the need for inter-processor signaling, and detected soft errors can usually be corrected. However, the execution resources are dedicated to operating in FRC mode, and although the dual core approach consumes less silicon than the dual processor approach, it still requires a relatively large processor chip.
The present invention addresses these and other deficiencies of available high reliability computer systems.
The present invention provides a processor in which clustered execution resources may be switched dynamically between operating in a high reliability mode and a high performance mode.
In accordance with the present invention, the execution resources of a processor are organized into first and second execution clusters. An issue module provides instructions to the first and second execution clusters according to the execution mode of the processor. When the processor is in a high performance (HP) execution mode, the issue module provides different instructions to the first and second execution clusters. When the processor is in a high reliability (HR) execution mode, the issue module provides identical instructions to the first and second execution clusters.
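The mode-dependent behavior of the issue module can be sketched as follows. This is a simplified illustration only: the names `Mode` and `issue`, the use of strings for instructions, and the pairing of consecutive instructions in HP mode are all assumptions, and the sketch ignores instruction dependencies and resource constraints that a real issue module would have to respect.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    HP = "high performance"
    HR = "high reliability"


def issue(instructions: List[str],
          mode: Mode) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return one (first_cluster, second_cluster) pair per issue slot."""
    if mode is Mode.HR:
        # HR mode: both clusters receive the identical instruction.
        return [(ins, ins) for ins in instructions]

    # HP mode: each cluster receives a different instruction.
    # Pairing consecutive instructions is an illustrative assumption.
    slots = []
    for i in range(0, len(instructions), 2):
        second = instructions[i + 1] if i + 1 < len(instructions) else None
        slots.append((instructions[i], second))
    return slots


program = ["add", "sub", "mul"]  # hypothetical instruction stream
print(issue(program, Mode.HR))
print(issue(program, Mode.HP))
```

In this sketch the same three instructions occupy three issue slots in HR mode and two in HP mode, which reflects the trade-off between redundancy and throughput that the two modes represent.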
For one embodiment of the invention, the processor includes a check unit that is activated in HR mode and deactivated in HP mode. The check unit compares the execution results generated by the first and second execution clusters when it is activated, and signals an error when the execution results do not match. The processor may switch between HP and HR modes under software control or in response to the occurrence of selected events.
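A minimal sketch of the check unit behavior described for this embodiment is shown below. The class name `CheckUnit`, its methods, and the representation of execution results as integers are hypothetical and serve only to illustrate activation in HR mode and result comparison.

```python
class CheckUnit:
    """Compares results from two execution clusters when activated."""

    def __init__(self) -> None:
        # Deactivated by default, corresponding to HP mode.
        self.active = False

    def set_high_reliability(self, enabled: bool) -> None:
        """Activate the unit for HR mode, deactivate it for HP mode."""
        self.active = enabled

    def check(self, first_result: int, second_result: int) -> bool:
        """Return True when an error is signaled, False otherwise."""
        if not self.active:
            # In HP mode the clusters run different instructions,
            # so their results are not compared.
            return False
        return first_result != second_result


unit = CheckUnit()
unit.set_high_reliability(True)
print(unit.check(42, 42))  # matching results in HR mode
print(unit.check(42, 43))  # mismatching results in HR mode

unit.set_high_reliability(False)
print(unit.check(42, 43))  # comparison skipped in HP mode
```

How the mode switch is triggered, whether by software or by selected events, is outside the scope of this sketch.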