Field
Embodiments relate to the field of integrated circuits. In particular, embodiments relate to the field of reliability management for integrated circuits.
Background Information
Reliability is an important characteristic for processors and other integrated circuits. However, during operation, processors and other integrated circuits are susceptible to failures, which occur gradually over time and tend to limit their reliability.
FIG. 1 is a block diagram of an example of a known processor 100. The processor of this example has a first core 102-1 through a seventh core 102-7. Each of the cores is operable to execute at least one task. As shown, a first task 104-1 (e.g., a thread, application, etc.) may execute on the first core and a seventh task 104-7 may execute on the seventh core. Over time, failures 106 may occur in the cores and/or in the processor. Without limitation, the failures may be due to high-energy particles impacting the processor, as well as due to other known causes. At some point, these failures may cause the processor, or at least a portion thereof (e.g., a core), to cease to function properly.
A fixed global failure rate is commonly used as a design parameter for processors and other integrated circuits to help provide a certain level of reliability (e.g., a certain expected device lifetime). The global failure rate may quantify the rate at which failures are predicted or expected to occur in the integrated circuit (e.g., the number of failures per unit time and/or the time between failures). The global failure rate may be expressed in various metrics known in the arts, such as, for example, a failure in time (FIT) rate, a mean time between failures (MTBF), or the like. By way of example, the FIT rate may represent the number of failures that are expected per billion device-hours of operation.
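By way of illustration only, the relationship between the two metrics mentioned above may be sketched as follows. The sketch assumes a constant failure rate, under which the MTBF in hours is one billion divided by the FIT rate. The function names and the numeric values are hypothetical and are not taken from any particular design.

```python
# Illustrative sketch only: assumes a constant failure rate.
# Function names and example values are hypothetical.

HOURS_PER_BILLION = 1e9  # FIT counts failures per 10^9 device-hours


def fit_to_mtbf_hours(fit):
    """Convert a FIT rate to a mean time between failures, in hours."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return HOURS_PER_BILLION / fit


def mtbf_hours_to_fit(mtbf_hours):
    """Convert a mean time between failures, in hours, to a FIT rate."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_BILLION / mtbf_hours


def expected_failures(fit, devices, hours):
    """Expected number of failures across a population of devices."""
    return fit * devices * hours / HOURS_PER_BILLION


if __name__ == "__main__":
    # A hypothetical device rated at 1000 FIT.
    print(fit_to_mtbf_hours(1000))
    # 1000 hypothetical devices at 100 FIT, each operated for 10,000 hours.
    print(expected_failures(100, 1000, 10_000))
```

Under these assumptions, a lower FIT rate corresponds directly to a longer MTBF, so either metric may be used to express the same fixed global failure rate.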
The integrated circuit may initially be designed with an objective of not exceeding the fixed global failure rate. However, one potential drawback with such a fixed global failure rate is that it may tend to limit the amount of logic (e.g., number of cores) that can be included in the design of the integrated circuit. In general, the more logic the integrated circuit has, the greater the actual failure rate. Even if it is desirable (e.g., from a performance perspective) to add another core to the design of the integrated circuit, the additional core may cause the fixed global failure rate to be exceeded, in which case the additional core would generally be omitted from the design. Accordingly, in conventional integrated circuits, the fixed global failure rate, at least in some instances, may tend to limit performance and/or reduce energy efficiency (e.g., it may cause the cores to operate at a higher voltage).
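By way of illustration only, the manner in which a fixed global failure rate may cap the number of cores may be sketched as follows. The sketch assumes that the failure rates of independent blocks of logic simply add, and that each core contributes the same FIT rate. The budget, the per-core FIT rate, and the FIT rate of the remaining (non-core) logic are hypothetical values chosen for the example, as are the function names.

```python
# Illustrative sketch only: assumes per-block FIT rates are additive
# and that every core contributes an identical FIT rate.
# All names and numeric values are hypothetical.
import math


def total_fit(num_cores, core_fit, uncore_fit=0.0):
    """Total FIT rate of a design with num_cores identical cores."""
    return uncore_fit + num_cores * core_fit


def max_cores_within_budget(budget_fit, core_fit, uncore_fit=0.0):
    """Largest core count whose total FIT rate does not exceed the budget."""
    if core_fit <= 0:
        raise ValueError("per-core FIT rate must be positive")
    headroom = budget_fit - uncore_fit
    if headroom < 0:
        return 0  # the non-core logic alone already exceeds the budget
    return math.floor(headroom / core_fit)


if __name__ == "__main__":
    budget = 1000.0   # hypothetical fixed global FIT budget
    per_core = 120.0  # hypothetical FIT contribution of one core
    uncore = 100.0    # hypothetical FIT contribution of non-core logic

    n = max_cores_within_budget(budget, per_core, uncore)
    print(n, total_fit(n, per_core, uncore))
    # One more core would push the design over the budget.
    print(total_fit(n + 1, per_core, uncore) > budget)
```

With these hypothetical values, seven cores fit within the budget and an eighth core would exceed it, which corresponds to the situation described above in which an otherwise desirable additional core is omitted from the design.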