Some parallel computing devices include node architectures based upon System-On-a-Chip (SOC) technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Each ASIC node includes a plurality of processors, which may be used individually or simultaneously, to work on any combination of computations or communications as required by the particular algorithm being solved or executed at any point in time.
It is also generally desirable that each processor be reliable, durable and operational for a long period of time. Often, however, extensive utilization of a processor rapidly ages its components, such as circuits and transistors, causing the systems in which the processor is used to slow down, to experience reduced performance, or even to fail when transistors can no longer propagate their signals. This aging process may be attributed to factors such as Negative Bias Temperature Instability (NBTI), Hot Carrier Injection (HCI) degradation and Electromigration (EM) effects.
Additionally, sensors based on ring oscillators may be used to measure the effects of processor aging. This type of measurement is performed by placing the sensors in close proximity to processor components that are exposed to workload stress. Often, however, this technique may not accurately predict processor aging because the sensors are unable to properly account for differences between their own operating environment and the workload environment of the measured components. Moreover, the sensors often operate in a way that causes them to age faster than the components they measure. Because the sensors do not always have the same aging characteristics as the measured components, their readings do not allow for accurate prediction of aging.
Furthermore, the process of accurately predicting aging may be complicated by other factors such as operating conditions, workload variability and process variation. For example, operating conditions that may affect the processor's mode of operation include ambient temperature, supply voltage range and operating frequency.
Also, workload characteristics, such as switching and clock-gating factors, may affect the ability to accurately predict aging. Switching factors generally describe how often a signal switches per clock cycle and thus may represent the probability that a data bit switches from ‘0’ to ‘1’, or from ‘1’ to ‘0’. Clock-gating factors generally represent the probability that the processor's clock is cut off in order to save power, thereby inhibiting data-bit switching.
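By way of illustration only, the two workload factors described above may be estimated from a recorded trace as simple ratios. The following is a minimal sketch under stated assumptions: the function names, the per-cycle list representation of the trace, and the sample values are hypothetical and are not taken from any particular processor or tool.

```python
def switching_factor(bits):
    """Estimate the switching factor of one signal.

    `bits` is an assumed per-clock-cycle trace of a data bit (0 or 1).
    The result is the fraction of cycle boundaries on which the bit
    changed value, i.e. an empirical switching probability.
    """
    if len(bits) < 2:
        return 0.0
    toggles = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    return toggles / (len(bits) - 1)


def clock_gating_factor(clock_enabled):
    """Estimate the clock-gating factor.

    `clock_enabled` is an assumed per-cycle trace of booleans, where
    False means the clock was cut off for that cycle. The result is
    the fraction of cycles in which the clock was gated.
    """
    if not clock_enabled:
        return 0.0
    gated = sum(1 for enabled in clock_enabled if not enabled)
    return gated / len(clock_enabled)


# Hypothetical traces, for illustration only.
data_trace = [0, 1, 1, 0, 1]
clock_trace = [True, True, False, False, True, True, True, True]

print(switching_factor(data_trace))      # 3 toggles over 4 boundaries: 0.75
print(clock_gating_factor(clock_trace))  # 2 gated cycles out of 8: 0.25
```

In this sketch a higher clock-gating factor implies fewer opportunities for the data bit to switch, which is consistent with the description above of clock gating inhibiting data-bit switching.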
Additionally, process variation, which is often used to describe the variation in threshold voltages (VT) of the transistors comprising the processor, may also affect the ability to accurately predict aging. For example, aging of a processor's components, such as transistors, may be associated with a shift towards increasing VT, which reduces the drain current to the point that the drain current can no longer change the processor's signals fast enough to meet the clock cycle-time requirement. Thus, when the VT shift reaches a level at which the transistor cannot perform its function within the designated clock cycle-time, the transistor, and eventually the processor itself, may fail. Consequently, processor aging is a prediction, on a continuous timescale, of the number of correct operations until the first of such failures occurs. As a result, processor aging may be calculated by predicting how much VT shift is sufficient to cause this failure. However, even this method is not adequately precise since, due to imperfections in manufacturing processes, VT and aging rates may often differ among transistors of the same design within the same processor. Therefore, processor pre-characterization using transistor design-time simulations is insufficient, at least due to its inability to capture process variation as well as architecture-level characteristics, such as redundancies and workload stress/sensitivities.
Accordingly, it would be desirable to accurately predict the processor's operational lifetime by assessing the aging characteristics at the architecture level in an environment where process variation exists.