System uptime is increasingly limited by integrated circuit reliability, and modern systems have very high processing requirements. A hardening solution should therefore preserve maximum performance; that is, essentially no delay increase is allowed, which precludes simple solutions such as temporal hardening techniques. Aerospace systems also have limited power envelopes, for example because of the limited ability to remove heat in vacuum. Finally, the area and power overhead allowed for hardening varies by market.
IBM servers have large (but diminishing) hardening costs. IBM servers, like the HERMES processor, use dual modular redundant register files. Additionally, on IBM servers, all registers are checkpointed on the processor. These designs also use full memory error-correcting code protection and redundant execution units. These designs generally do not fall into the appropriate thermal envelope (they are liquid cooled).
Checking at the core boundary is the conventional approach. For example, duplicate cores or entire boards are checked and voted with an application-specific integrated circuit. However, this approach is not comprehensive. Moreover, it is very difficult to resynchronize the cores after an error; the standard scenario is to reset the entire system in a controlled manner.
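The checking and voting described above can be sketched in a few lines. This is only an illustrative model, not the design of any particular system: the function names are hypothetical, and the three-copy voter is an assumption added to contrast with the two-copy check.

```python
def compare_dual(out_a, out_b):
    """Two-copy check: a mismatch is detected, but the checker cannot
    tell which copy is wrong, so recovery typically means a reset."""
    return out_a == out_b


def vote_triple(out_a, out_b, out_c):
    """Three-copy majority vote (assumed configuration).
    Returns (value, ok); ok is False when no two copies agree."""
    if out_a == out_b or out_a == out_c:
        return out_a, True
    if out_b == out_c:
        return out_b, True
    return None, False
```

For example, compare_dual(5, 7) only reports that the two outputs disagree, whereas vote_triple(5, 7, 5) masks the single bad output. Neither check sees state that never reaches the core boundary, which is one reason the approach is not comprehensive.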
NASA/JPL refers to soft portions of present hardened central processing units as “glass jaws.” Some examples follow:

Branch Target Buffer (BTB) addresses

BTB branch/no-branch mispredicts are corrected by the pipeline. This often leads to the totally mistaken impression that BTBs are thus inherently hard. However, this is not true. The taken/not taken choice is checked. None of the other data is.

Addresses are frequently not protected, and an upset target address sends a predicted branch into the weeds. Basically, if the target address is modified by a soft error (single event upset) and then if the branch is predicted (correctly), the machine will branch to the erroneous address. Then, program execution picks up at the wrong place.

Flip-Flops (FFs) selecting redundancy

When FFs change the cache configuration, changes look like a massive upset. Basically, these FFs remove bad columns, rows, or blocks from being used. If these bits are upset, then suddenly the cache is reconfigured and potentially non-operational sections are exposed to usage.

Many vendors, including IBM and Intel, use hardened FFs for this, because of the catastrophic impact of such an error (single-event functional interrupt), similar to what happens when static random-access memory configuration bits are upset.

The lesson is that it can be very difficult to predict the effect of a processor error, particularly if the code is unknown. Ideally, finding adequate protection for everything is necessary. The hardening needs the correct weight (i.e., speed, power, and area impact) for an application. However, a single central processing unit may be used in many such applications.
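The BTB example above can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not a description of any real processor: BTBEntry, upset, and resolve_pc are hypothetical names, and the per-entry parity bit is an assumed protection scheme added only to show the contrast with an unchecked target address.

```python
from dataclasses import dataclass


def parity(value: int) -> int:
    """Even-parity bit over the bits of a non-negative integer."""
    return bin(value).count("1") & 1


@dataclass
class BTBEntry:
    taken: bool          # predicted direction (checked by the pipeline)
    target: int          # predicted target address (often unchecked)
    target_parity: int   # assumed protection bit, not from the text


def make_entry(taken: bool, target: int) -> BTBEntry:
    return BTBEntry(taken, target, parity(target))


def upset(entry: BTBEntry, bit: int) -> BTBEntry:
    """Model a single event upset: flip one bit of the stored target."""
    return BTBEntry(entry.taken, entry.target ^ (1 << bit), entry.target_parity)


def resolve_pc(entry, actually_taken, true_target, fallthrough_pc,
               check_parity=False):
    """Return the address where execution continues after the branch."""
    if check_parity and parity(entry.target) != entry.target_parity:
        # Corrupted entry is treated like a BTB miss; the pipeline
        # computes the branch outcome itself.
        return true_target if actually_taken else fallthrough_pc
    if entry.taken != actually_taken:
        # Direction mispredict: corrected by the pipeline.
        return true_target if actually_taken else fallthrough_pc
    # Direction was right, so the stored target is used as-is.
    return entry.target if actually_taken else fallthrough_pc
```

With entry = make_entry(True, 0x4000) and bad = upset(entry, 3), a correctly predicted taken branch resolves to the corrupted address 0x4008 when the target is unchecked, and to 0x4000 when check_parity=True. A single parity bit detects only an odd number of flipped bits, so it is a minimal illustration rather than a complete remedy.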
As such, there is a need for improved radiation-hardened microprocessors.