Compute-class processors typically provide error detection or correction for register file storage using error-correcting codes (ECC), leaving coverage holes for transient errors that occur in pipeline structures such as datapath registers and arithmetic logic. Register file ECC cannot detect pipeline errors because encoding takes place after these errors strike, so valid-yet-incorrect codewords are written back to the register file. Any thorough protection scheme must avoid such coverage holes. Systems that demand high levels of reliability or availability, or that operate in harsh conditions, must therefore rely on a separate mechanism to protect against pipeline errors, typically some form of spatial or temporal duplication, and this protection comes at great expense.
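The coverage hole can be illustrated with a minimal sketch. It assumes a single even-parity bit as a stand-in for a real ECC code and models a transient pipeline fault as a hypothetical `fault_mask` XORed into the adder output. The function names are illustrative and do not correspond to any particular processor.

```python
MASK32 = 0xFFFFFFFF

def parity(word: int) -> int:
    """Even parity over a 32-bit word (assumed stand-in for a real ECC code)."""
    return bin(word & MASK32).count("1") & 1

def encode(word: int) -> tuple[int, int]:
    """Encode at register write-back: store the word with its check bit."""
    return (word & MASK32, parity(word))

def check(stored: tuple[int, int]) -> bool:
    """Return True if the stored word is a valid codeword."""
    word, check_bit = stored
    return parity(word) == check_bit

def alu_add(a: int, b: int, fault_mask: int = 0) -> int:
    """32-bit add; fault_mask models a transient bit flip in the datapath."""
    return ((a + b) & MASK32) ^ fault_mask

# Case 1: the error strikes in the pipeline, before encoding.
faulty = alu_add(5, 7, fault_mask=0x4)        # wrong result: 8 instead of 12
pipeline_case_passes = check(encode(faulty))  # valid-yet-incorrect codeword

# Case 2: the error strikes in storage, after encoding.
word, check_bit = encode(alu_add(5, 7))
storage_case_passes = check((word ^ 0x4, check_bit))  # bit flip in the array
```

In this sketch the storage error fails the check, while the pipeline error passes it, because the check bit is computed over the already-corrupted result.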
Spatial duplication tends to suffer from high chip area cost or design complexity. It roughly doubles the amount of hardware needed for the execution pipeline, which is likely to be prohibitively costly in compute-intensive processors such as GPUs. A more area-efficient alternative to full-duplication-based error detection is to employ specialized concurrent checkers that vet operations as they execute. Such techniques can provide low-latency error detection with relatively little hardware, but they either suffer from limited scope (protecting only a simplified RISC pipeline) or incur significant design complexity, area, and power costs to protect each pipeline operation individually. Temporal duplication is general, user-transparent, and requires no new hardware, but it can incur high performance overheads. For example, one form of temporal duplication is to execute each instruction twice, eventually checking for agreement between the data produced by the original and shadow instructions. This approach requires explicit checking instructions (leading to program bloat), roughly doubles program register usage, and doubles the number of arithmetic operations, potentially leading to a slowdown of 2× or more.
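The overheads of instruction-level temporal duplication can be seen in a small sketch. It assumes a toy three-operand instruction format, a separate shadow register set, and a check after every instruction; real schemes may defer or batch the checks. All names here (`run_with_duplication`, `PipelineError`, the `fault` parameter) are hypothetical and chosen for illustration only.

```python
import operator

MASK32 = 0xFFFFFFFF
OPS = {"add": operator.add, "mul": operator.mul}

class PipelineError(Exception):
    """Raised when original and shadow results disagree."""

def run_with_duplication(program, regs, fault=None):
    """Execute each (op, dst, src1, src2) instruction twice and compare.

    fault is an optional (instruction_index, bit_mask) pair that flips
    bits in the original instruction's result to model a transient error.
    """
    main = dict(regs)    # original registers
    shadow = dict(regs)  # shadow copy: register usage roughly doubles
    alu_ops = 0
    for i, (op, dst, s1, s2) in enumerate(program):
        result = OPS[op](main[s1], main[s2]) & MASK32
        if fault is not None and fault[0] == i:
            result ^= fault[1]  # inject the modeled pipeline error
        main[dst] = result
        shadow[dst] = OPS[op](shadow[s1], shadow[s2]) & MASK32
        alu_ops += 2            # arithmetic work doubles
        if main[dst] != shadow[dst]:  # explicit checking step
            raise PipelineError(f"mismatch in {dst} at instruction {i}")
    return main, alu_ops

program = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]
final_regs, alu_ops = run_with_duplication(program, {"r0": 5, "r1": 7})
```

Here a two-instruction program performs four arithmetic operations plus two comparisons, and passing `fault=(0, 0x4)` causes the first check to raise `PipelineError`.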