1. Technical Field
The present invention relates to systems for processing data and, more particularly, to systems for detecting soft errors during execution in a computing environment.
2. Background Art
Soft errors are of increasing concern to processor designers. Soft errors are intermittent errors that occur during processor execution and are caused not by design defects or manufacturing defects but by alpha particles or high-energy neutrons striking an active area of silicon. While leaving the physical circuit intact, soft errors alter the stored charge in memory cells and logic, producing incorrect behavior and results. Accordingly, soft errors, also called transient faults or single-event upsets (SEUs), result in an invalid state.
Two sources of soft errors are high-energy neutrons and alpha particles. High-energy neutrons result from collisions between cosmic rays and atmospheric particles. Alpha particles originate from radioactive decay of chip materials (such as solder bumps) and packaging materials. A soft error occurs when a high-energy neutron or alpha particle strikes an active area of a silicon substrate, releasing charges (electron-hole pairs) that alter the state of a transistor. Accordingly, an error occurs in the operation that was being performed when the transistor's state was altered. Importantly, such soft errors often go undetected and can cause serious errors in results.
The rate of occurrence of soft errors, also referred to as the “soft error rate” (SER), is predicted to increase due to large-scale integration (such as “VLSI” or “ULSI”) design trends as well as semiconductor manufacturing trends. The trends driving microprocessor performance and design include the scaling of device feature sizes and increased pipeline depths. These trends have reduced the feature size and voltage levels of transistors and increased transistor density. A particle (such as a high-energy neutron or alpha particle) that strikes a transistor in a logic circuit or memory can alter the value produced by the circuit or stored in the memory. The likelihood that such a strike causes a soft error rises as transistor density increases and voltage levels decrease. While soft error detection is already a significant concern in servers, workstations, and mission-critical systems, it is predicted to become increasingly important in processor designs (including those for desktop computers) and networking component designs as silicon geometries shrink and as the charge necessary to alter the state of a transistor continues to diminish.
The cause of soft errors is not easily prevented, inasmuch as the particles that cause soft errors are extremely difficult to block. Many processors therefore already incorporate mechanisms for detecting soft errors. Typically, however, these mechanisms, which include error-correcting codes (ECC) and parity techniques, are focused on protecting memory elements such as system memory and caches. In contrast, detection of soft errors in combinational logic elements involves, in most known systems, relatively expensive redundant-hardware schemes. A drawback of this approach is that full hardware redundancy is often not cost-effective for detecting soft errors in combinational logic, due to the significant silicon cost of the redundant hardware.
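By way of illustration only, the parity technique mentioned above can be sketched in software as follows. This is a minimal sketch assuming even parity over a single 8-bit word; the function names (parity_bit, store, check, flip_bit) and the example word are hypothetical and are not taken from any particular processor or from the embodiments described herein. A soft error is modeled simply as the inversion of one stored bit.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit of a non-negative integer word."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple[int, int]:
    """Model a protected memory cell: keep the word together with its parity bit."""
    return word, parity_bit(word)


def check(word: int, stored_parity: int) -> bool:
    """Return True if the word still agrees with the parity recorded at store time."""
    return parity_bit(word) == stored_parity


def flip_bit(word: int, position: int) -> int:
    """Model a particle strike by inverting one bit of the stored word."""
    return word ^ (1 << position)


# Hypothetical example word; its parity is recorded when it is stored.
stored_word, stored_parity = store(0b10110010)

# A single upset changes the number of 1 bits by one, so the parity check fails.
upset_word = flip_bit(stored_word, 3)
single_flip_detected = not check(upset_word, stored_parity)

# A second upset in the same word restores the original parity and goes unnoticed.
double_upset_word = flip_bit(upset_word, 5)
double_flip_detected = not check(double_upset_word, stored_parity)

print(single_flip_detected, double_flip_detected)
```

In this sketch a single inverted bit is detected, whereas two inverted bits in the same word are not, which is one reason ECC is used where stronger protection is required. Parity of this kind covers stored values; it does not by itself cover an error arising in combinational logic while a result is being computed.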
Embodiments of the reduced-hardware soft error detection apparatus and method disclosed herein address these and other problems related to soft errors.