A soft error event in an integrated circuit is an unexpected change of a logical state of a storage element such as a flip-flop, latch, shift register, or memory bit cell. Soft errors involve changes to data and are not indicative of a problem with the physical structure of the storage element itself. That is, if the desired data value is rewritten to the storage element after a soft error, the storage element will store the appropriate value.
Soft errors became widely known with the introduction of dynamic random-access memory (DRAM) in the 1970s. In these early memory devices, packaging materials contained small amounts of radioactive contaminants. Under some circumstances, radioactive emission from the contaminants in a housing or package caused a soft error in a semiconductor memory device contained within the housing. Package radioactive decay usually caused a soft error by alpha particle emission. The positively charged alpha particle traveled through the semiconductor and disturbed the distribution of electrons in circuit elements. If the disturbance was large enough, a digital signal could change from a logic 0 value to a logic 1 value or vice versa. In combinational logic this effect was transient, perhaps lasting a fraction of a nanosecond, and because the upset vanished so quickly, soft errors in combinational logic went largely unnoticed. In sequential logic such as latches and flip-flops, and also in memory, the transient upset could be stored for an indefinite time and read out later. Circuit designers were therefore much more aware of the problem in storage devices.
Once the electronics industry had determined how to control package contaminants, it became clear that other causes were involved. It has been demonstrated that cosmic rays also cause soft errors. Although the primary particle of a cosmic ray does not generally reach the Earth's surface, it creates a shower of energetic secondary particles. At the Earth's surface, approximately 95% of the particles capable of causing soft errors are energetic neutrons. This flux of energetic neutrons, actually a byproduct of cosmic rays, is typically referred to as "cosmic rays" in the soft error literature. Neutrons are uncharged and cannot disturb a circuit on their own, but a neutron can be captured by the nucleus of an atom in an integrated circuit. This neutron capture may produce charged particles, such as alpha particles and other nuclear fragments, which can then cause soft errors.
Whether a storage element experiences a soft error depends on the energy of the incident particle, the geometry of the impact, the location of the strike, and the design of the storage element in its path. Storage element designs having higher capacitance and higher voltage differences between semiconductor junctions are less likely to suffer an error. However, pressures to increase storage capacity and data transfer rates lead to decreases in the size and operating voltage of integrated-circuit storage elements. These smaller, faster storage elements present two problems with regard to incident ionizing radiation: 1) they are more sensitive to the deposited charge, and 2) a given ionizing event is likely to affect more storage elements, since more elements will intercept the path of the ionizing radiation. Consequently, the soft error rate (SER) becomes increasingly important as technology advances and feature size decreases. The SER is the rate at which a device or system encounters, or is predicted to encounter, soft errors. It is typically expressed either as a number of failures in time (FIT), i.e., failures per billion device-hours of operation, or as a mean time between failures (MTBF).
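The relationship between the two SER metrics above can be illustrated with a short sketch, using the standard definition of one FIT as one failure per 10^9 device-hours; the function names below are illustrative, not drawn from any particular tool.

```python
FIT_HOURS = 1e9  # one FIT = one failure per 10^9 device-hours


def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a part with the given FIT rate."""
    return FIT_HOURS / fit


def system_fit(device_fit: float, n_devices: int) -> float:
    """FIT rates add across independent elements, which is why SER
    grows with the number of storage elements a system contains."""
    return device_fit * n_devices


# Example: a part rated at 1000 FIT has an MTBF of one million hours.
mtbf = fit_to_mtbf_hours(1000.0)
```

The additive property in `system_fit` reflects the point above: as feature sizes shrink and designs pack in more storage elements, the system-level SER rises even if the per-element rate is unchanged.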
Traditionally, SER characterization of storage element designs has been performed on integrated circuits constructed specifically for test purposes. Such dedicated SER test circuits are expensive and wasteful to design and construct, and ultimately only emulate what might occur on an actual application-specific integrated circuit.
As integrated circuits (ICs), especially application-specific ICs (ASICs), have become increasingly complex, testing their functionality to ensure that they operate properly has become increasingly challenging. IC testing involves two general categories: functional testing and structural testing. Functional testing involves stimulating the primary inputs of the IC and measuring the results at the primary outputs of the IC. Functional testing exercises the functionality of logic elements within the IC and is a time-honored method of verifying that the IC can perform its intended operations. However, creating a robust functional test for a complex IC is labor-intensive, and the attendant test equipment can be uneconomical.
To economize the effort and cost involved in IC testing, structural testing has emerged as an alternative to functional testing. In a structural test, the internal storage elements of the IC are used to control and observe the IC's internal logic. A structural test is generally performed by linking the storage elements into a serial shift register, or "scan chain," when a test mode signal is applied. This technique is commonly referred to as "scan testing." Generally, an IC having scan testing capability includes a number of scan chains, each comprising a number of interconnected multiplexers and registers connected to the functional logic of the integrated circuit. The registers in a scan chain are typically implemented using D flip-flops. A single scan chain could be many hundreds of thousands of flip-flops in length, so the flip-flops are generally divided into a number of shorter scan chains, each typically comprising on the order of one hundred to one thousand flip-flops and multiplexers.
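The shift-register behavior described above can be sketched in a few lines, assuming a simplified model in which the per-flop input multiplexers are folded into a single test-mode shift operation; the class and method names are hypothetical.

```python
from collections import deque


class ScanChain:
    """Toy model of a scan chain: when the test mode signal is
    asserted, the D flip-flops behave as one serial shift register."""

    def __init__(self, length: int):
        # All flops start at logic 0.
        self.flops = deque([0] * length, maxlen=length)

    def scan_clock(self, scan_in: int) -> int:
        """One test-mode clock: scan_in enters at the head of the
        chain and the bit at the tail appears at scan-out."""
        scan_out = self.flops[-1]
        self.flops.appendleft(scan_in)  # maxlen drops the tail bit
        return scan_out


# Shifting through a 3-flop chain: the first bit scanned in emerges
# at scan-out after three scan clocks.
chain = ScanChain(3)
outs = [chain.scan_clock(b) for b in (1, 0, 1, 0, 0, 0)]
```

The chain length directly sets how many clocks a load or unload takes, which is why long chains are split into many shorter ones as noted above.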
During scan testing, scan data that is provided to the IC at an input/output (I/O) pad is serially clocked into, i.e., loaded into, the scan chain registers. After the scan data is loaded, a primary input state is applied to the combinational logic of the integrated circuit. The combination of the scanned-in present state and the applied primary inputs comprises the test stimulus. The values of the primary outputs are then measured and a single clock cycle is executed to capture the response of the circuit to the stimulus. To complete the scan test, the values captured in the registers are then serially scanned out of the scan chain to an I/O pad. Scan chains can be scanned out serially, i.e., one after another. Alternatively, multiple scan chains can be scanned out in parallel.
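The load/capture/unload sequence just described can be sketched as follows, assuming a hypothetical combinational block modelled as a pure function of the scanned-in present state and the applied primary inputs.

```python
def scan_test(chain_len, stimulus, primary_inputs, logic):
    """Sketch of one scan test: serially load `stimulus`, pulse one
    capture clock through `logic`, then serially unload the response.
    `logic` stands in for the IC's combinational logic."""
    flops = [0] * chain_len

    # 1. Serially clock the scan data into the chain (scan shift).
    for bit in stimulus:
        flops = [bit] + flops[:-1]

    # 2. Apply primary inputs and execute a single functional clock:
    #    each flop captures the combinational logic's response.
    flops = logic(flops, primary_inputs)

    # 3. Serially scan the captured values out to an I/O pad.
    response = []
    for _ in range(chain_len):
        response.append(flops[-1])
        flops = [0] + flops[:-1]
    return response
```

In a real device, step 3 typically overlaps with loading the next stimulus, and multiple chains may be unloaded in parallel, as noted above.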
Another type of testing is known as boundary scan. Boundary scan is a method for testing interconnects between devices on printed circuit boards or between sub-blocks inside an IC. Boundary scan testing has been standardized by the Joint Test Action Group (JTAG) as IEEE Standard 1149.1. For purposes of board-level testing, a JTAG-enabled IC includes dedicated "test cells" connected to each I/O pad of the IC that can selectively override the functionality of that pad when instructed to enter a JTAG test mode. These cells can be programmed via the JTAG scan chain to drive a signal onto a pad and across an individual trace on a circuit board. The cell at the destination of the board trace can be programmed to read the value at the pad, verifying that the board trace properly connects the two pads. When boundary scan testing is performed between IC sub-blocks, test cells disposed between the sub-blocks allow the sub-blocks to be controlled in the same manner as if they were physically independent circuits. Scan tests are implemented to detect faults in integrated circuit behavior, such as hard errors caused by manufacturing defects in digital-logic devices. A hard error is permanent, and the defective circuit must be avoided if the digital logic is to function as intended.
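The pad-override behavior of a boundary scan cell can be sketched roughly as below. This is a simplified illustration of the concept, not the full IEEE 1149.1 cell structure, and the class and attribute names are hypothetical.

```python
class BoundaryScanCell:
    """Simplified model of a boundary scan output cell: in normal
    operation the pad follows the core logic; in test mode the pad is
    driven with a value programmed via the JTAG scan chain."""

    def __init__(self):
        self.programmed = 0  # value loaded through the scan chain

    def pad_value(self, core_value: int, test_mode: bool) -> int:
        """The pad sees the core signal normally, or the programmed
        value when the cell overrides the pad in test mode."""
        return self.programmed if test_mode else core_value


# Driving a board trace in test mode: program a 1 into the source
# cell, then read it back at the destination pad to verify the trace.
cell = BoundaryScanCell()
cell.programmed = 1
driven = cell.pad_value(core_value=0, test_mode=True)
normal = cell.pad_value(core_value=0, test_mode=False)
```

A matching input cell at the far end of the trace captures the driven value into the scan chain, so a mismatch on unload indicates a broken or shorted interconnect.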