The relentless progression to smaller feature sizes with each semiconductor process generation has had a negative impact on the soft-error rate (SER) of memory cells, such as SRAM (Static Random Access Memory) cells. Although process scaling has shrunk the charge collection diffusion area, it has also resulted in lower operating voltages, reduced internal node capacitances, and increased device impedances. These factors have reduced the critical charge necessary to upset the state of an SRAM cell faster than the corresponding reduction in the diffusion charge collection area. In addition, process scaling has increased the amount of SRAM that can be integrated into a system-on-a-chip (SOC), and hence has increased the aggregate soft-error rate.
The soft-error rate is typically measured in FITs (failures in time). One FIT is one failure in 1 billion (10^9) hours of operation. A mean time between failures (MTBF) of one year corresponds to a FIT rate of approximately 110,000. For computing servers or critical network equipment, a typical system goal is 1 failure in 1000 years, or a goal of roughly 100 FIT. For these high-availability systems, and for data centers with large numbers of computers, SRAM SER has become a major concern.
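The relationship between FIT and MTBF described above can be sketched as a short calculation. This is an illustrative sketch only: the function names are hypothetical, and a year is assumed to be 365 days (8760 hours).

```python
# Illustrative FIT <-> MTBF conversion (names and the 365-day year are assumptions).
HOURS_PER_YEAR = 24 * 365      # assumed 8760 hours per year
FIT_BASE_HOURS = 10**9         # one FIT = one failure per 10^9 device-hours


def fit_from_mtbf_years(mtbf_years):
    """Return the FIT rate corresponding to a given MTBF in years."""
    return FIT_BASE_HOURS / (mtbf_years * HOURS_PER_YEAR)


def mtbf_years_from_fit(fit):
    """Return the MTBF in years corresponding to a given FIT rate."""
    return FIT_BASE_HOURS / (fit * HOURS_PER_YEAR)


if __name__ == "__main__":
    for years in (1, 1000):
        print(f"MTBF of {years} year(s) -> {fit_from_mtbf_years(years):,.0f} FIT")
```

Under these assumptions, a one-year MTBF works out to roughly 114,000 FIT and a 1000-year MTBF to roughly 114 FIT, in line with the rounded figures quoted above.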
Multiple solutions have been proposed to alleviate the SRAM soft-error rate problem. Multiple vendors (e.g., ST-Microelectronics) have proposed semiconductor process changes to increase the capacitance of SRAM cell internal nodes and hence increase the critical charge necessary to cause a soft error. Reducing SER through chip architecture changes has been proposed as well. Christopher Weaver et al. (“Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor,” Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA), p. 264, Munich, Germany, 2004) propose reducing the number of susceptible states to reduce the likelihood of a soft error. Other approaches that combine architectural and circuit changes have been proposed. One such example proposes designing the SRAM cell to reduce the SER susceptibility of a certain transition at the cost of increasing the susceptibility of the inverse transition. For example, one could reduce the “1” to “0” failure rate at the cost of increasing the “0” to “1” failure rate. This would be combined with an asymmetric ECC code, which requires fewer bits than a full symmetric ECC code.
The most common solution to the SRAM soft-error rate problem is to layer a SEC-DED (Single Error Correction-Double Error Detection) ECC over the SRAM subsystem. It is common to see a 72-bit ECC code word that contains 64 bits of data and 8 check bits. Other common implementations use a per-byte ECC system that uses 5 check bits per byte, for a total of 20 check bits for a 4-byte word (common in ARM cores), or a 4-byte word coupled directly with 7 ECC check bits.
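The check-bit counts quoted above are consistent with an extended Hamming construction, in which a single-error-correcting code over k data bits needs the smallest r satisfying 2^r >= k + r + 1, plus one overall parity bit for double-error detection. The following is a minimal sketch under that assumption; the source does not state which specific code each implementation uses, and the function names are hypothetical.

```python
# Minimum check bits for an extended-Hamming SEC-DED code (assumed construction).

def sec_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (Hamming bound for SEC)."""
    r = 0
    while 2**r < data_bits + r + 1:
        r += 1
    return r


def sec_ded_check_bits(data_bits):
    """SEC check bits plus one overall parity bit for double-error detection."""
    return sec_check_bits(data_bits) + 1


if __name__ == "__main__":
    for k in (8, 32, 64):
        c = sec_ded_check_bits(k)
        print(f"{k} data bits -> {c} check bits, {k + c}-bit code word")
```

Under this assumption, 8, 32, and 64 data bits require 5, 7, and 8 check bits respectively, so protecting a 4-byte word byte by byte costs 20 check bits versus 7 for the whole word. The per-byte arrangement spends more storage but allows single-byte writes without a read-modify-write of the full word.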