Permanent failures in a memory due to effects such as aging, and transient failures or soft errors in on-chip memory during application execution due to alpha particle incidence, process marginalities, etc., call for deployment of error detection and correction technologies in system-on-chips (SoCs). Field reliability of memories is a concern for mission-critical applications, e.g. storage, automotive, medical, etc. Online detection and correction of faults in memories are therefore important. Error correction of memory can make the application robust to such marginality failures, as well as radiation-induced failures, thereby also extending the useful life of the affected devices.
Conventional error correction codes can be used to encode the data in the memory; however, they are associated with the following limitations and problems:
(i) Implementations using Error Checking and Correction (ECC) schemes based on Hamming codes are associated with a higher timing overhead, which slows down the operating frequency of the system in which the memory is used. While techniques such as pipelining help in reducing timing overhead at the cost of throughput, the CPU (processor) in many existing SoCs is unaware of the pipeline stages and has to defer any memory transaction until the existing transaction is completely cleared.
(ii) Partial writes (e.g. byte write to a memory of higher data width) and other types of accesses to memory are handled inefficiently with an impact on throughput.
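By way of illustration only, the following Python sketch models why a partial write is costly when the check bits cover a whole word: a byte write becomes a read, a decode, a merge, a re-encode, and a write. The 32-bit word width, the single-error-correcting Hamming layout with check bits at power-of-two positions, and the names encode, decode, and write_byte are assumptions made for this example and are not taken from any particular implementation.

```python
DATA_BITS = 32  # assumed memory word width for this sketch


def codeword_length(data_bits=DATA_BITS):
    """Data bits plus the fewest check bits r with 2**r >= data_bits + r + 1."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return data_bits + r


def _syndrome(code, n):
    """XOR of the (1-indexed) positions of all set bits; zero for a valid codeword."""
    s = 0
    for pos in range(1, n + 1):
        if (code >> pos) & 1:
            s ^= pos
    return s


def encode(data, data_bits=DATA_BITS):
    """Place data in non-power-of-two positions, then set check bits."""
    n = codeword_length(data_bits)
    code, d = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1) == 0:      # power of two: reserved for a check bit
            continue
        if (data >> d) & 1:
            code |= 1 << pos
        d += 1
    s = _syndrome(code, n)
    p = 1
    while p <= n:                     # set check bits so the syndrome becomes zero
        if s & p:
            code |= 1 << p
        p <<= 1
    return code


def decode(code, data_bits=DATA_BITS):
    """Return (data, corrected). Corrects at most one flipped bit."""
    n = codeword_length(data_bits)
    s = _syndrome(code, n)
    corrected = s != 0
    if corrected and s <= n:
        code ^= 1 << s                # the syndrome names the flipped position
    data, d = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1) == 0:
            continue
        if (code >> pos) & 1:
            data |= 1 << d
        d += 1
    return data, corrected


def write_byte(mem, word_addr, byte_lane, value):
    """Partial write modeled as read-modify-write over the full word."""
    data, _ = decode(mem[word_addr])                  # 1. read and check/correct
    shift = 8 * byte_lane
    data = (data & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # 2. merge byte
    mem[word_addr] = encode(data & 0xFFFFFFFF)        # 3. re-encode and write


mem = [encode(0x11223344)]
write_byte(mem, 0, 1, 0xAB)
```

In this model a single byte write occupies the memory for both a read and a write of the full codeword, which is one way to see the throughput impact noted in item (ii).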
Another type of problem, called Problem 2 herein, involves crosstalk on long buses. In DSM (deep sub-micron) technologies, the interconnect metal sub-system covering long bus routes causes additional problems such as crosstalk. It is believed that this effect is ignored in conventional testing methods using ATPG (automatic test pattern generation) or BIST (built-in self-test) techniques. While bus activity is a composite model involving multiple signals and their transitions, it is believed that the conventional techniques focus only on fault models at the gate and cell levels.
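As a rough illustration of why bus activity is a composite of several signals, the following Python sketch flags each interior bus wire (the "victim") whose two neighbors switch in the opposite direction during the same transition, a commonly cited worst case for crosstalk-induced delay. The function names and the simple nearest-neighbor criterion are assumptions chosen for this example; the sketch is not a description of any conventional ATPG or BIST flow.

```python
def transition(prev, nxt, i):
    """+1 if wire i rises, -1 if it falls, 0 if it stays quiet."""
    return ((nxt >> i) & 1) - ((prev >> i) & 1)


def worst_case_victims(prev, nxt, width):
    """Interior wires that switch while both neighbors switch the other way."""
    victims = []
    for i in range(1, width - 1):
        t = transition(prev, nxt, i)
        if (t != 0
                and transition(prev, nxt, i - 1) == -t
                and transition(prev, nxt, i + 1) == -t):
            victims.append(i)
    return victims


# Alternating pattern inverted in one cycle: every interior wire is a victim.
stressed = worst_case_victims(0b01010101, 0b10101010, 8)
# All wires rising together: neighbors switch in the same direction.
benign = worst_case_victims(0b00000000, 0b11111111, 8)
```

The point of the sketch is that the condition depends on the joint transition of three adjacent wires, which a fault model confined to a single gate or cell does not capture.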
It would be desirable to address the above-mentioned problems and issues, among others.