Memory device failures result in system downtime and/or incorrect results. In response, various error detection and/or error correction mechanisms have been developed. Basic mechanisms involve parity checking. More advanced techniques provide for error correction. For example, a memory module may include multiple 8-bit memory integrated circuit (IC) packages to store data and an additional 8-bit memory IC package to store error correcting code (ECC) bits corresponding to data stored in the other ICs.
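As a minimal illustration of the parity checking mentioned above, the following Python sketch stores one even-parity bit alongside an 8-bit value. The function names are hypothetical and the example is not drawn from any particular memory controller; it only shows that a single flipped bit is detected while a double flip goes unnoticed, which is one reason more advanced codes are used.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the value read back still matches the stored parity bit."""
    return parity_bit(byte) == stored_parity


data = 0b10110010                 # value written to memory (four bits set)
stored = parity_bit(data)         # parity bit written alongside it

single_flip = data ^ 0b00000100   # one bit corrupted
double_flip = data ^ 0b00000110   # two bits corrupted

print(parity_ok(data, stored))         # clean read passes the check
print(parity_ok(single_flip, stored))  # single-bit error is detected
print(parity_ok(double_flip, stored))  # double-bit error escapes detection
```

Parity alone can only flag an error; it cannot identify which bit failed, so it offers detection without correction.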
The most common computer memory uses 4-bit or 8-bit wide dynamic random access memory (DRAM) chips soldered onto a printed circuit board (PCB) to form an in-line memory module, either a single inline memory module (SIMM) or a dual inline memory module (DIMM). These SIMMs and DIMMs are plugged into sockets on a computer to construct the memory subsystem.
It is not uncommon for a server class computer to have from 32 to 128 memory DIMMs containing from 288 to 9,216 DRAMs. A fault anywhere in this large number of integrated circuits and connection sockets, all running at high signaling frequencies, can create errors that either silently corrupt important data or force the application to terminate when an uncorrectable fault is detected. Errors can be divided into permanent, persistent and transient classes. Transient errors are further divided into event errors and margin errors. Radiation-induced DRAM soft errors are a form of event error. Signaling-induced faults are a form of margin error.
Various strategies have been developed to address errors. For example, redundant memory may be added to support error detection and correction coding. In the computer industry DRAM memory modules with 64 data bits and 8 redundant correction bits have become the high volume standard.
One effective ECC mechanism is referred to as x4 Single Device Data Correction (SDDC), which is designed to recover from a single DRAM chip failure for 4-bit memory devices. Similarly, x8 SDDC is designed to recover from a single DRAM chip failure for 8-bit memory devices. Current memory SDDC generally requires 18 or 36 DRAM chips in order to provide sufficient redundancy for a fully deterministic logic gate solution. The memory data and check bits are inputs to a combinatorial logic block that generates a DRAM error locator vector to identify the failing device. The combinatorial logic block may also output a bit correction vector that can be used to correct (e.g., by flipping bits) data in the faulty DRAM chip.
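The roles of the error locator vector and the bit correction vector can be sketched in software. The following Python example is a toy under stated assumptions, not the code used by any actual SDDC implementation: it treats each x4 device as one 4-bit symbol over GF(16), uses a small ten-device codeword with two check symbols, and computes two syndromes to find which single device is in error and which of its bits to flip. All names (encode, locate_and_correct, and so on) are hypothetical.

```python
# Toy single-symbol-correcting code over GF(16); one 4-bit symbol per x4 device.
# Illustrative only: codeword length must stay at or below 15 symbols.

EXP = [0] * 30   # EXP[i] = alpha**i, doubled so products need no modulo
LOG = [0] * 16   # LOG[x] = i such that alpha**i == x (LOG[0] is unused)
value = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = value
    LOG[value] = i
    value <<= 1
    if value & 0x10:
        value ^= 0x13          # reduce by the polynomial x^4 + x + 1


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 15]


def syndromes(symbols):
    """s0 = sum of symbols, s1 = sum of alpha**j * symbol[j] (sums are XOR)."""
    s0 = s1 = 0
    for j, c in enumerate(symbols):
        s0 ^= c
        s1 ^= gf_mul(EXP[j], c)
    return s0, s1


def encode(data):
    """Prepend two check symbols (devices 0 and 1) so both syndromes are zero."""
    a = b = 0
    for j, d in enumerate(data, start=2):
        a ^= d
        b ^= gf_mul(EXP[j], d)
    q = gf_div(a ^ b, 3)       # 3 is 1 + alpha in GF(16)
    p = a ^ q
    return [p, q] + list(data)


def locate_and_correct(symbols):
    """Return (one-hot device locator, 4-bit correction), or None if the
    syndromes are inconsistent with an error confined to one device."""
    s0, s1 = syndromes(symbols)
    locator = [0] * len(symbols)
    if s0 == 0 and s1 == 0:
        return locator, 0      # no error observed
    if s0 == 0 or s1 == 0:
        return None
    j = (LOG[s1] - LOG[s0]) % 15
    if j >= len(symbols):
        return None
    locator[j] = 1
    return locator, s0         # s0 equals the error pattern in device j


data = [0x3, 0xA, 0x0, 0xF, 0x7, 0x1, 0xC, 0x5]
stored = encode(data)
read = list(stored)
read[6] ^= 0b1011              # device 6 returns three wrong bits
locator, correction = locate_and_correct(read)
repaired = [c ^ correction if hit else c for c, hit in zip(read, locator)]
assert repaired == stored
```

In this sketch the locator is a one-hot list over devices and the correction is the 4-bit pattern to XOR into the flagged device, mirroring the two outputs of the combinatorial logic block. Errors spanning more than one device are outside what this toy code can handle and may be reported as uncorrectable or miscorrected.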
Performance-per-watt modeling has proven that transferring 64-byte memory blocks for each read and write operation provides optimal efficiency. To deliver 64 bytes, some SDDC error codes send one identical address to two x4 DIMMs in a lockstep channel, delivering 144 information bits per I/O clock. This arrangement of thirty-two x4 data DRAMs plus four x4 correction bit DRAMs provides enough redundancy for x4 SDDC while delivering a 64-byte memory block in 4 I/O clocks. Other SDDC error codes send a unique address to each independent channel x4 DIMM, delivering 72 information bits per I/O clock. This arrangement of sixteen x4 data DRAMs plus two x4 correction bit DRAMs provides enough redundancy for x4 SDDC while delivering a 64-byte memory block in a more power efficient 8 I/O clocks.
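The bit counts quoted for the two x4 arrangements follow from simple arithmetic, which the short Python sketch below reproduces. The class and field names are hypothetical and chosen only for illustration; the device counts and widths are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class ChannelConfig:
    name: str
    data_drams: int     # DRAMs holding data bits
    check_drams: int    # DRAMs holding correction bits
    width: int          # bits each DRAM transfers per I/O clock

    @property
    def bits_per_clock(self) -> int:
        """Total information bits (data plus correction) per I/O clock."""
        return (self.data_drams + self.check_drams) * self.width

    @property
    def data_bits_per_clock(self) -> int:
        return self.data_drams * self.width

    def clocks_for_block(self, block_bytes: int = 64) -> int:
        """I/O clocks needed to deliver one memory block of data."""
        return block_bytes * 8 // self.data_bits_per_clock


lockstep = ChannelConfig("lockstep, two x4 DIMMs", 32, 4, 4)
independent = ChannelConfig("independent, one x4 DIMM", 16, 2, 4)

for cfg in (lockstep, independent):
    print(cfg.name, cfg.bits_per_clock, cfg.clocks_for_block())
```

The lockstep arrangement activates 36 DRAMs for 4 clocks, while the independent channel arrangement activates 18 DRAMs for 8 clocks to move the same 64 bytes.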
Error correction theory has proven that it is not possible to build fully deterministic hardware for perfect x8 SDDC on independent channel x8 ECC DIMMs, despite the power efficiency of enabling just nine x8 DRAMs for 8 I/O bursts to deliver one 64-byte memory block.