The present disclosure relates generally to computer memory and more particularly to error-correcting code distribution in a memory system.
Computer systems often require a considerable amount of high-speed random access memory (RAM) to hold information, such as data and programs, temporarily while powered and operational. This information is normally binary, composed of patterns of 1's and 0's known as bits of data. The bits of data are often grouped and organized at a higher level. A byte, for example, is typically composed of eight bits; more generally, these groups are called symbols and may be made up of any number of bits or sub-symbols.
Memory device densities have continued to grow as computer systems have become more powerful. In some cases, the RAM content of a single computer can be composed of hundreds of trillions of bits. Unfortunately, the failure of just a portion of a single RAM device can cause system-wide issues. Memory errors may be “hard” (repeating) or “soft” (one-time or intermittent) failures, and may occur as single-cell, multi-bit, full-chip, or full-memory-module failures. When such errors occur, all or part of the system RAM may be unusable until it is repaired. Repair turnaround times can be hours or even days, which can have a substantial impact on a business dependent on the computer systems. In systems with an array of memory modules (servers, for example), failed memory modules may be isolated temporarily without taking the system down, in order to sustain system operation. However, isolating a module reduces the overall system memory and adversely impacts performance.
The probability of encountering a RAM failure during normal operation has continued to increase as the amount of memory storage in contemporary computers continues to grow. Error-correcting codes (ECCs) are used in more robust systems to detect and correct specific error conditions, and are typically stored collectively in an additional device. Memory system architectures typically require a choice between an ECC implementation that corrects many error bits in one or two memory devices and one that corrects one or two bits across many memory devices.
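By way of illustration only, the detect-and-correct behavior of an ECC can be sketched with a textbook Hamming(7,4) code, which stores three check bits alongside four data bits and corrects any single-bit error in the resulting seven-bit codeword. This code is an assumption chosen for brevity and is not the ECC of the present disclosure; the function names `hamming74_encode` and `hamming74_correct` are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7).

    Illustrative textbook code only; not the ECC of the disclosure.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # check bit over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # check bit over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # check bit over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, syndrome) after correcting at most one bit error.

    A syndrome of 0 means no error was detected; otherwise the syndrome
    is the 1-based position of the bit that was flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], syndrome


# Example: a single flipped bit (a "soft" error) is located and repaired.
stored = hamming74_encode([1, 0, 1, 1])
corrupted = list(stored)
corrupted[4] ^= 1  # flip the bit at position 5
recovered, position = hamming74_correct(corrupted)
```

In this sketch, `recovered` equals the original four data bits and `position` identifies the flipped bit. Two simultaneous bit errors in the same codeword exceed the capability of this small code and would be miscorrected, which illustrates why the number and placement of check bits determine which error conditions a given ECC can handle.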