Block error correcting codes used in memory devices, such as Reed-Solomon codes, have two portions: a parity portion across the blocks that identifies the failed bits within a block, and a locator portion that identifies the location of a failed block. One way to enhance Error Correction Code (ECC) coverage is to distribute error correction over multiple memory resources to compensate for a hard failure in one memory resource that prevents deterministic data access to the failed memory resource. This distributed error correction is referred to as lockstep memory or chipkill. A lockstep memory comprises a multi-channel memory layout in which the data of one cache line is distributed between two different memory channels, so that one half of a cache line is stored in a first memory module, such as a Dual In-line Memory Module (DIMM), on a first channel, while the second half of the cache line goes to a second memory module on a second channel. A DIMM comprises a series of dynamic random-access memory (DRAM) integrated circuits mounted on a printed circuit board. For instance, for DIMMs built from 4-bit-wide (×4) DRAM devices, combining the single error correction and double error detection capabilities of two ECC DIMMs in a lockstep layout extends their single device data correction (SDDC) into double device data correction (DDDC).
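The lockstep layout described above can be illustrated with a minimal Python sketch. It assumes a 64-byte cache line and assumes that the first 32 bytes go to the first channel and the remaining 32 bytes to the second; the description does not say how the halves are chosen, and the function names are hypothetical.

```python
from typing import Tuple

CACHE_LINE_BYTES = 64  # assumed cache line size, not stated in the text


def split_cache_line(line: bytes) -> Tuple[bytes, bytes]:
    """Split one cache line into the halves stored on two channels.

    Assumption: the lower half goes to the first channel and the upper
    half to the second channel.
    """
    if len(line) != CACHE_LINE_BYTES:
        raise ValueError("expected a full cache line")
    half = len(line) // 2
    return line[:half], line[half:]


def merge_cache_line(first_half: bytes, second_half: bytes) -> bytes:
    """Rebuild the cache line from the two per-channel halves."""
    return first_half + second_half


line = bytes(range(CACHE_LINE_BYTES))
channel0_half, channel1_half = split_cache_line(line)
restored = merge_cache_line(channel0_half, channel1_half)
```

Because both halves are needed to rebuild the line, both channels are accessed together for every cache line, which is what lets the ECC of the two modules be combined.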
Additionally, for DIMMs built from 8-bit-wide (×8) DRAM devices, without lockstep, each DRAM device contributes 8 bytes of data per cache line. In the case of a device failure, a block of 8 bytes is affected, and the number of ECC bits available is not sufficient to correct a block of 8 bytes. Once lockstep is enabled, each DRAM device contributes only 4 bytes of data to a cache line. The available ECC bits are then sufficient to correct a block of 4 bytes, and SDDC is achieved.
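The per-device figures above can be reproduced with a short calculation. The sketch below assumes a 64-byte cache line and a 64-bit data bus per channel (ECC devices excluded); neither value is stated in the text, and the function name is hypothetical.

```python
def bytes_per_device(device_width_bits: int, lockstep: bool,
                     cache_line_bytes: int = 64,
                     channel_data_bits: int = 64) -> int:
    """Return how many bytes of one cache line a single DRAM device holds.

    Assumptions: 64-byte cache line, 64 data bits per channel, and
    lockstep spreading the line evenly over two channels.
    """
    devices_per_channel = channel_data_bits // device_width_bits
    channels = 2 if lockstep else 1
    return cache_line_bytes // (devices_per_channel * channels)


x8_independent = bytes_per_device(8, lockstep=False)  # 8 bytes per device
x8_lockstep = bytes_per_device(8, lockstep=True)      # 4 bytes per device
```

Under these assumptions, enabling lockstep halves the block that a single failed ×8 device can corrupt, from 8 bytes to 4 bytes, which is the block size the text says the available ECC bits can correct.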
There is a need in the art for improved techniques for performing error correction in lockstep memory modes.