A modern computer system comprises a processor, a data store (e.g., a hard drive), and a memory system. The memory system comprises one or more memory modules and a memory controller. While the data store can hold a much greater quantity of data than the memory modules, the structure of the memory modules enables the processor to access data stored in them more quickly.
During normal operation of the computer system, and under control of an operating system, the processor causes the memory controller to transfer data between the data store and the memory modules, and between the memory modules and the processor.
From time to time, part or all of a memory module may fail. If the module experiences a catastrophic failure, it may need to be replaced. However, many failures take the form of hard or intermittent failures of a single bit (or of a small, localized set of bits). Some operating systems respond to these smaller memory failures as if they were larger errors by, for example, de-allocating an entire page of memory locations (i.e., de-allocating much more memory than is necessary to isolate the memory failure). Other operating systems do not respond to memory failures at all, thereby resulting in slower memory response times and/or risking the loss of data.
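
To give a rough sense of scale for the page de-allocation response described above, the following minimal sketch counts how many healthy bits are retired along with the failed ones. The 4096-byte page size and the function name deallocation_overhead are assumptions made for illustration only; the text does not specify a page size or any particular operating system.

```python
# Illustrative only: the page size is an assumption, not taken from the text.
ASSUMED_PAGE_SIZE_BYTES = 4096


def deallocation_overhead(failed_bits: int,
                          page_size_bytes: int = ASSUMED_PAGE_SIZE_BYTES) -> int:
    """Return the number of healthy bits retired when a whole page is
    de-allocated in response to `failed_bits` failed bits within it."""
    page_bits = page_size_bytes * 8
    if not 0 < failed_bits <= page_bits:
        raise ValueError("failed_bits must be between 1 and the page size in bits")
    return page_bits - failed_bits


# A single failed bit in an assumed 4096-byte page.
healthy_bits_lost = deallocation_overhead(failed_bits=1)
print(healthy_bits_lost)
```

Under these assumptions, isolating one failed bit by de-allocating its page also retires every other bit in that page, which is the disproportion the paragraph above refers to.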