The present invention relates to computer systems, and more specifically, to mirrored memory scrubbing of computer memory errors.
Computer memory chips are susceptible to natural background radiation such as cosmic rays and alpha particles. For example, radioactive elements in a computer chip's material decay and release alpha particles into the chip. Such radiation occasionally causes inconsistencies within a system's memory, which result in errors often referred to as soft errors. A soft error changes the state of a memory cell to a different value without altering the physical structure of the computer chip. Soft errors range from a variation in an instruction within a program to a modification of a single data value. The probability of a soft error occurring in any given memory cell is relatively small. However, with the large amount of memory included in modern computer systems, the overall rate of occurrence increases significantly.
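The effect of memory size on the overall error rate can be illustrated with a simple model. The sketch below assumes that each bit flips independently with the same small probability during some interval; this independence model, the function name, and any numbers passed to it are illustrative assumptions and are not taken from the present disclosure.

```python
import math


def prob_at_least_one_soft_error(p_bit: float, n_bits: int) -> float:
    """Probability that at least one of n_bits flips.

    Assumes independent, identically likely flips (an illustrative
    simplification): P = 1 - (1 - p_bit) ** n_bits.
    """
    # log1p/expm1 preserve precision when p_bit is extremely small.
    return -math.expm1(n_bits * math.log1p(-p_bit))


if __name__ == "__main__":
    p = 1e-15  # hypothetical per-bit flip probability for one interval
    for label, n in (("1 GiB", 8 * 2**30), ("1 TiB", 8 * 2**40)):
        print(label, prob_at_least_one_soft_error(p, n))
```

Under this model, holding the per-bit probability fixed while increasing the number of bits always raises the probability that at least one bit is affected, which is the trend described above.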
Memory scrubbing is a process that inspects memory locations for errors, corrects detected errors using error-correcting code (ECC), and writes the corrected data back to its original location. This process is performed periodically as a background operation of a system.
During scrubbing, a memory controller scans through a system's memory and determines where soft errors have occurred. ECC is generally used to correct the detected soft errors, and the corrected data is then written back to the appropriate location. Memory scrubbing is often classified as a reliability, availability, and serviceability (RAS) feature, as it generally increases the reliability of a system's memory.
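The scan, correct, and write-back sequence can be sketched in software as follows. The disclosure does not specify a particular ECC, so this sketch substitutes a Hamming(7,4) code, which corrects one flipped bit per 7-bit codeword, purely for illustration; memory controllers commonly use wider codes implemented in hardware. All function names here are hypothetical.

```python
from typing import List


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (illustrative ECC)."""
    d0, d1, d2, d3 = ((nibble >> i) & 1 for i in range(4))
    p1 = d0 ^ d1 ^ d3  # covers codeword positions 1, 3, 5, 7
    p2 = d0 ^ d2 ^ d3  # covers codeword positions 2, 3, 6, 7
    p4 = d1 ^ d2 ^ d3  # covers codeword positions 4, 5, 6, 7
    bits = [p1, p2, d0, p4, d1, d2, d3]  # positions 1..7
    return sum(bit << i for i, bit in enumerate(bits))


def syndrome(word: int) -> int:
    """Return 0 for a valid codeword, else the 1-based position of a single flipped bit."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def scrub(memory: List[int]) -> int:
    """Scan every location, correct single-bit errors in place, return the count corrected."""
    corrected = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:
            # Flip the faulty bit and write the corrected word back
            # to its original location.
            memory[addr] = word ^ (1 << (s - 1))
            corrected += 1
    return corrected
```

This code can only repair one flipped bit per codeword; two flips in the same codeword would be miscorrected, which is a limitation of the chosen illustrative code rather than of scrubbing in general.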
The memory scrubbing process is performed periodically, in the background of a system, rather than constantly. Memory scrubbing generally requires additional system power to check for and correct errors, as well as additional logic in the memory controller to manage the reading of the memory.
A mirrored memory system divides memory into two memory channels, sometimes referred to as memory partitions. Data stored in a main memory channel is duplicated in a mirrored memory channel. If a soft error occurs in the main memory channel, the mirrored memory channel can be used to access the correct data or instruction, and ECC is applied to correct the error so that both channels again hold identical information.
In a mirrored memory system, the main memory and mirrored memory are identical except for unpredictable soft errors. Therefore, an error found in one memory channel can be corrected using the corresponding memory location of the other memory channel. The main memory and mirrored memory are scrubbed concurrently during the memory scrubbing process.
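A minimal software sketch of concurrent scrubbing of two mirrored channels is given below. It assumes, for illustration only, that each location stores a data byte together with an even-parity bit used to detect an error, and that a location failing its check is repaired by copying the corresponding location of the other channel. The storage layout, the parity check, and all names are hypothetical and are not specified by the disclosure.

```python
from typing import List, Tuple

# Hypothetical layout: (data byte, stored even-parity bit).
Word = Tuple[int, int]


def parity(data: int) -> int:
    """Even-parity bit of the data value (1 if the count of set bits is odd)."""
    return bin(data).count("1") & 1


def store(data: int) -> Word:
    """Build a stored word whose parity bit matches its data."""
    return (data, parity(data))


def is_valid(word: Word) -> bool:
    data, check = word
    return parity(data) == check


def scrub_mirrored(main: List[Word], mirror: List[Word]) -> Tuple[int, List[int]]:
    """Walk both channels together and repair a bad location from its counterpart.

    Returns (number of locations repaired, addresses where both copies failed).
    """
    if len(main) != len(mirror):
        raise ValueError("channels must be the same size")
    repaired = 0
    uncorrectable: List[int] = []
    for addr in range(len(main)):
        main_ok = is_valid(main[addr])
        mirror_ok = is_valid(mirror[addr])
        if main_ok and not mirror_ok:
            mirror[addr] = main[addr]
            repaired += 1
        elif mirror_ok and not main_ok:
            main[addr] = mirror[addr]
            repaired += 1
        elif not main_ok and not mirror_ok:
            uncorrectable.append(addr)
    return repaired, uncorrectable
```

A single parity bit detects only an odd number of flipped bits, so this sketch would miss some multi-bit errors; a stronger per-location check would be needed to catch them.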