1. Technical Field
Embodiments of the invention generally relate to the field of computer memory and more particularly, but not exclusively, to handling of an error in a memory device.
2. Background Art
In today's computing world, maintaining good computer system reliability and uptime is often important or even mandatory. To maintain high uptime, system designers build in reliability, availability, serviceability and manageability (RASM) features that improve overall system reliability and availability. Thus, it is common to find various degrees of redundancy, error correction, error detection and error containment techniques employed at different levels in such a system.
System memory errors are among the most common causes of computer system failure. Memory devices are susceptible to errors such as transient (or soft) errors. If these errors are not handled properly, they can cause a computing system to malfunction. Hence, the memory subsystem (especially dual in-line memory modules, or DIMMs) receives particular attention in this regard. For example, redundant information in the form of error correcting codes (ECCs) or other such error correction information can be used in memory scrubbing operations to improve overall system reliability. Demand memory scrubbing is one error detection/correction technique wherein errors in a memory segment, whether single-bit or multi-bit errors, can be detected while servicing a host operating system's request to access the memory segment. By contrast, another RASM technique known as patrol memory scrubbing proactively scans a memory segment for errors before, or otherwise independently of, any such host operating system request to access the memory segment.
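The distinction between demand and patrol scrubbing can be illustrated with a minimal sketch. It assumes, purely for illustration, that each memory word is a 7-bit Hamming(7,4) codeword, which corrects single-bit errors only; a practical memory subsystem would typically use a wider ECC that also detects multi-bit errors. All names (`encode`, `scrub_word`, `demand_read`, `patrol_scrub`) are hypothetical and are not taken from any particular memory controller.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # 1-indexed codeword positions that hold data bits
PARITY_POSITIONS = (1, 2, 4)     # 1-indexed codeword positions that hold parity bits


def syndrome(codeword):
    """XOR of the 1-indexed positions of all set bits; 0 means no error detected."""
    s = 0
    for pos in range(1, 8):
        if (codeword >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Encode a 4-bit value as a 7-bit Hamming(7,4) codeword."""
    codeword = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            codeword |= 1 << (pos - 1)
    s = syndrome(codeword)
    for pos in PARITY_POSITIONS:
        if s & pos:                      # choose parity bits so the syndrome becomes 0
            codeword |= 1 << (pos - 1)
    return codeword


def decode(codeword):
    """Extract the 4 data bits from a codeword (no error checking)."""
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (codeword >> (pos - 1)) & 1:
            nibble |= 1 << i
    return nibble


def scrub_word(memory, addr):
    """Correct a single-bit error at addr in place; return True if a fix was made."""
    s = syndrome(memory[addr])
    if s == 0:
        return False
    memory[addr] ^= 1 << (s - 1)         # the syndrome names the flipped position
    return True


def demand_read(memory, addr):
    """Demand scrub: check and correct only the word being accessed."""
    scrub_word(memory, addr)
    return decode(memory[addr])


def patrol_scrub(memory):
    """Patrol scrub: scan every word independently of any access request."""
    return sum(scrub_word(memory, addr) for addr in range(len(memory)))
```

In this sketch, `demand_read` repairs a word only when a request happens to touch it, whereas `patrol_scrub` visits every word, so a soft error in a rarely accessed location is corrected before a second error can accumulate in the same word.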
Another RASM technique, known as “memory sparing”, allocates one or more memory segments, each to be available for service as a spare segment in the event of an actual or expected future failure of an in-use (or “active”) memory segment. When error detection or other mechanisms indicate such a failure of an in-use memory segment, a spare memory segment is allocated to serve as a successor to (substitute for) the failed or failing segment. The system memory map is updated to associate addresses (e.g., a range of addresses) with memory locations of the successor segment, where previously such addresses were mapped to respective locations of the failed or failing active segment.
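The remapping step of memory sparing can be sketched as follows. This is a simplified illustration under stated assumptions: segments are of equal size, each address range maps to exactly one segment, and copying data from the failed segment to its successor is omitted. The class `MemoryMap` and its methods are hypothetical names introduced only for this example.

```python
class MemoryMap:
    """Toy system memory map: fixed-size address ranges mapped to segment ids."""

    def __init__(self, segment_size, active_segments, spare_segments):
        self.segment_size = segment_size
        # Range index -> id of the segment currently backing that address range.
        self.ranges = dict(enumerate(active_segments))
        # Segments held in reserve, not yet backing any address range.
        self.spares = list(spare_segments)

    def translate(self, address):
        """Return (segment id, offset within segment) for a system address."""
        index, offset = divmod(address, self.segment_size)
        return self.ranges[index], offset

    def spare_out(self, failed_segment):
        """Remap every range backed by failed_segment to a spare successor."""
        affected = [i for i, seg in self.ranges.items() if seg == failed_segment]
        if not affected:
            raise ValueError("segment is not active: %r" % (failed_segment,))
        if not self.spares:
            raise RuntimeError("no spare segment available")
        successor = self.spares.pop(0)
        for index in affected:
            self.ranges[index] = successor
        return successor
```

After `spare_out` is called for a failed segment, the same system addresses translate to locations in the successor segment, while addresses backed by other segments are unaffected.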