1. Technical Field
The present invention relates in general to data processing systems, and more particularly, to memory management of data processing systems. Still more particularly, the present invention relates to memory error recovery in data processing systems.
2. Description of the Related Art
Almost all modern computer systems utilize some type of memory for data storage. However, those skilled in this art will appreciate that higher memory storage capacity frequently results in higher latency to locate and access data in the high-capacity memory. Therefore, to address the access latency problem, many modern data processing systems include a memory hierarchy that keeps frequently accessed data in low-capacity, fast-access memory (e.g., L1, L2, or L3 cache) and retrieves data from higher-capacity, slower-access memory (e.g., dynamic random-access memory (DRAM)) only when data stored solely in that memory is required for processing.
However, like all electronic memory storage devices, DRAMs are prone to errors. For example, these errors may be caused by a defective DRAM module, which may have been poorly manufactured or may have developed a defect through wear and tear. Correctable errors caused by a defective DRAM module can be corrected by a technique such as an Error Correction Code (ECC), discussed herein in more detail. Uncorrectable errors (UEs) can be handled by deconfiguring, or disabling, the defective portion of the memory.
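The distinction between correctable and uncorrectable errors can be illustrated with a small sketch. The Python code below is not taken from the present disclosure; it assumes a toy extended Hamming code that protects 4 data bits with 3 parity bits plus one overall parity bit, so that a single flipped bit is corrected and two flipped bits are detected but reported as uncorrectable. The function names secded_encode and secded_decode are hypothetical, and practical memory ECC typically protects wider words (for example, 64 data bits) with a scheme chosen by the hardware designer.

```python
def secded_encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (toy example)."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the least significant bit
    code = [0] * 8                              # code[0] holds the overall parity bit
    code[3], code[5], code[6], code[7] = d      # data bits sit at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]       # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]       # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]       # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2                 # makes the total number of 1 bits even
    return code


def secded_decode(code):
    """Return (nibble, status), where status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                     # XOR of set positions is 0 for a valid codeword
    overall_ok = sum(code) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: assume exactly one and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = sum(code[p] << i for i, p in enumerate((3, 5, 6, 7)))
    return nibble, status


word = secded_encode(0b1011)
word[5] ^= 1                                    # one flipped bit: correctable error
print(secded_decode(word))
word[2] ^= 1                                    # second flipped bit: uncorrectable error
print(secded_decode(word))
```

In this sketch the "uncorrectable" status corresponds to the case in which the data cannot be recovered by the code alone, which is the situation that deconfiguring the defective portion of memory is meant to address.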
Those skilled in this art will appreciate that one way to correct errors in memory is to restart the computer system, which returns the system to a default state. However, shutting down and restarting the computer system is inefficient and time-consuming because the system is typically taken off-line and the services provided by the computer system are unavailable during the shutdown and restart.
Therefore, there is a need for a way to correct memory access errors without the shutdown and restart of the entire computer system.