One or more aspects of the invention relate generally to handling uncorrectable memory errors in pipelined central processing units (CPUs).
Modern processors may operate using a plurality of processor cores working in conjunction with a memory system that is structured into different hierarchy levels. The closer a memory level is to the processor core, the faster the processor core can access the data in that memory level. For example, an access to an L1 (level 1) cache is much faster than an access to an L3 or L4 cache. Additionally, processors are constructed to allow for a fast throughput of instructions and data through the processor. Pre-fetch logic and other sophisticated pipelining mechanisms may be used in conjunction with the memory system hierarchy for this purpose.
Therefore, a processor designer seeks to avoid loading wrong or faulty data into the processor, because repairing the results of such faulty data is costly for the processor in terms of overall computing power and throughput.
If an uncorrectable error is detected at a memory location of the main memory, it may cause the processor to go through a recovery mechanism multiple times. This is a particular problem when the memory location is hit by a pre-fetch operation, in which case the problem is not reported to the operating system, where it could be handled. Normally, memory errors in pre-fetch “branch wrong” paths are not reported and thus are not treated with an error recovery or repair procedure. This may imply the risk of entering a recovery loop.
A pre-fetch may, e.g., hit an address that contains an uncorrectable error. Since the access is a pre-fetch, the addressed data may or may not be vital for continued processing. The error may be reported as a core recovery error by the ECC (error correction code) logic. After recovery, the core may resume operation and may potentially pre-fetch the address of the error again. If this repeats several times, the core is eventually spared, i.e., taken out of operation, even though the logic of that core is without error.
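The recovery loop described above may be illustrated by the following minimal sketch. It is a simplified software model only, not a description of any actual processor implementation. The names `Core`, `run_prefetch_stream`, and `SPARE_THRESHOLD`, as well as the threshold value of three recoveries, are assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Hypothetical number of recoveries after which a core is spared.
# The source only says "several times"; 3 is an assumed value.
SPARE_THRESHOLD = 3


@dataclass
class Core:
    """Hypothetical model of a core that counts its recovery actions."""
    recoveries: int = 0
    spared: bool = False


def run_prefetch_stream(core, bad_addresses, prefetch_stream,
                        spare_threshold=SPARE_THRESHOLD):
    """Replay a stream of pre-fetch addresses against a core.

    Each pre-fetch that hits an address with an uncorrectable error
    triggers one core recovery. The error is never reported to the
    operating system, so the bad address stays in use and can be hit
    again. Once the recovery count reaches the threshold, the core is
    spared although its own logic is error-free.
    """
    for addr in prefetch_stream:
        if core.spared:
            break  # a spared core no longer executes pre-fetches
        if addr in bad_addresses:
            core.recoveries += 1  # ECC logic reports a core recovery error
            if core.recoveries >= spare_threshold:
                core.spared = True
    return core


if __name__ == "__main__":
    # The same faulty address is pre-fetched again after every recovery.
    core = run_prefetch_stream(Core(), {0x1000}, [0x1000] * 5)
    print(core)
```

In this model, the core is spared solely because the same faulty memory location is repeatedly pre-fetched, which corresponds to the problem outlined above.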
Several disclosures relate to methods for handling memory errors in pipelined CPUs.
Document US 2008/0270821 A1, which is hereby incorporated herein by reference in its entirety, discloses a system and method of recovering from errors in a data processing system. The data processing system includes one or more processor cores coupled to one or more memory controllers. The one or more memory controllers include at least a first memory interface coupled to a first memory and at least a second memory interface coupled to a second memory. In response to determining an error has been detected in the first memory, access to the first memory via the first memory interface is inhibited.
Document WO 2014/051550 A1, which is hereby incorporated herein by reference in its entirety, discloses techniques for recovering from non-correctable memory errors. A memory location may be accessed. It may be determined that the memory location contains a non-correctable error. A range of addresses associated with the memory location may be determined. Corrective action may be taken on the entire range of addresses to identify other addresses within the range of addresses that contain non-correctable memory errors.
However, there may be a need for optimized handling of errors occurring in memory cells of memory levels close to the processor core.