1. Field of the Invention
The present invention relates generally to pipelined processor operation in the presence of soft errors in memory and in particular relates to methods and structure for correcting soft errors found in information (program instructions and/or data) read from an attached memory subsystem.
2. Related Patents
This application claims the benefit of U.S. provisional patent application Ser. No. 60/535,005 filed on Jan. 8, 2004 which is hereby incorporated by reference.
3. Discussion of Related Art
It is common practice in present day computer processor architectures to utilize pipelined architectures for improving performance. In particular, present-day processors utilize pipelined architectures in accessing memory subsystems. Programmed instructions and data are fetched and/or read from a memory in a pipelined fashion so as to optimize processor performance in fetching and executing instructions.
In general, “pipelined architecture” refers to an architecture of, for example, a computing processor in which various stages of the instruction and/or data memory read operations are substantially independently operable and thus may be overlapped with other processing stages within the processor to decode instructions, determine addresses of operands, reading operand data from memory and execute the instructions. The particular stages of a pipeline and the degree of independence of the pipeline stages are matters of design choice for the processor designer. By so overlapping various stages of processor instruction fetching and execution and related data reading and decoding, the processor may most effectively utilize available bandwidth from the memory subsystem. In other words, a next memory read or fetch operation may be initiated a while later stages of a previous fetch or read continue to completion. The overlapped operations of the various stages of memory access allow a new memory access as often as every clock period of the memory subsystem. Such pipelined architectures are well known to those of ordinary skill in the processor and memory architecture arts.
Memory subsystems coupled to such pipelined processors are rapidly evolving as measured by both capacity and performance. As memory speeds and densities increase, so to does the susceptibility of such memories to “soft errors”. A “soft error”, in general, is one that may be corrected by applying error detection and correction techniques and circuits. Such a soft error may be manifested as, for example, a flipped bit. In other words, a bit value previously stored in memory as a zero value is flipped to a one value, or vice versa, or otherwise erroneously read back. Such errors are most commonly caused by radiation of various types and frequencies that interfere with the persistent storage of data in the memory semiconductor circuits.
Processors may encounter such soft errors during instruction fetch or data read operations. When soft errors are encountered in a pipelined processor architecture, the overlapped sequencing of the pipeline stages mat be stopped or disrupted (i.e., “stalled”). The pipeline stage may then need to be emptied and re-started to achieve the desired synchronization and overlap of stages of the memory interface pipeline. This stalling of the pipeline and the associated restarting thereof negatively impacts overall system performance.
However, the frequency of such soft errors is extremely rare as compared to the frequency of memory access due to the typical, error-free, pipelined instruction fetch and execution processes. Though somewhat infrequent, it remains an ongoing problem to reduce the frequency of encountering such soft errors or to correct them when detected.
One current solution to soft error problems in a memory subsystem is to add an error correcting code to all stored data and associated error detection and correction logic. Often such logic is embedded within the memory controller device used by the processor to interface with the memory subsystem. The error code is stored along with data written to the memory. When data is read from memory, the associated error code is also read and may be used to detect the presence of an error and to correct the error. For example, one common error detecting and correcting code utilizes a six bit Hamming code associated with each of stored 32 bit value in the memory subsystem. Such a six bit Hamming code is capable of detecting multiple bit errors in the corresponding 32 bit stored value and is further capable of correcting any single bit error in the corresponding 32 bit stored value. During a write operation in such an enhanced, error correcting memory subsystem, the processor writes a new 32 bit value to the memory subsystem and the error correction logic associated with the memory subsystem generates the corresponding six bit Hamming code such that a 38 bit value is then stored in the identified memory subsystem location (32 bit value supplied by the processor and six bit hamming code generated by the error correction features of the memory subsystem). During a read operation, the memory subsystem uses the retrieved 38 bits to produce a corrected 32 bit value even in the presence of any single bit error since the previous write operation.
However, such “in-line” correction typically slows the memory subsystem read performance for all memory accesses. In general, such error correcting techniques and logic add one or more “wait states” to every read or fetch operation. In view of the relative infrequency of such soft errors, imposing such a wait state penalty on every read or fetch operation is problematic and slows overall operation of the corresponding system.
Another presently known technique for reducing the frequency of soft errors is to provide a so called scrubbing engine. A scrubbing engine is an element operable as a background process that periodically reads each location in the memory, checks for errors using error correcting and detecting codes and techniques, and corrects any detected errors by writing the corrected value back to the memory location from which the read erroneous value was read. Such a scrubbing engine may impose one or more wait state delays on the memory subsystem performance but only when but the scrubbing engine detects an error and attempts to write a corrected value back to the memory subsystem. While the scrubbing engine architecture can improve performance over an in-line error correcting memory subsystem design, the scrubbing engine architecture does not eliminate all possible soft errors but merely reduces the probability or frequency of such soft errors arising.
Yet another present solution for reducing system performance impact due to soft errors in a memory subsystem suggests that an in-line error correcting memory subsystem architecture may overlap the correction error detection in correction logic with presentation of the fetched data to the requesting processor. If it is later determined that the value returned from the memory subsystem to the requesting processor included a correctable error, the memory subsystem may in some manner interrupt the processor to cause invocation of appropriate error handling procedures within the processor. For example, the processor may be interrupted and provided with a corrected memory data value. Processors adapted for such a memory architecture may receive the interrupt signal and restart execution of the instruction corresponding to the erroneously fetched information. While such a process eliminates the wait state penalty of the above discussed in-line error correcting memory architecture, the re-execution or restart sequencing, per se, is complicated and can impose significant design tradeoffs within the processor pipeline architecture. Specifically, the processor pipeline design is complicated due to the lateness of the error signal relative to the processor's fetching and execution of the earlier, erroneous instruction or data.
All presently known solutions for reducing or eliminating soft errors in a memory subsystem incur certain performance penalties or complexities in a processor pipeline design. In-line error detection and correction architectures may add wait states to all memory accesses and thereby impose a significant cost on all memory read operations (regardless of whether an error is detected or corrected). Imposing such wait state delays on all memory accesses significantly diminishes overall system performance despite the relative infrequency of memory soft errors.
Failure to correct detected soft errors in-line (i.e., in real time as they are fetched or read) may impose other performance obstacles in that the pipelined fetch/read operations may be stalled when an error is detected and/or corrected thereby requiring restart of the pipeline architecture. Such pipeline architecture stalls and restarts cause other performance problems in the overall system and add design complexities. A processor pipeline stall and restart generally entails emptying the pipeline of multiple instruction and data read operations in various stages and then restarting the fetch/read operations so purged from the pipeline.
It is evident from the above discussion that a need exists for improved memory subsystem and processor pipeline architectures that permit detection and correction of memory soft errors with minimal impact on processor and system performance.