1. Field of the Invention
The present invention relates generally to memory protection and, more specifically, to a technique for detecting and correcting errors in a memory device.
2. Description of the Related Art
This section is intended to introduce the reader to various aspects of art which may be related to various aspects of the present invention which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Semiconductor memory devices used in computer systems, such as dynamic random access memory (DRAM) devices, generally comprise a large number of capacitors which store the binary data in each memory device in the form of a charge. These capacitors are inherently susceptible to errors. As memory devices become smaller, the capacitors used to store the charges also become smaller, thereby providing a greater potential for errors.
Memory errors are generally classified as "hard errors" or "soft errors." Hard errors are generally caused by poor solder joints, connector errors, and faulty capacitors in the memory device. Hard errors are recurring errors which generally require some type of hardware correction, such as replacement of a connector or memory device. Soft errors, which cause the vast majority of errors in semiconductor memory, are transient events wherein extraneous charged particles cause a change in the charge stored in one or more of the capacitors in the memory device. When a charged particle, such as those present in cosmic rays, comes in contact with the memory circuit, the particle may change the charge of one or more memory cells without actually damaging the device. Because these soft errors are transient events, generally caused by alpha particles or cosmic rays for example, the errors are not generally repeatable and are generally related to erroneous charge storage rather than hardware errors. For this reason, soft errors, if detected, may be corrected by rewriting the erroneous memory cell with the correct data. Uncorrected soft errors will generally result in unnecessary system failures. Further, soft errors may be mistaken for more serious system errors and may lead to the unnecessary replacement of a memory device. By identifying soft errors in a memory device, the number of physically sound memory devices which are replaced due to mistaken error detection can be reduced, and the errors may be easily corrected before any system failures occur.
Soft errors can be categorized as either single-bit or multi-bit errors. A single-bit error refers to an error in a single memory cell. Single-bit errors can be detected and corrected by standard error correcting code (ECC) methods. However, in the case of multi-bit errors (i.e., errors which affect more than one bit), standard ECC methods may not be sufficient. In some instances, ECC methods may be able to detect multi-bit errors, but not correct them. In other instances, ECC methods may not even be sufficient to detect the error. Thus, multi-bit errors must be detected and corrected by a more complex means, since a system failure will typically result if the multi-bit errors are not detected and corrected.
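By way of illustration only, the behavior described above can be sketched with a small single-error-correcting, double-error-detecting (SEC-DED) code. The extended Hamming (8,4) code, the function names `encode` and `decode`, and the status strings used below are assumptions chosen for brevity; they are not taken from any particular memory controller, and practical ECC memory generally protects much wider data words.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds an overall parity bit; indices 1, 2 and 4 hold Hamming
    parity bits; indices 3, 5, 6 and 7 hold the data bits.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) & 1  # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall = sum(code) & 1          # 1 if an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1          # treat as a single flip at `syndrome`
        status = "corrected"
    else:
        return None, "uncorrectable"  # even number of flips: detect only
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status


def flip(code, *positions):
    """Return a copy of `code` with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))                 # (11, 'ok')
    print(decode(flip(word, 6)))        # (11, 'corrected')
    print(decode(flip(word, 2, 6)))     # (None, 'uncorrectable')
    print(decode(flip(word, 1, 2, 3)))  # (10, 'corrected'): wrong data, not flagged
```

In this sketch, any single flipped bit is corrected and any pair of flipped bits is detected but cannot be corrected. The last call shows a three-bit error that such a code silently miscorrects, which corresponds to the case in which the ECC method is not sufficient even to detect the error.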
Even in the case of single-bit errors, which may be detectable and correctable by standard ECC methods, there are drawbacks to the present system of detecting and correcting errors. One drawback of typical ECC methods is that multi-bit errors can only be detected but not corrected. Further, typical ECC error detection may slow system processing, since the error is logged and an interrupt is generated. The interrupt routine typically stops all normal processes while the error is serviced. Also, harmless single-bit errors may align over time and result in an uncorrectable multi-bit error. Finally, typical scrubbing methods used to correct errors are generally implemented through software rather than hardware. Because the error detection is generally implemented through software, the correction of single-bit errors may not occur immediately, thereby increasing the risk and opportunity for single-bit errors to align, causing an uncorrectable error or system failure.
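The alignment of single-bit errors over time can be illustrated with a simplified model. The following is a sketch only, under stated assumptions: each memory word is assumed to tolerate exactly one flipped bit (as with a SEC-DED code), soft errors are supplied as a fixed list of hypothetical `(time, address, bit)` events, and the name `simulate` and the parameter `scrub_every` are introduced here purely for illustration.

```python
def simulate(hits, steps, scrub_every=None):
    """Return the sorted addresses of words left with an uncorrectable error.

    hits        -- list of (time, address, bit) soft-error events
    steps       -- number of time steps to model
    scrub_every -- scrub interval in time steps, or None for no scrubbing
    """
    flipped = {}  # address -> set of bit positions currently in error
    for t in range(steps):
        for when, addr, bit in hits:
            if when == t:
                # A second hit on the same bit would restore its value.
                flipped.setdefault(addr, set()).symmetric_difference_update({bit})
        if scrub_every and (t + 1) % scrub_every == 0:
            for addr in list(flipped):
                if len(flipped[addr]) <= 1:
                    # Single-bit error: rewrite the word with corrected data.
                    del flipped[addr]
    # Words holding two or more flipped bits can no longer be corrected.
    return sorted(addr for addr, bits in flipped.items() if len(bits) >= 2)


if __name__ == "__main__":
    # Hypothetical events: two hits on word 0x10 separated in time,
    # and one unrelated hit on word 0x20.
    events = [(1, 0x10, 3), (7, 0x10, 5), (4, 0x20, 0)]
    print(simulate(events, steps=10))                 # [16]
    print(simulate(events, steps=10, scrub_every=4))  # []
```

Without scrubbing, the two single-bit errors in word 0x10 accumulate into a multi-bit error. With a scrub pass every four time steps, the first error is rewritten before the second arrives, so no word becomes uncorrectable. The longer the interval between scrub passes, the greater the opportunity for such errors to align.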
The present invention may address one or more of the concerns set forth above.