The present invention relates in general to error correcting circuits for random access memory devices and more specifically to an error correcting circuit providing high speed single bit error checking and correction for each word in the memory.
An increasingly serious problem in random access memory (RAM) devices is the temporary failure of a memory cell, i.e. a "soft failure". This mode is distinct from the "hard failure" mode, in which a given memory cell is permanently stuck in a particular state, i.e. stuck at "0" or at "1".
A soft failure may occur in an otherwise functional memory cell through either of the following two identifiable mechanisms. The main source of soft failures is stray radiation that passes through a memory cell and releases charge within it. If this stray transient charge is large enough, it will cause the memory cell to change from a correct state to an erroneous state. The most common type of radiation involved is alpha radiation, although the same effect has been reported with cosmic rays and gamma rays. Alpha radiation can arise from trace contaminants in the material used to package the memory, e.g. an integrated circuit package, or from a source external to the memory. Cosmic rays originating in outer space, for example, strike the earth constantly and at random. A significant characteristic of such soft failures is that they are randomly distributed, they generally occur infrequently, and each one affects only a single bit location (a single memory cell) within the memory.
A second source of soft failures is a memory cell that is only marginally functional, e.g. a cell that is unable to retain a specified minimum voltage swing. Such a cell may be disturbed by a stray transient pulse, for example one coupled from a neighboring cell that is being addressed at that moment, or one originating from some other source.
Improving data integrity in random access memory devices by correcting soft failures has become important only in recent years. In the past, such failures were of little concern because the Mean Time Between Failures (MTBF) per device has been calculated to be on the order of 10^7 hours. At the system level, however, this MTBF figure is greatly reduced by the increased amount of memory in many electronic systems; systems containing a thousand RAM chips are no longer uncommon, and because a soft failure in any one chip is an error in the system memory, the system MTBF falls roughly in proportion to the number of chips. The MTBF has also worsened because the minimum size of a memory cell is constantly being reduced as technology improves, so that smaller and smaller increments of charge are used to hold information in each cell. As a consequence, a given amount of stray charge striking a memory cell now has a proportionately greater effect on the cell, making it more likely that the stray charge will change the cell's state and thereby create an error.
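The effect of chip count on system MTBF can be checked with a little arithmetic. The Python sketch below assumes, as a simplification not stated in the text, that devices fail independently at constant rates, so their failure rates add and the system MTBF is the device MTBF divided by the number of devices. The function names are hypothetical.

```python
# Minimal sketch, assuming independent devices with constant (exponential)
# failure rates: per-device rates add, so the system MTBF is the device
# MTBF divided by the number of devices.
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 24 * 7


def system_mtbf(device_mtbf_hours: float, num_devices: int) -> float:
    """Expected hours between soft failures anywhere in the system memory."""
    return device_mtbf_hours / num_devices


def devices_for_target(device_mtbf_hours: float, target_mtbf_hours: float) -> float:
    """Number of devices at which the system MTBF drops to the target value."""
    return device_mtbf_hours / target_mtbf_hours


if __name__ == "__main__":
    mtbf = system_mtbf(1e7, 1000)
    print(f"1000 chips at 1e7 h each: {mtbf:.0f} h, about {mtbf / HOURS_PER_DAY:.0f} days")
    print(f"chips for one failure per week: {devices_for_target(1e7, HOURS_PER_WEEK):.0f}")
```

Under these assumptions, 1,000 chips at 10^7 hours each give a system MTBF of 10,000 hours (about 417 days), and one failure per week would take roughly 60,000 such chips. Error rates of one per week or one per day in systems of ordinary size therefore also imply a lower per-device MTBF, which matches the effect of shrinking cell size described above.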
Thus, in present day systems, radiation-induced or electronically induced soft errors in a system memory may occur at a rate on the order of one per week or even one per day. In a system whose calculations on data stored in memory must be highly reliable, an error rate this high is unacceptable.
Single bit soft error correction has not generally been performed in prior art memory systems, since other error modes tended to be more significant and soft errors occurred infrequently enough to go largely unnoticed. A further problem is that conventional implementations of single bit soft error correction are complex and tend to seriously degrade the read/write speed of the random access memory, as well as other system performance parameters.
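To show what locating and correcting a single flipped bit involves in general, here is a textbook Hamming(7,4) code in Python. It is offered only as a generic illustration of single-error correction, not as the circuit of the present invention, and the function names are hypothetical. Each of the three check bits covers a different subset of bit positions, and the XOR of the positions that hold a 1 (the syndrome) names the failing bit directly.

```python
# Generic Hamming(7,4) single-error-correcting code, for illustration only.
# Codeword positions are numbered 1..7; check bits sit at positions 1, 2, 4
# and data bits at positions 3, 5, 6, 7.


def hamming74_encode(data: list[int]) -> list[int]:
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_syndrome(code: list[int]) -> int:
    """XOR of the positions holding a 1: 0 for a valid codeword, or the
    position of the flipped bit after a single-bit error."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def hamming74_correct(code: list[int]) -> list[int]:
    """Invert the single bit named by a nonzero syndrome."""
    fixed = list(code)
    s = hamming74_syndrome(code)
    if s:
        fixed[s - 1] ^= 1
    return fixed


def hamming74_data(code: list[int]) -> list[int]:
    """Extract the four data bits (positions 3, 5, 6, 7)."""
    return [code[2], code[4], code[5], code[6]]


data = [1, 0, 1, 1]
cw = hamming74_encode(data)
bad = list(cw)
bad[4] ^= 1  # model a soft failure at position 5
print(hamming74_syndrome(bad))  # prints 5
print(hamming74_data(hamming74_correct(bad)))  # prints [1, 0, 1, 1]
```

The cost is visible even in this small case: three check bits per four data bits, plus XOR logic on every read before the word can be used, which is the kind of overhead that can slow memory access.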
Prior art memory error checking systems have also generally provided only a single parity bit per word, i.e. an odd or even parity bit. The system then tests the word for a parity error once it is read out of the memory. Such a test does not enable the system to determine which specific bit or bits are in error in a word that fails the parity check. As a consequence, the entire data word is lost when a parity error is found.
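Why a single parity bit cannot locate the failing bit can be seen in a short Python sketch. It assumes even parity over an 8-bit word; the word value and helper names are hypothetical. Corrupting any one of the eight positions produces exactly the same one-bit verdict, so nothing in the check distinguishes one failing position from another.

```python
# Minimal sketch of a single even-parity check on one memory word
# (word value and helper names are hypothetical, for illustration only).


def even_parity_bit(bits: list[int]) -> int:
    """Parity bit that makes the total number of 1s (data plus parity) even."""
    return sum(bits) % 2


def parity_ok(bits: list[int], parity: int) -> bool:
    """True if the stored word and its parity bit still have even parity."""
    return (sum(bits) + parity) % 2 == 0


def flip(bits: list[int], i: int) -> list[int]:
    """Copy of the word with bit i inverted, modelling a single soft failure."""
    out = list(bits)
    out[i] ^= 1
    return out


word = [1, 0, 1, 1, 0, 0, 1, 0]
p = even_parity_bit(word)  # four 1s, so p == 0

# Corrupt each bit position in turn. Every case yields the same verdict
# ("parity error"), so the check says nothing about which bit flipped.
outcomes = {parity_ok(flip(word, i), p) for i in range(len(word))}
print(outcomes)  # prints {False}
```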