The present invention pertains to the field of error detection and correction in a digital computer. More particularly, this invention relates to error detection and correction during a partial write to memory operation and the marking of a memory location as erroneous when an uncorrectable error occurs during a partial write to memory operation.
Prior art error detection techniques make it possible to detect whether data read from a computer memory contains one or more errors. Furthermore, prior art error correction techniques make it possible in certain situations to correct data read from a computer memory if that data contains one or more errors.
Certain error detection and correction techniques are described in pages 199 to 207 of the textbook Introduction to Switching Theory and Logical Design by F. J. Hill and G. R. Peterson (2nd Ed. 1974). The Hamming Code is an example of a prior art error detecting and correcting code.
In prior art systems using the Hamming Code, certain binary parity check bits, also simply referred to as check bits, are associated with each binary data word. In one prior art system, each check bit is selected to establish even parity over that bit and a certain subset of the bits of the data word. In an even parity system, the total number of ones (or zeros) in a permissible code is always an even number. In an odd parity system, the total number of ones (or zeros) in a permissible code is always an odd number. Prior art systems using the Hamming Code may be either even or odd parity systems.
In the above-described prior art system using the Hamming Code, each parity check bit results from an exclusive-OR logical operation among certain selected bits of the data word, the result being the establishment of even parity over the parity check bit and that subset of the bits of the data word. Each parity check bit then becomes part of the set of parity check bits associated with that particular data word.
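By way of illustration only, the generation of even parity check bits by exclusive-OR operations may be sketched as follows. The sketch assumes the textbook Hamming (7,4) arrangement, in which four data bits are protected by three check bits; the word size, the bit groupings, and the function name `hamming74_check_bits` are assumptions chosen for the example and are not taken from any particular prior art system.

```python
def hamming74_check_bits(data_bits):
    """Return the three even-parity check bits for four data bits.

    Assumed layout (textbook Hamming (7,4)): codeword positions 1..7 hold
    p1, p2, d1, p3, d2, d3, d4. Each check bit is the exclusive-OR of the
    data bits in its group, so the group (check bit included) always
    contains an even number of ones.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # group: positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # group: positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # group: positions 4, 5, 6, 7
    return [p1, p2, p3]
```

For the data bits 1, 0, 1, 1 the function yields the check bits 0, 1, 0, and each of the three groups then contains an even number of ones.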
That data word together with its check bits could then, for example, be transmitted from one system to another over a communications line or written into a dynamic random access memory ("DRAM") and then read sometime later from the DRAM. In the interim between sending and receiving, in the case of transmission of the data word, or between writing and reading, in the case involving a DRAM, single or multiple bit errors could occur in the parity check bits or the data word.
The Hamming Code is employed in certain prior art systems to detect or correct errors once the data word is received after transmission or a read from memory.
The incorporation of known error detection and correction methods as part of a partial write operation (or a system employing a partial write operation) has severe limitations, however. In one known partial write operation, one of the objects is to replace a subset of the old data stored as a data word with new data, thereby creating a new data word, and then to write the new data word into memory. Known partial write methods involve reading a data word together with its check bits from memory. If the check bits indicate the presence of a single bit error in the data word just read (in other words, a correctable error is detected), then the single bit error is corrected using known methods and apparatuses. A subset of the old data making up the data word that has been read from memory is then replaced with new data, and a new data word is created. That new data word is then written into the memory.
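By way of illustration only, the read, correct, merge, and write sequence of a known partial write operation may be sketched as follows. The sketch assumes an 8-bit data word protected by a Hamming code with one additional overall parity bit (a single-error-correcting, double-error-detecting arrangement), a memory modeled as a dictionary, and a bit mask selecting the old data to be replaced; all of these, together with the names `encode`, `decode`, and `partial_write`, are assumptions made for the example. When the word read is uncorrectable, the sketch simply leaves memory untouched.

```python
N = 12                                         # Hamming positions 1..12
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions
PARITY_POSITIONS = [1, 2, 4, 8]                # power-of-two positions


def syndrome(code):
    """Exclusive-OR of the positions (1..N) of all bits that are set."""
    s = 0
    for pos in range(1, N + 1):
        if code[pos]:
            s ^= pos
    return s


def encode(data):
    """Encode an 8-bit integer as 13 bits; index 0 is the overall parity bit."""
    code = [0] * (N + 1)
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    s = syndrome(code)
    for p in PARITY_POSITIONS:
        code[p] = 1 if s & p else 0            # forces the syndrome to zero
    code[0] = sum(code[1:]) % 2                # even parity over the whole word
    return code


def decode(code):
    """Return (status, data); a single bit error is corrected on a copy."""
    code = list(code)
    s = syndrome(code)
    odd = sum(code) % 2
    if s == 0 and not odd:
        status = "ok"
    elif odd and s <= N:
        code[s] ^= 1                           # s == 0: the overall parity bit
        status = "corrected"
    else:
        status = "uncorrectable"
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, data


def partial_write(memory, addr, new_data, mask):
    """Replace the masked bits of the word at addr with bits of new_data."""
    status, old = decode(memory[addr])         # read, correcting if possible
    if status == "uncorrectable":
        return status                          # sketch: memory left untouched
    merged = (old & ~mask & 0xFF) | (new_data & mask)
    memory[addr] = encode(merged)              # fresh check bits for new word
    return status
```

For example, if a location holds the encoded word 0xA5 and one stored bit has since flipped, calling `partial_write(memory, addr, 0x0F, 0x0F)` reports a corrected read and leaves the location holding a clean encoding of 0xAF.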
The limitations of the known partial write method and apparatus arise when the check bits that are read indicate an uncorrectable error in the data word read from memory. One prior art method is then to abort the partial write operation and terminate the cycle of which the partial write was intended to be a part. In other words, no data, whether new or old, is written back into memory.
Another prior art method for handling an uncorrectable error is to write the old failed data word and its old check bits, which indicate an uncorrectable error, back into memory. The partial write operation is thus aborted, given that the new data is never merged with the old data. The computer cycle is not terminated, however. This prior art method is based on the hope that the old failed data word and check bits rewritten into memory will continue to indicate an uncorrectable error over time. The disadvantage of this prior art method and apparatus is that, if another error or transient occurs, the same memory location may not produce an uncorrectable error on the next memory read operation. During that read, an erroneous data word will then be falsely perceived as correct or correctable.
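By way of illustration only, the disadvantage described above may be sketched as follows. The sketch assumes a single-error-correcting, double-error-detecting code with Hamming positions 1 to 12 and an overall parity bit at position 0; the code size, the chosen error positions, and the name `classify` are assumptions made for the example. Because such a code is linear, what the decoder reports depends only on which bit positions have flipped, so the sketch operates on the set of flipped positions rather than on stored data.

```python
from functools import reduce
from operator import xor

N = 12  # assumed: Hamming positions 1..12, overall parity bit at position 0


def classify(flipped_positions):
    """Report what a SEC-DED decoder concludes for a set of flipped bits.

    The syndrome is the exclusive-OR of the flipped positions, and the
    overall parity check fails exactly when an odd number of bits flipped.
    """
    s = reduce(xor, flipped_positions, 0)
    odd = len(flipped_positions) % 2 == 1
    if s == 0 and not odd:
        return "ok"
    if odd and s <= N:
        return "corrected"      # decoder flips position s (0: parity bit)
    return "uncorrectable"


# The failed word as rewritten into memory: two flipped bits.
rewritten = {3, 5}
# A later transient flips a third bit in the same location.
after_transient = rewritten | {10}
```

Here `classify(rewritten)` reports an uncorrectable error, but `classify(after_transient)` reports a corrected single bit error: the syndrome 3 XOR 5 XOR 10 equals 12, so the decoder flips position 12 and passes on a word that now differs from the original in four positions. Not every third error has this effect; a third flip at position 9, for instance, gives a syndrome of 15, which lies outside the word and is still reported as uncorrectable.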