The evolving technology in the design of computer system memories has tended to render magnetic core memories obsolete. In their stead, many modern computer systems use high density semiconductor memory built from integrated circuit chips, each chip containing thousands of individual memory cells, each cell holding one bit of data. Each memory cell is composed of one or more transistors and associated components, depending upon whether the memory cell is static or dynamic.
The cost per bit of storage of data in computer memory systems is a major component of the total cost of the computer system. Because modern application programs and operating systems are composed of millions of alphanumeric characters, and because each character must be represented by a multi-bit binary code, millions upon millions of binary bits must be stored.
The economics of integrated circuit manufacture are such that miniaturization leads to increasing cost effectiveness of the chip. That is, if more circuits can be put on one silicon wafer within one package, the chip will have relatively more functionality for roughly the same cost. The silicon wafers themselves cannot be made larger than a certain maximum size, because the silicon wafer has to be a single crystalline structure, and larger crystals increase the probability of imperfections in the crystal lattice, which reduces the yield of good circuits per batch. Therefore, to pack more functionality onto a chip, the circuits themselves have to be made smaller.
As the circuits become smaller, the geometries of the transistors and other associated components making up the memory cell approach ever closer to the sizes of atoms and molecules themselves. In particular, the amount of charge stored in a memory cell to indicate whether a 1 or a 0 is stored in that cell becomes smaller as the geometry of the storage cell itself shrinks.
Alpha particles are charged particles emitted, in the process of natural radioactive decay, from various materials used in the packaging and construction of integrated circuits. These particles can pass through the bit lines, sense amplifiers, and storage cells of a memory chip, upsetting the charge distribution and causing what are called soft errors. Essentially, a soft error occurs in any word stored in a memory (each word being a predetermined number of bits) when the pattern of binary 1's and 0's coming out of the memory has been changed by an alpha particle from the pattern of 1's and 0's originally stored. Single and multiple soft errors can be detected and corrected using error correction codes. These codes utilize additional bits, called check bits, which are generated from the original data word and stored with it. By regenerating the check bits when the word is read out of memory and comparing them against the check bits stored with the data, the binary pattern coming out can be checked against what it is supposed to be.
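The generate-store-regenerate cycle described above can be sketched minimally with a single even-parity check bit over an 8-bit word (an illustrative simplification; real memories use several check bits, as in the Hamming codes discussed below, and all function names here are hypothetical):

```python
def parity(word: int) -> int:
    """Return the even-parity check bit of a word's binary pattern."""
    p = 0
    while word:
        p ^= word & 1      # fold each bit into the running parity
        word >>= 1
    return p

def store(word: int) -> tuple[int, int]:
    """Generate the check bit from the data word and store them together."""
    return word, parity(word)

def read_ok(word: int, check: int) -> bool:
    """On read-out, regenerate the check bit and compare with the stored one."""
    return parity(word) == check

data, check = store(0b10110100)
assert read_ok(data, check)                    # unaltered word passes
assert not read_ok(data ^ 0b00000100, check)   # a single flipped bit is detected
```

A single parity bit can only detect an odd number of flipped bits; correcting errors requires the multi-check-bit codes described below.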
Hamming codes are forward acting error correction codes which use certain binary codes designed to self-correct data altered by extraneous events such as "soft errors".
Forward acting error correction codes can be divided into two broad classes: block codes and convolutional codes. In block codes, data bits are taken k at a time, and c parity bits are added, each checking a different combination of the data bits. A block consists of n=k+c bits. A systematic code is one in which the information bits occupy the first k positions in a block and are followed by the (n-k) check bits.
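A systematic block code with k=4, c=3, and n=7 can be sketched using the classic Hamming (7,4) parity equations (the particular subsets of data bits covered by each check bit are illustrative; any given memory design may choose different ones):

```python
def encode(d):
    """Encode 4 data bits into a systematic 7-bit block: data first, then checks."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # each check bit is the modulo-2 sum of a
    p2 = d1 ^ d3 ^ d4        # different combination of the data bits
    p3 = d2 ^ d3 ^ d4
    return [d1, d2, d3, d4, p1, p2, p3]

block = encode([1, 0, 1, 1])   # -> [1, 0, 1, 1, 0, 1, 0]
```

Because the code is systematic, the original data bits can be read directly from the first k positions of the block without any decoding step.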
A convolutional code is one error correction code wrapped around or convoluted on another. It is the convolution of an input data stream and the response function of an encoder. Usually the encoder is made up of shift registers, and modulo 2 adders are used to form check bits, each of which is a binary function of a particular subset of the data bits in the shift registers. The system disclosed herein utilizes a block code.
Another block code is the group code, in which the modulo 2 sum of any two n-bit code words is another code word. Modulo 2 addition is denoted by the symbol ⊕. It is binary addition without the "carry", i.e., 1+1=0 with no carry. In hardware terms, modulo 2 addition is carried out by an exclusive-OR gate. For example, summing 10011 and 11001 in modulo 2, we get 01010.
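The exclusive-OR equivalence can be verified directly with the example words above:

```python
# Modulo-2 addition is bitwise exclusive-OR: binary addition with no carry.
a, b = 0b10011, 0b11001
s = a ^ b                      # the ^ operator is Python's XOR
print(format(s, '05b'))        # prints 01010, matching the worked example
```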
The minimum Hamming distance is a measure of the error detection and correction capability of a code. This "distance" is the minimum number of digits in which any two encoded words differ. For example, to detect E digits in error, a code with a minimum Hamming distance of (E+1) is required. To correct E errors, a code must display a minimum Hamming distance of (2E+1). A code with a minimum Hamming distance of 4, for example, can correct a single error and detect double errors in a data word of 16 bits.
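The distance between two words is simply a count of differing bit positions, which modulo-2 addition makes easy to compute (a small illustrative helper, not part of any particular memory design):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two equal-length words differ."""
    return bin(a ^ b).count('1')   # XOR leaves a 1 exactly where the words differ

# The two example words from the modulo-2 discussion differ in two positions.
assert hamming_distance(0b10011, 0b11001) == 2
```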
Reliable memory systems can be designed either by using highly reliable but expensive components or by employing inexpensive protective redundancy in the form of error correcting codes that use redundant check bits. The degree of reliability can be increased if this protective redundancy matches the failure mode of the memory system. Because of their lower cost, higher speed, and higher density, semiconductor RAM chips are replacing core memories, and in these chips single bit errors are more probable than multiple bit errors.
However, single bit errors can become multiple bit errors if not corrected before a second alpha particle strikes and changes another bit. Hence, it is highly desirable that a memory have a constant on-going error correcting process in addition to error correction on every access of a data word. Such an on-going error correction process acting independently of error correction on access of data would tend to catch and correct single bit errors before they became double bit errors.
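The on-going correction process described above can be sketched as a background "scrubbing" pass over the memory (a hypothetical illustration; `correct` stands in for the memory's actual single-error-correcting decode/re-encode logic, and the memory is modeled as a simple list of words):

```python
def scrub(memory, correct):
    """Read every word, run it through the corrector, and write it back,
    independently of normal accesses, so single-bit errors are removed
    before a second strike can turn them into double-bit errors."""
    for addr in range(len(memory)):
        memory[addr] = correct(memory[addr])

# Toy demonstration: this stand-in "corrector" simply clears bit 7,
# playing the role of a real single-error-correcting code.
mem = [0b10000001, 0b00000010]
scrub(mem, lambda word: word & 0b01111111)
```

In hardware, such a scrubber would cycle through addresses during idle memory cycles rather than in a software loop, but the read-correct-rewrite structure is the same.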