In the prior art many error detecting and error correcting codes have been developed to ensure the integrity of the data to be processed. Common to all of these codes is redundancy: check bits, computed as a function of the data bits, are added to the data bits and are recomputed as desired for error detection and possible correction.
One class of codes, known as single error correction, double error detection (SEC/DED) codes, is described by R. W. Hamming in "Error Detecting and Error Correcting Codes", Bell System Technical Journal, 29, 1950, pages 147-160.
The Hamming and similar codes function effectively in situations where single and double errors predominate, for example in magnetic core memories, which belong to an older technology. Newer technologies employing solid state components, and in particular integrated circuits, display a fault behavior which differs from that of the older technologies: while single bit errors still predominate, higher-order errors now appear in numbers which cannot be neglected. It is characteristic of the new technologies that these higher-order errors result from a stuck logic module in which a group of four (4), eight (8), or some other number of contiguous bit positions becomes stuck in the logical one (1) or zero (0) state. Similar fault patterns also occur in transmission circuitry.
In an error correction system using a parity check code the parity checking operation may be described by the relation
AW = C
where A is the parity check matrix having m rows and n columns, W is a code word n bits in length, and C is the m bit result, which is sometimes called the checking number; there are m parity check bits. In a system using a single error correcting Hamming code the binary values of the columns of the A matrix run from 1 through n. The above relation may be expressed in component form as follows:
A[1,1] W[1] + A[1,2] W[2] + . . . + A[1,n] W[n] = C[1]
A[2,1] W[1] + A[2,2] W[2] + . . . + A[2,n] W[n] = C[2]
. . .
A[m,1] W[1] + A[m,2] W[2] + . . . + A[m,n] W[n] = C[m]
A valid code word, W, will give a checking number, C, all of whose components are zero. If bit i in a code word W is the only error, then C becomes in component form
C[1] = A[1,i]
C[2] = A[2,i]
. . .
C[m] = A[m,i].
The binary value of the ith column of A is i in most system implementations using a Hamming code. Thus the binary value of C locates the position of the erroneous bit in W. If bits i, j, and k are in error then C is given by the modulo 2 sums
C[1] = A[1,i] + A[1,j] + A[1,k]
C[2] = A[2,i] + A[2,j] + A[2,k]
. . .
C[m] = A[m,i] + A[m,j] + A[m,k].
In this case the checking number mistakenly points to a bit location to be corrected. If this location is in the valid addressing range, 1 through n, a mistaken correction will be made. If C is in the range n+1 through 2^m - 1, an uncorrectable error can be detected and a mistaken correction thus avoided, provided the appropriate circuitry is present. However, as the length n of a Hamming code increases toward a power of two, opportunities for preventing at least some mistaken corrections vanish.
Since stored and transmitted data tend to be made up of random patterns of 0's and 1's (over a sufficiently long time period), a stuck logic fault affecting n bit positions can manifest itself in any one of 2^n - 1 equally likely error patterns. Of the resulting 2^(n-1) odd error patterns, 2^(n-1) - n are higher-order odd errors (i.e., 3 or more erroneous bits). In such a situation a prior art error correction system based on a Hamming code performs poorly because of the large number of mistaken corrections. A SEC/DED Hamming code also detects all 2-bit errors; however, a large percentage of the higher-order (4 or more) even errors are undetected. Because of these characteristics a correction system utilizing a Hamming code is not well suited to the new memory and transmission technologies.
Another disadvantage of prior art correction systems based on the Hamming code is apparent when the word to be encoded consists of two or more independent strings of bits. For example, in copending application, Ser. No. 893,068, for an "ERROR CONTROL SYSTEM FOR NAMED DATA", filed Apr. 3, 1978, the data word and its data name are concatenated and the combined word is encoded, with only the check bits and data word being stored. In such a system the data name is independent of the physical address in memory containing the associated data word, and a faulty fetch operation can produce a data word which, although it is from a nearby physical memory location, will have an associated data name which differs from the desired one in a random way. That is, the error patterns in the data name field tend to be randomly distributed over that field. Thus the errors of major concern would be different for the data word and data name portions of a code word. The error checking code should accommodate differing failure modes across a code word.
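By way of illustration only, the checking-number behavior and the error-pattern counts discussed above may be sketched in Python as follows. The sketch assumes a single error correcting Hamming code in which column i of A has binary value i, so that the modulo 2 sum of columns reduces to an exclusive-OR of bit positions. The function names (checking_number, decode, with_errors, count_patterns) and the example words are hypothetical and are not part of any described system.

```python
from itertools import product

def checking_number(word):
    """Return C = A W (mod 2) as an integer, assuming column i of A has binary value i.

    word[0] holds bit position 1. Adding column i modulo 2 is an XOR with i.
    """
    c = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            c ^= position
    return c

def decode(word):
    """Apply single error correction and return (word, status)."""
    n = len(word)
    c = checking_number(word)
    if c == 0:
        return list(word), "no error found"
    if c <= n:
        fixed = list(word)
        fixed[c - 1] ^= 1                 # flip the bit that C points to
        return fixed, f"corrected bit {c}"
    return list(word), "uncorrectable"    # C lies in the range n+1 .. 2^m - 1

def with_errors(word, positions):
    """Return a copy of word with the given 1-based bit positions inverted."""
    damaged = list(word)
    for p in positions:
        damaged[p - 1] ^= 1
    return damaged

def count_patterns(width):
    """Count (all, odd, higher-order odd) nonzero error patterns over `width` bits."""
    weights = [sum(p) for p in product((0, 1), repeat=width) if any(p)]
    odd = [w for w in weights if w % 2 == 1]
    return len(weights), len(odd), len([w for w in odd if w >= 3])

valid7 = [1, 1, 1, 0, 0, 0, 0]            # 1 ^ 2 ^ 3 == 0, so C == 0
single = decode(with_errors(valid7, [5]))         # one bit in error
triple = decode(with_errors(valid7, [1, 2, 4]))   # three bits in error, n = 7
valid6 = valid7[:6]                       # shortened code: n = 6, m = 3
triple_short = decode(with_errors(valid6, [1, 2, 4]))

print(single)
print(triple)
print(triple_short)
print(count_patterns(4))
```

In this sketch the single error is restored correctly. The triple error in the full-length word (n = 7) yields C = 7, so bit 7 is inverted and a fourth error is introduced. The same triple error in the shortened word (n = 6) yields a checking number outside the range 1 through n and can be flagged as uncorrectable. For a stuck group of four bit positions, count_patterns gives 15 patterns, of which 8 are odd and 4 are higher-order odd, consistent with 2^n - 1, 2^(n-1), and 2^(n-1) - n.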
Therefore, it is an object of the present invention to provide an error correction method and system for reliable error detection and correction in the new data transmission and storage technologies.
It is another object of the present invention to provide an error correction method and system for reliable error detection of error patterns randomly distributed over a group of contiguous bits.
It is yet another object of the present invention to provide in a named data environment an error correction method and system tailored to cover reliably both data words and associated data names.