The need to detect and correct errors in data transmitted between components, i.e., units, of a data processing system is well recognized. In general, when message data are transmitted between units, the data are encoded such that transmission errors can be detected and possibly corrected by decoding at the receiving end. Errors may also occur when a coded message is recorded in a storage unit, such as a magnetic disc, or when a recorded message code is read. Various systems, employing various self-correcting codes, exist for detecting and correcting data transmission errors.
The major existing error detecting and correcting codes are the block, systematic, algebraic, external, internal, linear, cyclic, binary and truncated codes. The block code is used when data to be stored, for example, on a magnetic disc, are organized into blocks of k symbols of a given length. A typical value of k is 256, for instance, for symbols that are each one byte in length. The block code changes the useful logic block into a logic block having a length of n·k (where n is an integer greater than k). The block contains redundant characters for enabling detection and correction of a certain number of errors in stored data read from the disc. The systematic code is formed by concatenating a useful block of length k with a redundant character of length r, wherein the total length of the coded block, including the redundancy checkword, is n=k+r symbols.
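The systematic structure described above (useful block followed by a redundant checkword, n=k+r) can be sketched as follows. This is only an illustration: the use of CRC-32 from Python's zlib module as the checkword (r = 4 bytes) and the function name encode_systematic are assumptions made here, not a code named in the text.

```python
import zlib

def encode_systematic(block: bytes) -> bytes:
    """Return the useful block followed by its checkword (n = k + r).

    Assumption for illustration: the checkword is a 4-byte CRC-32.
    """
    checkword = zlib.crc32(block).to_bytes(4, "big")
    return block + checkword  # the useful symbols are left unchanged

useful = bytes(range(256))         # k = 256 one-byte symbols
coded = encode_systematic(useful)  # n = k + r = 260 symbols
```

Because the code is systematic, the first k symbols of the coded block are the useful data themselves, and only the last r symbols are redundant.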
In the algebraic code, coding and decoding are based on an algorithm which is usually defined by a series of simple calculations. The external code is a coding technique defined above which operates on relatively long data blocks. The external code is usually programmed into a control unit or into the magnetic disc. In the internal code, techniques are employed to transform a series of logic data into a signal which can be written onto a disc. The internal code is also known as a modulation or a recording code, normally performed bit by bit, or on a relatively small number of bits (a few or a few dozen bits). In other words, the code words of the internal code are extremely short. A code word in the external code is therefore composed of a series of internal code words.
A code is said to be linear when an error configuration E₁ is translated by a syndrome S₁, an error configuration E₂ is translated by a syndrome S₂, and the error configuration E₁+E₂ is translated by the syndrome S₁+S₂. The syndrome is defined below. A cyclic code produces a new code word by cyclic permutation of a set of symbols forming one code word. A code is said to be binary when isolated bits are the elementary symbols on which the algorithm of the code operates. These elementary symbols can be characters of an "alphabet" having more than two elements, in which case the code is called a "non-binary" code. However, each character may be represented by a number of bits. The distinction is necessary mainly to correctly define properties of the code. In a truncated code, the total effective length of the protected block (data+key or checkword) is less than the natural period of the cyclic code or the interleaving period.
In general, only minimal additional complexities arise from truncation. In defining the properties of a code, it is usually sufficient to consider that the truncated section consists of fictitious zeroes. The only drawback to truncation is the additional time which may be required to examine the truncated section, when the decoding algorithms are performed, but there are methods for overcoming this drawback.
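The treatment of the truncated section as fictitious zeroes can be illustrated with a small sketch. The generator polynomial x^3+x+1 (natural period 7), the function name remainder and the sample bits are assumptions chosen for illustration. The point shown is that placing zeroes ahead of a truncated block does not change the result of the division, so the truncated code inherits the properties of the full-length code.

```python
def remainder(bits, poly=0b1011, r=3):
    """Remainder of a bit sequence (highest-order bit first) divided by
    the generator polynomial, using modulo-2 arithmetic.

    poly holds the r+1 coefficients of the generator (here x^3 + x + 1).
    """
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if (reg >> r) & 1:   # the degree-r term appeared: subtract the generator
            reg ^= poly
    return reg

truncated = [1, 1, 0, 1]           # 4 symbols, shorter than the period of 7
full = [0, 0, 0] + truncated       # same block with 3 fictitious zeroes in front
same = remainder(truncated) == remainder(full)
```

In this sketch the fictitious zeroes cost three extra shift steps, which corresponds to the additional examination time mentioned above.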
A number of definitions relating to error correction are now provided.
Error syndrome:
An error syndrome, or "syndrome", is the first result of processing a coded block during decoding. The syndrome length is always equal to the key or checkword length. In the absence of read errors, the syndrome is zero, so that the validity of the block can easily be determined. In the presence of errors, the syndrome reflects the error configuration and enables detection and correction of the error(s) within certain limits. In general, the errors produced cannot be recognized directly in the syndrome, and error identification requires additional processing, i.e., decoding of a more or less complex nature.
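The behaviour of the syndrome can be sketched with a small binary cyclic code. The generator x^3+x+1, the 4-bit message and the names syndrome, encode and xor are assumptions made for illustration only. The sketch shows a zero syndrome for an error-free block, a non-zero syndrome that depends only on the error configuration, and the linearity property stated earlier (the syndrome of E₁+E₂ is S₁+S₂).

```python
G = 0b1011   # assumed generator polynomial x^3 + x + 1
R = 3        # checkword length = syndrome length, in bits

def syndrome(bits):
    """Remainder of the received block (highest-order bit first) modulo G."""
    reg = 0
    for b in bits:
        reg = (reg << 1) | b
        if (reg >> R) & 1:
            reg ^= G
    return reg

def encode(message):
    """Systematic encoding: message followed by an R-bit checkword."""
    check = syndrome(message + [0] * R)
    return message + [(check >> i) & 1 for i in reversed(range(R))]

def xor(a, b):
    """Bitwise modulo-2 sum of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]

codeword = encode([1, 1, 0, 1])
e1 = [0, 0, 1, 0, 0, 0, 0]   # error configuration E1
e2 = [0, 0, 0, 0, 0, 1, 0]   # error configuration E2

s_clean = syndrome(codeword)              # zero: block is valid
s1 = syndrome(xor(codeword, e1))          # non-zero: error present
s12 = syndrome(xor(e1, e2))               # syndrome of E1 + E2
```

Note that the non-zero syndrome does not directly name the erroneous bit; locating it requires a further decoding step.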
Miscorrection:
When an error configuration results in transforming one code word into another code word, or into a sequence of symbols close to another code word, the result is miscorrection without the operator's knowledge. The ability of a code to avoid miscorrection is an essential property thereof. However, this ability is never absolute. The probability of miscorrection depends on the selected code, on the redundancy thereof, on the correction/detection compromise, and on the natural statistics of the errors affecting the message received.
Residual error:
Residual errors are errors which have not been detected, or errors which have been miscorrected.
Error propagation:
Miscorrection normally causes the original errors to remain and to be compounded by new errors in the block. An error propagation phenomenon is thus produced, and can affect the total length of a code word. This condition can be produced in both the external and the internal code. It can result in a condition wherein the logic errors to be processed by the external code are longer than the physical errors which produced them.
The most frequently used error detection codes are cyclic codes. Error detection is the primary application of the cyclic codes which are widely employed. Error detection involves using a redundant key or redundancy checkword concatenated to the message (systematic code) during transmission and reception in telecommunications systems or during reading from a memory. For error detection, a relatively short key (a few bytes) can be used for relatively long blocks (several thousand bytes). The key or checkword enables a received message to be sorted into two categories, viz: (1) messages which exactly match one code word, in which case it is assumed there are no errors; and (2) the remaining messages which do not match any code word; in this case, the existence of one or more errors is a certainty. In general, cyclic codes are used only for error detection; the errors are not corrected on the basis of information provided by the redundancy. The message is retransmitted in telecommunications systems or reread from magnetic memories.
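The detection procedure described above, a short key on a long block followed by rereading or retransmission when the key does not match, can be sketched as follows. The choice of CRC-32 from Python's zlib module as the cyclic code, the 4096-byte block and the function names are assumptions made for illustration.

```python
import zlib

KEY_LEN = 4  # assumed key: a 4-byte CRC-32, short relative to the block

def add_key(block: bytes) -> bytes:
    """Concatenate the redundancy key to the message (systematic code)."""
    return block + zlib.crc32(block).to_bytes(KEY_LEN, "big")

def matches_code_word(received: bytes) -> bool:
    """Category (1) if the recomputed key matches, category (2) otherwise."""
    block, key = received[:-KEY_LEN], received[-KEY_LEN:]
    return zlib.crc32(block).to_bytes(KEY_LEN, "big") == key

def read_with_retry(attempts):
    """Return the data of the first attempt that matches a code word.

    Each element of attempts stands for one transmission or one read
    of the same message; None means every attempt was rejected.
    """
    for received in attempts:
        if matches_code_word(received):
            return received[:-KEY_LEN]
    return None

sent = add_key(bytes(4096))            # several thousand bytes, 4-byte key
damaged = bytearray(sent)
damaged[100] ^= 0x08                   # one bit altered in transit
recovered = read_with_retry([bytes(damaged), sent])
```

Nothing in the key is used to repair the damaged copy; the sketch simply discards it and accepts the second attempt.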
The process of error detection using a cyclic code is simple (requiring little hardware), efficient (low redundancy), and powerful (information can be retrieved, at least if only random errors are present). However, this process is defective in three cases, viz: (1) when permanent errors exist, which is rarely the case in telecommunications but frequently occurs in high-density recording systems, due to physical defects on the recording medium; (2) when the message cannot be retransmitted due to time factors (e.g., in space communications); and (3) when a significant error transforms one code word into another code word and the error is not detected. Cyclic codes can overcome these disadvantages if the error configuration remains within reasonable limits; if a checkword of sufficient length is added to the data block, this redundancy can be used to correct errors.
Of the most frequently used cyclic codes, the Hamming code is the simplest one-bit correction code. A disadvantage of the Hamming code is that if the error affects more than one bit, the syndrome is still interpreted as a one-bit error, whose position usually differs from each of the bits which form the actual error configuration. The result is miscorrection. Adding a parity bit improves this situation because a parity bit enables a one-bit error, which is corrected, to be distinguished from a two-bit error, which is detected but not corrected.
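The miscorrection of a two-bit error and the effect of the added parity bit can be sketched with a (7,4) Hamming code. The bit layout (check bits at positions 1, 2 and 4), the function names and the sample data are assumptions chosen for illustration.

```python
def hamming_encode(data):
    """4 data bits -> 7-bit code word, check bits at positions 1, 2, 4."""
    c = [0] * 8                          # index 0 unused, positions 1..7
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]

def hamming_syndrome(word7):
    """XOR of the positions of all bits set to 1; zero for a code word."""
    s = 0
    for position, bit in enumerate(word7, start=1):
        if bit:
            s ^= position
    return s

def correct_one_bit(word7):
    """Plain Hamming decoding: treat any non-zero syndrome as one bad bit."""
    word = list(word7)
    s = hamming_syndrome(word)
    if s:
        word[s - 1] ^= 1
    return word

def decode_with_parity(word8):
    """Decoding with an overall parity bit appended as the eighth bit."""
    word = list(word8[:7])
    s = hamming_syndrome(word)
    if sum(word8) % 2:                   # odd parity: assume a single error
        if s:
            word[s - 1] ^= 1
        return word, "corrected"
    if s:                                # even parity, non-zero syndrome
        return word, "double error detected"
    return word, "ok"

def flip(word, *positions):
    """Invert the bits at the given 1-based positions."""
    out = list(word)
    for p in positions:
        out[p - 1] ^= 1
    return out

code = hamming_encode([1, 0, 1, 1])
miscorrected = correct_one_bit(flip(code, 1, 2))   # two-bit error
extended = code + [sum(code) % 2]                  # add overall parity bit
```

In this sketch the two-bit error at positions 1 and 2 yields the syndrome 3, so the plain decoder inverts position 3 and delivers a valid but wrong code word containing three errors. With the parity bit, the same error configuration is reported as a double error instead.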
Generally speaking, cyclic codes provide for very simple error detection and correction whenever the data to be protected are in the form of long sequential chains, as is the case in telecommunications and magnetic memory systems, but not the case in random access memories.
All transcoding calculations required to generate the checkwords (systematic codes) or relatively short syndromes (low redundancy rate) can be performed using a conventional hardwired sequential circuit. Such a circuit comprises a serial or serial-parallel shift register having a number of flip-flops equal to the number of redundant bits forming the checkword, in combination with a number of "EXCLUSIVE OR" gates and feedback circuits looping on intermediary stages, which introduce a "perturbation" element for generating cyclic phenomena over much longer periods than the checkword length. This extremely simple sequential structure is very easy to manufacture on a large scale as a single integrated circuit.
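The shift-register structure described above can be simulated in software as a rough sketch. The three-stage register, the feedback taps for the generator x^3+x+1 and the function names are assumptions chosen for illustration; an actual circuit would be built from flip-flops and EXCLUSIVE OR gates rather than program code.

```python
TAPS = [1, 1, 0]   # assumed generator x^3 + x + 1: coefficients g0, g1, g2

def clock(state, bit_in, taps=TAPS):
    """One clock pulse of the register.

    state[i] is the flip-flop of order i; the output of the last stage
    is fed back through an EXCLUSIVE OR gate wherever a tap is 1.
    """
    feedback = state[-1]
    new = [bit_in ^ (feedback & taps[0])]
    for i in range(1, len(state)):
        new.append(state[i - 1] ^ (feedback & taps[i]))
    return new

def checkword(message, taps=TAPS):
    """Shift in the message, then one zero per flip-flop; read the register."""
    state = [0] * len(taps)
    for bit in list(message) + [0] * len(taps):
        state = clock(state, bit, taps)
    return state

def period(taps=TAPS):
    """Clock pulses needed for the free-running register to repeat."""
    start = [1] + [0] * (len(taps) - 1)
    state, count = clock(start, 0, taps), 1
    while state != start:
        state, count = clock(state, 0, taps), count + 1
    return count

check = checkword([1, 1, 0, 1])
cycle = period()
```

With these assumed taps, three flip-flops produce a cycle of seven states, which illustrates how the feedback yields a period longer than the checkword length.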