1. Field of the Invention
The present invention relates to a data transmission system and, more particularly, to a system and method of correcting an error in parallel data paths of a data transmission system.
2. Description of the Related Art
The simplest error correction codes protect against random errors that are introduced into the data stream. The errors are modeled as random events with no statistical correlation to each other or to any part of the overall system. Additional bits are added to the data to protect against errors. The effect of the additional error correction bits is to map the uncoded data into a coded domain.
In the coded domain, valid codewords are selected to represent the uncoded data, and all other (invalid) codewords are assumed to result from transmission errors. The most common error correcting codes can correct only one error within a set block of data. If more than one error occurs within the covered data, the errors cannot be corrected. If there are enough errors within the correction domain, the codes will break down completely, and may actually introduce additional errors into data that was correct.
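The behavior described above can be illustrated with a minimal sketch. It assumes a Hamming(7,4) code as a representative single error correcting code; the code choice, the bit ordering, and the function names `hamming74_encode` and `hamming74_decode` are illustrative assumptions and do not come from this description.

```python
def hamming74_encode(data):
    """Map 4 uncoded bits into a 7-bit codeword (assumed Hamming(7,4) layout).

    Codeword positions 1..7 hold p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(received):
    """Correct at most one bit error and return the 4 data bits."""
    c = list(received)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means "valid codeword"
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

# One error anywhere in the block is corrected.
one_error = list(codeword)
one_error[5] ^= 1
print("single error decoded:", hamming74_decode(one_error))

# Two errors, here both in parity bits, so the data bits arrive intact.
# The decoder still "corrects" a third position and damages good data.
two_errors = list(codeword)
two_errors[0] ^= 1
two_errors[1] ^= 1
print("double error decoded:", hamming74_decode(two_errors))
```

In the second case the data bits were received correctly, yet the decoder flips one of them. This is the sense in which an overloaded code may cause more errors in correct data.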
There is a growing trend in system interconnection toward very high speed serial interfaces. There are several industry initiatives to specify these connections. Some of the most notable include InfiniBand (e.g., a switched fabric communications link) and the XAUI interface (e.g., an attachment unit interface) in the IEEE 802.3ae specification. IBM offers a family of custom application specific integrated circuit (ASIC) macros known as high speed serial (HSS) to support these interfaces. In these applications, a data bus is transmitted serially at higher speed down fewer lanes.
Reformatting the data for transmission may negate error correction that was originally on the data bus. For example, in the XAUI interface, a 32-bit bus is segmented into four byte-wide sections. Each byte is 8b/10b coded for transmission down four 3.125 Gbps transmission lanes. At the receiving side of the interface, the four transmitted data streams are decoded and reassembled into the original 32-bit bus. If the original data bus was coded with a single error correcting Hamming code, the code would perform very poorly in this architecture.
FIG. 1 illustrates a conventional error correction method (e.g., in a single channel or 8b/10b encoded data). As illustrated in FIG. 1, data transmission may begin with first data 101a which is uncoded and includes 12 bits, then error correction may be added to provide second data 102a which is uncoded and includes 16 bits. The second data 102a may be 8b/10b encoded to provide third data 103a which includes error correction and includes 20 bits.
Assuming that an error is introduced into the “7” bit in the third data 103a during transmission as depicted in third data 103b, this error may be increased to multiple errors in the “6”, “4” and “2” bits after 8b/10b decoding as depicted in second data 102b, resulting in first data 101b which includes the multiple errors caused by the failed error correction.
That is, as illustrated in FIG. 1, a single error in an 8b/10b code word may generate multiple bit errors when the code word is decoded. These multiple bit errors are beyond the ability of the single error correcting code to handle.
In cases where 8b/10b coding is not used, burst errors on any single lane of the four lane set will negate the error correction on the whole interface.
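The uncoded four-lane case can be sketched as follows. The example assumes that byte k of the 32-bit word travels on lane k and leaves out the 8b/10b step entirely; the lane ordering and the helper names `split_into_lanes`, `reassemble`, and `bit_errors` are hypothetical and chosen only for illustration.

```python
def split_into_lanes(word):
    """Segment a 32-bit word into four byte-wide lanes (assumed: byte k on lane k)."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def reassemble(lanes):
    """Rebuild the 32-bit word from the four received lane bytes."""
    word = 0
    for lane, byte in enumerate(lanes):
        word |= (byte & 0xFF) << (8 * lane)
    return word


def bit_errors(sent, received):
    """Count the bit positions in which two words differ."""
    return bin(sent ^ received).count("1")


word = 0xDEADBEEF
lanes = split_into_lanes(word)
lanes[2] ^= 0b00011100  # a 3-bit burst confined to lane 2
received = reassemble(lanes)
print("bit errors in reassembled word:", bit_errors(word, received))
```

Although only one lane is disturbed, every bit of the burst lands in the same reassembled 32-bit word, so a code that corrects one error per word is overwhelmed.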
To make the problem worse, decision feedback equalization (DFE) is being used in new serial interfaces. The IBM high speed serial-deserializer (HSS) macros offer DFE modes. Decision feedback equalization uses the history of what was transmitted before to help in determining the value of the current bit. This powerful equalization works well to remove intersymbol interference.
However, DFE tends to transform single bit errors into multi-bit burst errors. If noise causes a single bit to switch polarity, the feedback nature of DFE will recirculate the error and use the wrong information in future decisions. The error may be propagated forward for an unbounded number of bits.
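The error propagation mechanism can be sketched with a toy model. The sketch assumes binary symbols of +1 and -1, a channel with a single trailing intersymbol interference term, and a one-tap DFE whose tap exactly matches the channel; the tap value of 0.8, the noise amplitude, and the function name `dfe_receive` are illustrative assumptions and do not describe any particular HSS macro.

```python
def dfe_receive(symbols, noise, tap=0.8):
    """Toy one-tap DFE: subtract tap * (previous decision) before slicing.

    Assumed channel model: received = x[n] + tap * x[n-1] + noise[n].
    """
    decisions = []
    prev_sent = 0.0     # previous transmitted symbol (channel memory)
    prev_decided = 0.0  # previous receiver decision (feedback path)
    for x, n in zip(symbols, noise):
        received = x + tap * prev_sent + n
        equalized = received - tap * prev_decided
        decision = 1 if equalized >= 0 else -1
        decisions.append(decision)
        prev_sent, prev_decided = x, decision
    return decisions


def error_positions(symbols, decisions):
    """Indices at which the receiver decision differs from the sent symbol."""
    return [i for i, (x, d) in enumerate(zip(symbols, decisions)) if x != d]


symbols = [1, -1] * 8               # alternating data pattern
clean = [0.0] * len(symbols)
hit = list(clean)
hit[5] = -1.5 * symbols[5]          # one noise event, large enough to flip bit 5

print("no noise :", error_positions(symbols, dfe_receive(symbols, clean)))
print("one hit  :", error_positions(symbols, dfe_receive(symbols, hit)))
```

In this model a wrong decision doubles the interference instead of cancelling it, so with alternating data every later bit is also decided incorrectly. The burst ends only when two equal symbols are sent in a row, which is why its length depends on the data and has no fixed upper bound.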
Applying simple burst error protection on every lane will help, but cannot correct every case. Likewise, interleaving may help, but there will be cases where the burst length is too long for traditional methods to correct.
There are several known methods which have attempted to address this problem in conventional error correction methods. For example, the error correction coding can be made strong enough to handle multiple bit errors. However, this method is a poor solution because the data link is really only generating single bit errors, and the error multiplication is an artifact of the decoder (e.g., either 8b/10b or DFE). Single bit correcting error correction codes are relatively simple and easy to implement, but multiple bit correcting codes require considerable overhead and also use much more power. In the 8b/10b case, the code can be strengthened to cover the 10-bit code word, but due to the unbounded nature of the DFE case, there is no way to make the code strong enough to cover errors of any length.
Another conventional method is to interleave the data prior to coding and transmitting the data. This process collects a large block of data and combines bits from different regions of the data and groups them before encoding and transmission. After transmission, the bits are redistributed back into their original positions before error correction is applied.
The goal of interleaving is to spread grouped transmission errors apart before errors are corrected so that each error falls into a different error correction domain. This process is widely used, but adds significant latency to the system and is not useful in high speed systems where latency must be reduced. Also, in the DFE system, it is not possible to design an interleaving system that will always work because the length of the error burst is unbounded.
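A minimal sketch of block interleaving, under stated assumptions, shows both the benefit and the limit. It assumes four 7-bit codewords transmitted column by column, giving an interleaving depth of four; the sizes and the names `interleave`, `deinterleave`, and `errors_per_codeword` are hypothetical.

```python
def interleave(codewords):
    """Send bit 0 of every codeword, then bit 1 of every codeword, and so on."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(stream, depth):
    """Return each bit to its original codeword (depth = number of codewords)."""
    return [stream[k::depth] for k in range(depth)]


def errors_per_codeword(codewords, burst_start, burst_len):
    """Flip a contiguous burst on the channel and count errors per codeword."""
    stream = interleave(codewords)
    for pos in range(burst_start, burst_start + burst_len):
        stream[pos] ^= 1
    received = deinterleave(stream, len(codewords))
    return [sum(a != b for a, b in zip(sent, got))
            for sent, got in zip(codewords, received)]


# Four arbitrary 7-bit blocks standing in for coded data.
codewords = [[0, 1, 1, 0, 0, 1, 1],
             [1, 0, 0, 1, 1, 0, 0],
             [1, 1, 1, 0, 0, 0, 0],
             [0, 0, 0, 0, 1, 1, 1]]

print("burst of 4:", errors_per_codeword(codewords, 8, 4))
print("burst of 6:", errors_per_codeword(codewords, 8, 6))
```

A burst no longer than the interleaving depth leaves at most one error in each codeword, which a single error correcting code can handle. A longer burst places two errors in some codewords, and the transmitter must buffer all four codewords before sending the first bit, which is the source of the added latency.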