Digital communication signals to be transmitted over a communication channel such as coaxial cable or a fiber optic link typically are encoded to facilitate accurate reception at a destination despite possible corruption of the signals during transmission due to noise in the communication channel. Depending on the application, either data encoding for improved code characteristics or error-protection encoding is employed.
Data encoding entails the translation or conversion of the information or data signal's bit stream into "codewords" typically characterized by a limited maximum run length, a limited cumulative DC-offset, and other signal characteristics that facilitate reception. Specifically, data encoding consists of converting an N-bit data word (or N-bit block) to an M-bit codeword (or M-bit block), where M is greater than N. The "overhead" imposed by the coding scheme is M-N bits, which may be expressed as a percentage as 100 × (M-N)/N %. The efficiency of a coding scheme is N/M. The coding scheme adds these extra "overhead" bits to the data before it is transmitted in order to convert the bit pattern of the data into a bit pattern that may be received more reliably in the presence of noise in the communication path. The encoded signal may have, for example, a limited run length, a limited cumulative DC-offset, or both.
A coding scheme which converts an N-bit data word into an M-bit codeword is often referred to as an Nb/Mb coding scheme. For example, an 8b/10b coding scheme converts 8-bit data words into 10-bit codewords. Such a scheme has 2 bits or 25% overhead, and has an 80% efficiency.
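The overhead and efficiency figures quoted for an Nb/Mb scheme follow directly from the definitions given earlier. A minimal sketch of that arithmetic is shown below; the function name `coding_overhead` is illustrative only and does not come from any particular library.

```python
from fractions import Fraction


def coding_overhead(n_bits: int, m_bits: int):
    """Return (overhead bits, overhead percent, efficiency) for an Nb/Mb code.

    Overhead is M - N bits, or 100 * (M - N) / N percent; efficiency is N / M.
    Exact fractions are used so that no rounding is introduced.
    """
    if not 0 < n_bits < m_bits:
        raise ValueError("expected 0 < N < M")
    overhead_bits = m_bits - n_bits
    overhead_percent = Fraction(100 * overhead_bits, n_bits)
    efficiency = Fraction(n_bits, m_bits)
    return overhead_bits, overhead_percent, efficiency


bits, percent, efficiency = coding_overhead(8, 10)
print(bits, float(percent), float(efficiency))  # 2 25.0 0.8
```

For the 8b/10b case this reproduces the 2-bit, 25% overhead and 80% efficiency stated above.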
Some encoding schemes permit two types of words to be encoded: data words and command words. This arises from the fact that the Nb/Mb coding scheme may be thought of as a one-to-one mapping between a set of 2^N possible data words and a subset of only 2^N different codewords out of 2^M possible codewords. This leaves 2^M - 2^N codewords which never are used. However, in some codings, a small portion of the 2^M - 2^N codewords has the same desirable transmission characteristics (e.g., run length and cumulative DC-offset) as the 2^N codewords which represent data. This small portion may be used to represent another class of codewords referred to as command words. It is desirable for a coding scheme to permit encoding of a substantial number of command words which have the same desirable transmission characteristics as the encoded data words.
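The counting argument can be illustrated by brute force. The sketch below computes the number of codewords left unused by an Nb/Mb mapping and, for small M, counts how many M-bit words satisfy a given run-length and DC-offset limit. The function names and the limits passed in are hypothetical, and the check looks only inside a single word; it ignores runs that span the boundary between consecutive codewords.

```python
from itertools import groupby, product


def spare_codewords(n_bits: int, m_bits: int) -> int:
    """Number of M-bit codewords not needed to represent the 2**N data words."""
    return 2 ** m_bits - 2 ** n_bits


def count_words(m_bits: int, max_run: int, max_abs_offset: int) -> int:
    """Count M-bit words whose longest run and |DC-offset| stay within limits."""
    if m_bits < 1:
        raise ValueError("m_bits must be at least 1")
    count = 0
    for word in product("01", repeat=m_bits):
        # Longest string of identical adjacent bits within this word.
        longest = max(len(list(group)) for _, group in groupby(word))
        # +1 per LOGIC ONE, -1 per LOGIC ZERO.
        offset = sum(1 if bit == "1" else -1 for bit in word)
        if longest <= max_run and abs(offset) <= max_abs_offset:
            count += 1
    return count


print(spare_codewords(8, 10))  # 768
```

Comparing `count_words(m, ...)` with 2^N for a candidate set of limits gives a rough upper bound on how many words with the desired characteristics could remain available as command words.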
Limiting the maximum run length in data codewords can be useful, for example, in clock recovery performed during decoding at the destination. Maximum run length is the maximum number of contiguous bits having the same value, i.e., either LOGIC ONE or LOGIC ZERO. Limiting the maximum run length, so as to reduce the length of strings of bits having the same value, is important to facilitate accurate clock recovery at the destination, because clock recovery circuits rely upon transitions between LOGIC ONE and ZERO data values to detect the underlying clock frequency of the data. Clock recovery circuits generally lose synchronization if too many bit intervals elapse without a transition in the data. Therefore, it is desirable to choose a coding scheme having a low maximum run length.
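Maximum run length as defined above is straightforward to measure on a bit string. The following is a minimal sketch, assuming the bit stream is represented as a Python string of '0' and '1' characters (a representation chosen here for illustration).

```python
from itertools import groupby


def max_run_length(bits: str) -> int:
    """Length of the longest string of contiguous identical bits."""
    # groupby splits the stream into runs of equal adjacent values.
    return max((len(list(run)) for _, run in groupby(bits)), default=0)


print(max_run_length("0111100010"))  # 4
```

A clock recovery circuit sees no transition for the duration of each run, so this value is the longest interval, in bit times, that the circuit must coast without a timing reference.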
The cumulative DC-offset, also referred to as cumulative DC unbalance or digital sum variation, often is expressed in terms of the number of bit values which would have to be changed to render the bit sequence balanced. For example, if there exists a bit which would have to be a LOGIC ONE to render the sequence balanced, but that bit has a LOGIC ZERO value, then the cumulative DC-offset is one bit. Alternatively, cumulative DC-offset can be given as a single number calculated by assigning a weight of -1 to each LOGIC ZERO bit and a weight of +1 to each LOGIC ONE bit, and then summing these weights for the bits in the serial stream. Expressed this way, the sequence of the above example, which contains two more LOGIC ZERO bits than LOGIC ONE bits, has a cumulative DC-offset of -2. A bit sequence having a cumulative DC-offset of zero is called "balanced," and a sequence with a cumulative DC-offset of 1 or more bits is called "unbalanced."
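Both ways of expressing the cumulative DC-offset can be computed from the same weighted sum. The sketch below assumes a string-of-bits representation; the helper names are illustrative. The "bits to change" figure is taken as half the magnitude of the signed offset, which is exact for even-length sequences.

```python
from itertools import accumulate


def bit_weights(bits: str) -> list:
    """Map each LOGIC ONE to +1 and each LOGIC ZERO to -1."""
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a bit: {bit!r}")
    return [1 if bit == "1" else -1 for bit in bits]


def cumulative_dc_offset(bits: str) -> int:
    """Signed cumulative DC-offset of the whole sequence."""
    return sum(bit_weights(bits))


def running_offsets(bits: str) -> list:
    """Cumulative DC-offset after each successive bit."""
    return list(accumulate(bit_weights(bits)))


sequence = "0010"  # three ZEROs, one ONE
print(cumulative_dc_offset(sequence))            # -2
print(running_offsets(sequence))                 # [-1, -2, -1, -2]
print(abs(cumulative_dc_offset(sequence)) // 2)  # 1 bit would have to change
```

The running values are what a receiver effectively sees as a slowly varying voltage component, so the peak of `running_offsets` is the quantity a code must bound.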
A bit stream transmitted to a destination consists of a sequence of LOGIC ZERO and LOGIC ONE values. A receiver circuit at the destination typically receives the two logic values as opposite polarity voltages, and the signal voltage in the receiver circuit has a near-DC (i.e., low-frequency) voltage component proportional to the cumulative DC-offset in the bit stream. Since receiver circuits can accommodate only limited DC voltage swings without overload, it is desirable to employ a code which limits the cumulative DC-offset of the encoded data in order to avoid receiver overload.
Furthermore, it is desirable to employ an encoding scheme which achieves periodic DC balance, which is defined as a cumulative DC-offset of exactly zero at the end of every group of K bits, where K is a fixed number of bits. (For example, K may be one or two times the length M of the encoded word or block.) If an encoding scheme has periodic DC balance, its bit stream has limited spectral components below a predetermined frequency proportional to 1/K, which allows a receiver circuit to employ a high-pass filter to block all spectral components below this frequency and thereby improve the receiver signal-to-noise ratio.
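Periodic DC balance as defined here can be tested by checking that every K-bit group contains equal numbers of ONEs and ZEROs, which is equivalent to the cumulative DC-offset returning to exactly zero at each group boundary. A minimal sketch, assuming a string-of-bits representation and a stream length that is a multiple of K:

```python
def has_periodic_dc_balance(bits: str, k: int) -> bool:
    """True if the cumulative DC-offset is zero at the end of every K-bit group."""
    if k <= 0 or len(bits) % k != 0:
        raise ValueError("stream length must be a positive multiple of K")
    # A group is balanced when exactly half of its bits are LOGIC ONE.
    return all(
        2 * bits[start:start + k].count("1") == k
        for start in range(0, len(bits), k)
    )


print(has_periodic_dc_balance("01100110", 2))  # True
print(has_periodic_dc_balance("0011", 2))      # False
print(has_periodic_dc_balance("0011", 4))      # True
```

The last two calls show that the same stream can be periodically balanced for one value of K and not for another, which is why K is part of the definition.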
A number of data codes have been proposed and commercially used to varying degrees in digital communication. For example, known Manchester codes are readily implemented, have a maximum run length of 2, and are DC balanced over a period of 2 bits. Unfortunately, their 100% encoding overhead is typically deemed excessive. Another known code, the Sperry 4b/5b code, has a maximum run length of 4 and exhibits a mere 25% encoding overhead, but it can exhibit a cumulative DC-offset which grows without limit over time.
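The Manchester figures quoted above can be checked with a short sketch. Two opposite bit-to-symbol conventions are in common use; the mapping chosen below (ONE becomes "10", ZERO becomes "01") is an assumption made for illustration, and the stated properties hold for either convention.

```python
from itertools import groupby

# Assumed convention: each data bit becomes a two-bit symbol with one transition.
SYMBOLS = {"0": "01", "1": "10"}


def manchester_encode(bits: str) -> str:
    """Encode each data bit as two line bits (a 1b/2b code, 100% overhead)."""
    return "".join(SYMBOLS[bit] for bit in bits)


def max_run_length(bits: str) -> int:
    """Length of the longest string of contiguous identical bits."""
    return max((len(list(run)) for _, run in groupby(bits)), default=0)


encoded = manchester_encode("110010")
print(encoded)                  # 101001011001
print(max_run_length(encoded))  # 2
# Every two-bit symbol holds one ONE and one ZERO, so the stream is
# DC balanced over a period of 2 bits.
print(all(encoded[i:i + 2].count("1") == 1 for i in range(0, len(encoded), 2)))  # True
```

The price of these properties is visible in the output length: six data bits become twelve line bits.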
Yet another known code is the IBM 8b/10b code, described in an article entitled "A DC-balanced, Partitioned-Block, 8b-10b Transmission Code," that appeared in IBM J. Res. Develop., Vol. 27, No. 5, September 1983. The IBM 8b/10b code is decomposed into 3b/4b and 5b/6b sub-encodings. It has a maximum run length of 5, is fairly easy to implement in hardware, also has a 25% encoding overhead, and constrains the cumulative DC-offset within the bit stream to ±3 while limiting the cumulative DC-offset at the end of any 10-bit codeword to ±1. Unfortunately, for many applications, the IBM 8b/10b code permits too few command codewords, and its maximum run length of 5 is undesirably high. Additionally, its cumulative DC-offset, which is merely bounded and not periodically balanced, can present difficulties in receiver filtering.
As mentioned above, encoding is alternatively used to provide error detection and error correction mechanisms for transmitted signals. Error correction coding commonly is referred to by its initials, "ECC." A common approach to detect and correct errors in a received communication signal using an ECC is forward error control, known by the acronym "FEC." In forward error control, each transmitted word, block or frame contains additional bits of information (sometimes called "ECC," "redundancy," "protection" or "check" symbols). Employing these symbols, a receiver can detect (and, in some FEC schemes, locate the position of) errors that are present in the bit stream of the received signal. If the positions of the erroneous bits are identified, correction is achieved simply by inverting the identified erroneous bits, e.g., by changing a LOGIC ONE to a LOGIC ZERO, or vice versa. In this manner, an accurate replica of the transmitted signal is obtained without requiring re-transmission.
Preferred forms of error protection encoding generate redundancy symbols using an FEC code that is both linear and systematic. In a linear code, the sum of any two encoded values results in another encoded value. In a systematic code, each codeword includes a portion identical with the unencoded data; therefore, the resulting FEC code block is formed by concatenating the unencoded data with the redundancy symbols generated by the FEC code.
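The two properties can be demonstrated on a deliberately small code. The sketch below uses a single even-parity bit appended to three data bits, chosen here only because it is easy to enumerate; it is not one of the FEC codes discussed in this text. For binary codes, the "sum" of two codewords is their bitwise exclusive-OR.

```python
from itertools import product

K = 3  # number of data bits in this toy example


def encode(data):
    """Systematic encoding: the data bits followed by one even-parity bit."""
    parity = sum(data) % 2
    return tuple(data) + (parity,)


def xor_words(a, b):
    """Bitwise modulo-2 sum of two equal-length words."""
    return tuple(x ^ y for x, y in zip(a, b))


all_data = list(product((0, 1), repeat=K))
codewords = {encode(data) for data in all_data}

# Linear: the sum of any two codewords is again a codeword.
is_linear = all(xor_words(a, b) in codewords for a in codewords for b in codewords)
# Systematic: the first K bits of each codeword are the unencoded data.
is_systematic = all(encode(data)[:K] == data for data in all_data)

print(is_linear, is_systematic)  # True True
```

Because the code is systematic, a receiver that ignores the redundancy can read the data directly from the leading bits of each block.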
An example of such an FEC code is a Hsiao code which has a Hamming distance (i.e., the minimal number of bit positions in which any two valid code blocks differ) of four. With this Hamming distance, this FEC code can correct single errors and detect double errors in the transmitted bit sequence. Such a code requires 8 FEC redundancy bits to protect a total of 64 to 127 bits in the data input, 7 bits to protect 32 to 63 bits, 6 bits to protect 16 to 31 bits, or 5 bits to protect 8 to 15 bits.
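Single-error correction with double-error detection can be sketched on a toy code much smaller than the sizes listed above. The example below assumes a parity-check matrix whose columns are distinct and of odd weight, which is the defining construction of Hsiao-type codes; the particular 4-data-bit, 4-check-bit layout and all names are illustrative choices, not taken from this text.

```python
# One 4-bit parity-check column per codeword position. Positions 0-3 carry
# data (weight-3 columns); positions 4-7 carry check bits (weight-1 columns).
COLUMNS = [0b1110, 0b1101, 0b1011, 0b0111, 0b1000, 0b0100, 0b0010, 0b0001]


def syndrome(word):
    """XOR of the columns at every position holding a 1; zero for valid words."""
    result = 0
    for bit, column in zip(word, COLUMNS):
        if bit:
            result ^= column
    return result


def encode(data):
    """Systematic encoding: 4 data bits followed by 4 check bits."""
    partial = syndrome(list(data) + [0, 0, 0, 0])
    checks = [(partial >> shift) & 1 for shift in (3, 2, 1, 0)]
    return list(data) + checks


def decode(word):
    """Return (data, status); corrects one error, flags two as uncorrectable."""
    s = syndrome(word)
    if s == 0:
        return list(word[:4]), "ok"
    if s in COLUMNS:  # odd-weight syndrome matching a column: single error
        fixed = list(word)
        fixed[COLUMNS.index(s)] ^= 1  # invert the located bit
        return fixed[:4], "corrected"
    return None, "uncorrectable"  # even-weight nonzero syndrome: double error


sent = encode([1, 0, 1, 1])
received = list(sent)
received[2] ^= 1  # one bit corrupted in transit
print(decode(received))  # ([1, 0, 1, 1], 'corrected')
```

Two errors produce the exclusive-OR of two odd-weight columns, which has even weight and therefore cannot match any single column; this is what lets the decoder distinguish a double error from a correctable single error.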
Existing FEC coding schemes generally address only the error detection/correction problem and do not attempt to improve the transmission characteristics (e.g., run length and cumulative DC-offset) of the data as discussed earlier. A need exists for an improved coding scheme that simultaneously provides error detection or correction as well as improved run lengths and cumulative DC-offset characteristics.