I. Field of the Invention
The present invention relates to data communications and, more particularly, to a new method of encoding and decoding binary data and non-data signals for transmission over a serial data link that is lower in complexity and higher in efficiency and has a desirable data pattern.
II. Background and Prior Art
In a digital data transmission system, binary data, representing some form of information, is encoded in digital form for transmission over the network. For instance, in pulse code modulation (PCM) systems, which are in widespread use in the U.S. telephony network, analog voice signals are quantized into a number of discrete levels, and a code is used to designate each level at each sample time. This results in a string of binary data, i.e., "0's and 1's", representing the transmitted voice signal. In a standard data transmission setting, such as in a local area network (LAN), data from one node in the LAN in the form of a string of binary digits is conveyed to another node connected to the LAN. The characteristics of the transmitted data string vary according to the prescribed protocol being followed.
The choice of protocol may depend on a number of factors, such as the type of modulation/demodulation to be used (in long distance communication), constraints on bandwidth, receiver complexity, etc. In any case, the binary data must be represented so that it has electrical characteristics appropriate for transmission over a transmission medium, such as a twisted pair of copper wires. For example, it is advantageous for the coding scheme employed to be self-clocking, i.e., for the clock frequency to be easily recovered from the pulse code. Receivers are better able to synchronize to the transmitted data where the data is coded in a scheme having a self-clocking feature.
It is also advantageous for the average value of the transmitted binary data string to be zero volts (i.e., that the data string be symmetrical, or have low or zero "disparity"). An example of a string having an average value of zero volts is one where the cumulative positive voltage is equal to the cumulative negative voltage. In cases where there is high disparity (i.e., many more bits of one polarity than the other), the interaction between high pass circuit components in the transmission path and the low frequency energy in long bit sequences may cause the low frequency energy of the bit string to be filtered out, causing data loss.
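The notion of disparity can be illustrated with a minimal Python sketch. The function names and the mapping of a 1 to +1 volt and a 0 to -1 volt are assumptions made for illustration only; they are not taken from any particular transmission standard.

```python
def disparity(bits: str) -> int:
    """Number of 1's minus number of 0's in a string of '0'/'1' characters."""
    ones = bits.count("1")
    zeros = bits.count("0")
    if ones + zeros != len(bits):
        raise ValueError("bits must contain only '0' and '1'")
    return ones - zeros


def average_level(bits: str, high: float = 1.0, low: float = -1.0) -> float:
    """Mean line voltage, ASSUMING a 1 is sent as `high` volts and a 0 as `low` volts."""
    if not bits:
        raise ValueError("bits must not be empty")
    return sum(high if b == "1" else low for b in bits) / len(bits)
```

Under these assumptions, a string with zero disparity such as "1100" averages to zero volts, while "1110" (disparity of two) carries a positive average level that a high pass element in the path would tend to remove.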
These problems (i.e., high disparity, etc.) are especially apparent in a high-speed data transmission system (e.g., a LAN or an ATM network), where the digital binary information stream to be transmitted rarely has the characteristics appropriate for available, cost effective transmission links. Typically, the information to be transmitted in such an environment is either very random in nature, or it is very repetitive. Both characteristics (a high degree of randomness or of repetitiveness) are undesirable for high speed transmission.
Random data, such as compressed data and encrypted data, has the characteristic that any given sequence of bits in a random bit stream is as likely to occur as any other sequence of the same length. For example, a string of ten 1's in a row is expected to occur on average once in every 1024 (2^10) strings of ten bits. The same is true of any other string of ten bits.
The problem is that long strings of bits without transitions have an adverse effect on the performance of data transmission. This occurs for two reasons. One reason is that clock recovery circuits perform poorly in the absence of bit transitions. The other reason is that the resulting high disparity gives rise to the interaction, discussed above, between high pass circuit components in the transmission path and the low frequency energy in long bit sequences.
Repetitive data, or data that contains strings of repeated sequences of bits, concentrates the frequency spectrum of the transmitted signal in a few frequency ranges or, worse, in a single frequency range. (A string of repetitive data may occur where data representing a screen display is transmitted over the network and where one string of bits represents a particular background color of the screen.) This condition often creates severe problems in EMC compliance (with the FCC) for wire cable data transmission products during design, development and manufacturing. Furthermore, concentration of energy in limited frequency ranges often reduces the effectiveness of clock recovery in the receiver, thereby contributing to lower product performance.
A variety of methods for representing binary data for transmission over the physical media have been employed for solving some of these problems. Some examples of media level encoding schemes are Manchester split phase encoding, Manchester differential split phase encoding, and nonreturn-to-zero encoding. An example of each of these encoding schemes is illustrated in FIG. 1.
The widely used Manchester encoding scheme (split phase) eliminates the variation in average value (or disparity) using symmetry. In the Manchester split-phase method, a 1 is represented by a 1 level during the first 1/2-bit interval, then shifted to the 0 level for the latter 1/2-bit interval; a 0 is represented by the reverse representation. In the differential Manchester split-phase method, a similar symmetric representation is used, except that a phase reversal relative to the previous phase indicates a 1 (i.e., a mark) and no change in phase indicates a 0.
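The Manchester split-phase mapping described above can be sketched in a few lines of Python. The function names are hypothetical, and each half-bit interval is modeled as one character of the output string ("1" for the high level, "0" for the low level).

```python
def manchester_encode(bits: str) -> str:
    """Map each information bit to two half-bit levels: 1 -> '10', 0 -> '01'."""
    table = {"1": "10", "0": "01"}
    return "".join(table[b] for b in bits)


def manchester_decode(line: str) -> str:
    """Invert manchester_encode; raises KeyError on an invalid pair ('00' or '11')."""
    if len(line) % 2:
        raise ValueError("line must contain an even number of half-bit levels")
    inverse = {"10": "1", "01": "0"}
    return "".join(inverse[line[i:i + 2]] for i in range(0, len(line), 2))
```

For example, the information bits "1101" become the half-bit levels "10100110". Because every information bit contributes exactly one high and one low half-interval, the encoded string always has the same number of 1's and 0's, whatever the data.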
Another widely employed media-level encoding method is the nonreturn-to-zero (NRZ) representation, which reduces the bandwidth needed to send any type of data. In the NRZ representation, a bit pulse remains in one of its two levels for the entire bit interval. In the NRZ(M) method, a level change is used to indicate a mark (i.e., a 1) and no level change indicates a 0; the NRZ(S) method uses the same scheme except that a level change is used to indicate a space (i.e., a 0). Both NRZ(M) and NRZ(S) are examples of the more general classification NRZ(I), in which a level change (inversion) is used to indicate one kind of binary digit and no level change indicates the other digit. The NRZ representations are efficient in terms of bandwidth required and are widely used.
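The NRZ(M) and NRZ(S) rules can be expressed as one small Python function. This is only a sketch: the function name, the `invert_on` parameter and the assumed starting line level of 0 are illustrative choices, not part of any cited scheme.

```python
def nrzi_encode(bits: str, invert_on: str = "1", start_level: int = 0) -> str:
    """NRZ(I) encoding: the line level changes whenever the bit equals `invert_on`.

    invert_on="1" gives NRZ(M); invert_on="0" gives NRZ(S).
    `start_level` is the ASSUMED line level before the first bit.
    """
    level = start_level
    out = []
    for b in bits:
        if b == invert_on:
            level ^= 1  # level change (inversion) for this bit
        out.append(str(level))  # one line level per bit interval
    return "".join(out)
```

With these assumptions, the bits "10110" are sent as the levels "11011" under NRZ(M) and as "01110" under NRZ(S). Only one line level is used per information bit, which is where the bandwidth saving comes from, but a run such as "0000" under NRZ(M) yields "0000", a line signal with no transitions at all.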
However, the use of the split-phase (Manchester and mark) and the NRZ media-level representations requires some added receiver complexity to determine clock frequency.
Each of the methods discussed (i.e., split phase Manchester, NRZ(I), etc.) is a method of encoding single bits of information for transmission on a physical medium. For instance, in differential Manchester split phase, each information bit is represented by two "line bits". As shown in FIG. 1, the first information bit "a", having a value of 1, is represented by a high ("1") going to a low ("0"), the high being the first line bit, the low being the second line bit. The second information bit "b", also having a value of 1, is represented by a low ("0") going to a high ("1").
In this particular type of encoding, the line bits representing an information bit depend upon the previous line bits. This is called "differential" or "inversion" encoding. For instance, in this example, an information bit having the value of 1 is represented by two line bits having a transition from the previous state. So, as the first information bit ("a") has a value of 1 and is represented by a high going low, the second information bit ("b"), also having a value of 1, is represented by a low going high, a transition from the previous line bit pair. The following information bit "c" has a value of 1 and is represented by a high going low, a transition from the previous line bit pair. The information bit "d" has a value of 0 and is represented by a high going low, no transition from the previous line bit pair. This type of inversionary or differential encoding has the advantage of, among other things, being tolerant if the two lines in a twisted pair become inadvertently swapped.
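The a, b, c, d example can be reproduced with a short Python sketch. The function names are hypothetical, and the line bit pair assumed to precede bit "a" (low going high, "01") is an assumption chosen so that the output matches the description; the text does not state it.

```python
from typing import List

# Each information bit maps to a pair of line bits; a 1 flips the pair.
FLIP = {"01": "10", "10": "01"}


def diff_manchester_encode(bits: str, previous_pair: str = "01") -> List[str]:
    """A 1 inverts the previous line bit pair; a 0 repeats it.

    `previous_pair` is the ASSUMED pair on the line before the first bit.
    """
    pairs = []
    pair = previous_pair
    for b in bits:
        if b == "1":
            pair = FLIP[pair]
        pairs.append(pair)
    return pairs


def diff_manchester_decode(pairs: List[str], previous_pair: str = "01") -> str:
    """A pair that differs from its predecessor decodes to 1, otherwise to 0."""
    bits = []
    for pair in pairs:
        bits.append("0" if pair == previous_pair else "1")
        previous_pair = pair
    return "".join(bits)
```

Encoding the bits a, b, c, d = "1110" gives the pairs "10", "01", "10", "10", matching the description. If the two wires are swapped, every pair on the line is inverted (including the one preceding bit "a"), yet the decoder still recovers "1110", because only the change between consecutive pairs carries information.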
As was discussed, the differential Manchester split phase scheme and both of the NRZ(I) media-level encoding schemes (i.e., NRZ(M) and NRZ(S)) are differential or inversionary media-level schemes. The NRZ(I) schemes, however, allow for higher bandwidth transmission, as fewer clock cycles are required. One of the drawbacks of NRZ(I) media-level encoding is that it does not guarantee symmetry as do the Manchester encoding schemes. Furthermore, it does not guarantee clocking information so that the receiver may synchronize with the transmitter (i.e., it is conceivable that a string of line bits having the same value is transmitted, such as all 1's, thus throwing off the receiver clock phase-locked loop). Thus, if NRZ(I) is to be used, an encoding scheme above the media level must be used so that parity (i.e., a balance of 1's and 0's) is ensured or maximized and so that clocking information is provided in a timely manner.
Various encoding schemes which "sit on top" of these media-level encoding schemes (in a conceptual layered structure) have been used in the past to take advantage of the media-level encoding schemes' strengths. For instance, IBM developed an encoding scheme called the 8/10 encoding scheme, where 8 bits of data are encoded into 10-bit words for transmission over the network. (Subsequently, these 10-bit words need to be encoded in one of the media-level encoding schemes, such as NRZ(M), for transmission over the network.) As was discussed, the purpose of this scheme is to guarantee sufficient clocking information while maintaining minimal disparity. However, of the possible 1024 (2^10) 10-bit words (to map to the 256 8-bit words), only 252 have zero disparity, i.e., the same number of 1's and 0's. For example, the 10-bit word "1010101010" has zero disparity as it has the same number of 0's and 1's. On the other hand, the 10-bit word "1010101011" has a disparity of two in that there are two more 1's than 0's. Thus, in order to make up for the "missing" four 10-bit words (i.e., 256 required minus 252 available), the 8/10 coding scheme uses some complicated logic and requirements so that disparity is minimized. Furthermore, due to the sheer number of possible encoded words, the 8/10 code and logic is quite complex.
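The figure of 252 zero-disparity words follows from choosing which five of the ten bit positions hold a 1, i.e., the binomial coefficient C(10,5) = 252. A brute-force Python check is sketched below; the function name is hypothetical.

```python
from math import comb


def count_words_with_disparity(n_bits: int, disparity: int) -> int:
    """Count n-bit words whose (number of 1's - number of 0's) equals `disparity`."""
    return sum(
        1
        for word in range(2 ** n_bits)
        if 2 * bin(word).count("1") - n_bits == disparity
    )


zero_disparity_words = count_words_with_disparity(10, 0)
shortfall = 2 ** 8 - zero_disparity_words  # 8-bit values lacking a balanced 10-bit word
```

Enumerating all 1024 words gives 252 balanced words, in agreement with `comb(10, 5)`, and leaves a shortfall of four relative to the 256 values of an 8-bit word.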
Another encoding scheme was developed by Advanced Micro Devices (AMD) for its TAXichip integrated circuits. This is described in Advanced Micro Devices TAXichip Integrated Circuits Technical Manual, Preliminary Rev. 1.2, 1989. This is further described in U.S. Pat. No. 4,987,572, assigned to Advanced Micro Devices. The AMD encoding scheme defines a 4/5 and a 5/6 code. In the 4/5 code, a 4-bit word (or "nibble") is received by the chip and encoded into a 5-bit word. Likewise, in the 5/6 code, a 5-bit word is encoded by the chip into a 6-bit word. Again, this is done for the purpose of including clocking information in the data stream while minimizing disparity.
However, using the AMD coding scheme, a disparity of three (i.e., in five bits, four bits are of one polarity while the remaining bit is of the other) is possible, so that DC balance is not maintained. This DC offset can affect the integrity of the transmitted data as well as increase jitter. For example, in a 1-byte code transmission, a 40% DC offset can occur due to the disparity.
Furthermore, the AMD coding scheme does not provide a single 5-bit word having the "comma" property so that the receiver may synchronize its decoding circuitry. (A word having the "comma" property is one where the particular string representing the word can never be inadvertently duplicated in the data stream by, for instance, two words sitting side-by-side, the ending of the first word and the beginning of the second word together comprising the particular string.) Instead, two words defined by AMD are required so that the receiver may get in sync.
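The "comma" property can be tested mechanically. The Python sketch below uses a hypothetical function name and toy 3-bit code words chosen purely for illustration; they are not the AMD code words. As a simplifying assumption, it only examines pairs of adjacent data words of the same length as the candidate comma word.

```python
from typing import Sequence


def has_comma_property(comma: str, codewords: Sequence[str]) -> bool:
    """True if `comma` never appears across the boundary of two adjacent code words.

    ASSUMES all words have the same length and checks data word pairs only.
    """
    n = len(comma)
    if any(len(w) != n for w in codewords):
        raise ValueError("all code words must have the same length as the comma word")
    if comma in codewords:
        return False  # the comma must not itself be a data word
    for first in codewords:
        for second in codewords:
            joined = first + second
            # Offsets 1..n-1 are the windows that straddle the word boundary.
            for offset in range(1, n):
                if joined[offset:offset + n] == comma:
                    return False
    return True
```

With the toy data words "001", "010" and "100", the word "111" has the comma property, since two adjacent words never contain three consecutive 1's. With the data words "010", "011" and "101" it does not: "011" followed by "101" gives "011101", in which "111" straddles the boundary.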
There is a need for a coding scheme which solves the above-identified problems by providing an ideal balance between complexity, efficiency, function and performance.