In a digital transmission network, data from a large number of users are serially transmitted from one network node to another, up to their respective final destinations. As networks evolve towards increasingly complex mixes of sub-networks with heterogeneous architectures, it is clear that there will be a requirement to support distributed computing applications across high speed backbones that may be carrying LAN traffic, voice, video, and traffic among channel-attached hosts and workstations. Perhaps the fundamental challenge for high speed networking is to minimize the processing time within each node in the network.
Packet switching is now commonly used to accommodate the bursty, multiprocess communication found in distributed computing environments. Packets are pieces of data produced by an originating user which are prefixed with headers containing routing information that identifies the originating and destination users. Small computers, called packet switches or nodes, are linked to form a network. Some of these nodes are called end nodes and provide user access to the network. Adapter circuits at each of the switching nodes adapt the packet signals for transmission or delivery to the transmission links and user applications, respectively. Each node examines each header and decides where to send the packet to move it closer to its final destination.
When messages are transmitted and received over telecommunication links, errors can occur because of many sources of noise, e.g., interference between channels, atmospheric conditions, etc.
A method is thus needed to detect when the message received is not the same as the message transmitted. Methods commonly used to detect errors include checksum, parity check, longitudinal redundancy code, and cyclic redundancy code.
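As a rough illustration of the simpler methods listed above, the following sketch computes a parity bit, a longitudinal redundancy check, and an 8-bit checksum. The function names and the 8-bit checksum width are assumptions chosen for the example and do not come from any particular standard.

```python
from functools import reduce


def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(byte).count("1") & 1


def lrc(message: bytes) -> int:
    """Longitudinal redundancy check: XOR of all bytes, column by column."""
    return reduce(lambda acc, b: acc ^ b, message, 0)


def checksum8(message: bytes) -> int:
    """Simple additive checksum, truncated to 8 bits."""
    return sum(message) & 0xFF


# Two errors in the same bit column cancel out in the LRC, so the
# corrupted message below is not detected by that check alone.
sent = b"\x01\x02"
received = b"\x00\x03"
print(lrc(sent) == lrc(received))
```

The last lines hint at why stronger codes such as the CRC are preferred: these simple checks miss some multi-bit error patterns.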
To minimize the processing time within each node of a high speed network, end-to-end recovery is now used, which is made possible by the much lower error rate of new transmission media such as optical fiber.
The integrity of the message is ensured by appending a Frame Check Sequence (FCS) to the end of the message. The FCS travels with the message itself so that it can be checked at the far end for proper transmission. A Cyclic Redundancy Code (CRC) is employed to generate the FCS at one end and to check the entire received message (data plus FCS) at the other end. This is the case in Frame Relay, for instance. Asynchronous Transfer Mode (ATM) goes even further, protecting not only the entire message or each cell, but also every cell header, which carries routing information.
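The generate-and-check cycle can be sketched as follows. This is a minimal illustration, assuming a 16-bit CRC with the polynomial X^16 + X^12 + X^5 + 1 and a register preset to zero, as provided by Python's standard `binascii.crc_hqx`. The helper names `append_fcs` and `check_frame` are hypothetical, and real protocols typically add register presets and bit complementing that are omitted here.

```python
import binascii


def append_fcs(data: bytes) -> bytes:
    """Sender side: compute the 16-bit FCS and append it, MSB first."""
    fcs = binascii.crc_hqx(data, 0)
    return data + fcs.to_bytes(2, "big")


def check_frame(frame: bytes) -> bool:
    """Receiver side: run the CRC over data plus FCS.

    With a zero preset and no complementing, an error-free frame
    leaves a remainder of zero.
    """
    return binascii.crc_hqx(frame, 0) == 0


frame = append_fcs(b"hello, network")
corrupted = bytes([frame[0] ^ 0x04]) + frame[1:]  # flip one bit in transit

print(check_frame(frame))      # True
print(check_frame(corrupted))  # False
```

Checking the data and the FCS together lets the receiver use the same circuit as the sender and simply compare the final remainder with a constant.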
The standard circuitry for computing the FCS or checking the message is a Linear Feedback Shift Register (LFSR) which carries out a bit by bit multiplication in the Galois Field (GF), i.e., modulo the polynomial on which the GF is generated. Each bit of the message is pushed into the LFSR, most significant bit (MSB) first. Division is performed by the feedbacks. At the end of the process, the FCS (the remainder of the division) is within the shift register. This method and type of circuitry are described, for instance, in "Error Correcting Codes" by Peterson and Weldon, The MIT Press, 2nd Edition, 1972. Yet this simple method suffers from an obvious drawback: only one bit is processed at each shift, while messages may be as long as 8 kBytes, in which case 64 k shifts are needed. If a 32 bit CRC is used, a 32 position shift register is needed. Computing the CRC takes as many clock pulses as there are bits in the message. This is no longer acceptable in terms of run-time.
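The bit-serial behavior of such a register can be modeled in software. The sketch below is only an illustration: the function name `crc_bit_serial`, the zero initial register value, and the 16-bit polynomial used in the usage line are assumptions, not details taken from the cited reference.

```python
def crc_bit_serial(message: bytes, poly: int, width: int) -> int:
    """Model an LFSR that consumes one message bit per shift, MSB first.

    `poly` holds the generator polynomial without its leading term,
    e.g. 0x1021 for X^16 + X^12 + X^5 + 1 with width 16.
    """
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = 0  # assumed preset: all zeros
    for byte in message:
        for i in range(7, -1, -1):          # MSB of each byte first
            bit = (byte >> i) & 1
            feedback = bit ^ (1 if reg & top else 0)
            reg = (reg << 1) & mask         # one shift per message bit
            if feedback:
                reg ^= poly                 # feedback taps do the division
    return reg                              # remainder left in the register


print(hex(crc_bit_serial(b"123456789", 0x1021, 16)))  # 0x31c3
```

The inner loop runs 8 times per byte, so an 8 kByte message costs 8 × 8192 = 65536 shifts, which is the run-time problem described above.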
To expedite the calculation of the CRC, other methods have been proposed where computation is done on a byte basis. These methods (for example, the method described in "Teleinformatique 1", H. Nussbaumer, Presses Polytechniques Romandes, 1987; or in the article "Byte-wise CRC Calculations" by A. Perez et al., IEEE MICRO, June 1983, pages 40-46) have in common that they anticipate the result after eight shifts have occurred in the LFSR. They all use the same standard 16 bit polynomial $X^{16} + X^{12} + X^{5} + 1$, and although much simpler than the 32 bit polynomial, they require the use of a 256 × 16 look up table and/or a complex hardware assist.
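A table-driven byte-wise computation of this kind might look like the sketch below. It is a generic illustration of the look-up-table approach with a zero register preset, not a reproduction of the specific methods in the cited references. The names `build_table` and `crc16_bytewise` are hypothetical.

```python
POLY = 0x1021  # X^16 + X^12 + X^5 + 1, leading term implied


def build_table() -> list:
    """Precompute, for each byte value, the register after eight shifts."""
    table = []
    for byte in range(256):
        reg = byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ POLY) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        table.append(reg)
    return table


TABLE = build_table()  # 256 entries of 16 bits each


def crc16_bytewise(message: bytes) -> int:
    """Process one byte per step using the precomputed table."""
    reg = 0  # assumed preset: all zeros
    for byte in message:
        index = (reg >> 8) ^ byte
        reg = ((reg << 8) & 0xFFFF) ^ TABLE[index]
    return reg


print(hex(crc16_bytewise(b"123456789")))  # 0x31c3
```

Each step replaces eight single-bit shifts with one table look-up, at the cost of storing the 256 × 16 bit table.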