The present invention relates generally to multiprocessor systems and, more particularly, to systems and techniques for efficiently checking and generating cyclic redundancy functions in such systems.
As the performance demands on personal computers continue to increase at a meteoric pace, processors have been developed which operate at higher and higher clock speeds. The instruction sets used to control these processors have been pared down (e.g., RISC architecture) to make them more efficient. Processor improvements alone, however, are insufficient to provide the greater bandwidth required by computer users. The other computer subsystems which support the processor, e.g., interconnects, I/O devices and memory devices, must also be designed to operate at higher speeds and support greater bandwidth. In addition to improved performance, cost has always been an issue with computer users. Thus, system designers are faced with the dual challenges of improving performance while remaining competitive on a cost basis.
Early personal computers typically included a central processing unit (CPU), some type of memory and one or more input/output (I/O) devices. These elements were interconnected to share information using what is commonly referred to as a "bus". Physically, buses are fabricated as a common set of wires to which inputs and outputs of several devices are directly connected.
Buses convey data and instructions between the elements of a digital computer. Local buses provide data transmission capability within a device, whereas system buses interconnect devices, such as I/O subsystems, memory subsystems and a central processor, together. In many systems, several devices compete for use of the system bus. In industry parlance, devices which can control the system bus are termed bus masters, while other devices, which are passive and respond to requests from the bus masters, are termed slaves. Some devices may operate at different times either as a slave or a bus master to accomplish different objectives.
The advent of multiprocessor architectures for personal computers is a recent trend in the design of these systems, intended to satisfy consumers' demand for ever faster and more powerful personal computers. In a typical multiprocessor computer system each of the processors may share one or more resources. Note, for example, the multiprocessor system depicted in FIG. 1. Therein, an exemplary multiprocessor system 5 is illustrated having seven nodes including a first CPU 10, a bridge 12 for connecting the system 5 to other I/O devices 13, first and second memory devices 14 and 16, a frame buffer 18 for supplying information to a monitor, a direct memory access (DMA) device 20 for communicating with a storage device or a network and a second CPU 22 having an SRAM device 24 connected thereto. According to the conventional paradigm, these nodes would be interconnected by a bus 26. Caches can be provided as shown to isolate some of the devices from the bus and to merge plural, small bus accesses into larger, cache-line sized accesses.
As multiprocessor systems grow more complex, i.e., are designed with more and more nodes, adapting the bus-type interconnect to handle the increased complexity becomes problematic. For example, capacitive loading associated with the conductive traces on the motherboard which form the bus becomes a limiting factor with respect to the speed at which the bus can be driven. Thus, an alternative interconnect architecture is desirable.
One type of proposed interconnect architecture for multiprocessor personal computer systems replaces the bus with a plurality of unidirectional point-to-point links and uses packet data techniques to transfer information. FIGS. 2(a) and 2(b) conceptualize the difference. FIG. 2(a) depicts four of the nodes from FIG. 1 interconnected via a conventional bus. FIG. 2(b) illustrates the same four nodes interconnected via unidirectional point-to-point links 30, 32, 34 and 36. These links can be used to provide bus-like functionality by connecting the links into a ring (which structure is sometimes referred to herein as a "ringlet") and having each node pass through packets addressed to other nodes. Ringlets overcome the aforementioned drawback of conventional bus-type interconnects since their individual links can be clocked at high speeds regardless of the total number of nodes which are linked together. As will be appreciated by those skilled in the art, packets are formatted to include payload data as well as various overhead information including, for example, information associated with the target and source nodes' addresses.
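The pass-through behavior of a ringlet can be illustrated with a minimal sketch. The function name `deliver` and the choice of node numerals (10, 14, 16 and 22, borrowed from FIG. 1) are assumptions made purely for illustration; the description does not state which four nodes appear in FIG. 2(b), nor does it prescribe any particular forwarding logic.

```python
from typing import List


def deliver(ring: List[int], source_id: int, target_id: int) -> List[int]:
    """Return the node IDs a packet visits on a unidirectional ring.

    `ring` lists node IDs in link order; each node forwards to the next
    entry, and the last entry forwards back to the first. (Hypothetical
    model, not taken from the description.)
    """
    start = ring.index(source_id)
    visited = []
    for hop in range(1, len(ring) + 1):
        node = ring[(start + hop) % len(ring)]
        visited.append(node)
        if node == target_id:
            # The target strips the packet from the ring.
            return visited
        # Any other node simply passes the packet to its downstream link.
    raise ValueError("target node is not on this ringlet")


# Illustrative four-node ringlet (node numerals assumed from FIG. 1).
ringlet = [10, 14, 16, 22]
path = deliver(ringlet, source_id=10, target_id=16)
```

In this sketch a packet sent from node 10 to node 16 passes through node 14 before being stripped, which mirrors how point-to-point links can provide bus-like reachability.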
An exemplary packet format is illustrated in FIG. 2(c). Therein, the packet includes a target node identification field (Target ID), an old bit field, a command field, a source node identification field (Source ID), payload data and a cyclic redundancy check (CRC) field. Between the data packets, the system circulates filler data referred to herein as out-of-band information or idle symbols.
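By way of illustration only, the field order of FIG. 2(c) might be modeled as follows. The description gives no field widths, so the 16-bit node IDs, the single flag byte holding the old bit and a 7-bit command, the 32-bit CRC, and the names `Packet`, `body` and `to_bytes` are all assumptions.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (widths are NOT given in the description):
# 16-bit target ID, one byte with the old bit in bit 7 and a 7-bit command
# in bits 6..0, then a 16-bit source ID. Big-endian, no padding.
_HEADER = struct.Struct(">HBH")


@dataclass
class Packet:
    target_id: int
    old: bool
    command: int
    source_id: int
    payload: bytes
    crc: int = 0  # assumed 32-bit CRC field, filled in by a CRC generator

    def body(self) -> bytes:
        """All fields except the CRC, in the order shown in FIG. 2(c)."""
        flags = (0x80 if self.old else 0x00) | (self.command & 0x7F)
        header = _HEADER.pack(self.target_id, flags, self.source_id)
        return header + self.payload

    def to_bytes(self) -> bytes:
        """The body followed by the CRC field."""
        return self.body() + struct.pack(">I", self.crc)


pkt = Packet(target_id=0x0016, old=True, command=0x05,
             source_id=0x000A, payload=b"\xAA\xBB", crc=0xDEADBEEF)
wire = pkt.to_bytes()
```

Keeping `body()` separate from `to_bytes()` reflects the fact that the CRC is computed over every field except the CRC field itself.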
CRC techniques are well known to those skilled in the art for use in detecting errors in transmitted data. Upon creation of a packet at a node, a CRC generator takes the message block, e.g., all of the fields illustrated in FIG. 2(c) except for the CRC field, and divides the block, using modulo-2 arithmetic, by some predetermined binary number. The resulting remainder forms one or more check characters, which are then appended as the CRC field. Upon receipt of the packet at another node, a CRC checking function can perform the same operation and compare the resulting remainder to the value of the appended CRC field to determine if any transmission errors have corrupted the packet.
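The generate-and-check procedure can be sketched in software as follows. This is a minimal bit-serial sketch, not the hardware contemplated herein: the check value is taken as the remainder of a modulo-2 division, as is standard CRC practice, and the divisor 0x04C11DB7 (the widely used 32-bit CRC generator polynomial), the zero initial value, and the function names are assumptions, since the description does not specify them.

```python
# Assumed parameters: the description names no particular divisor.
CRC_POLY = 0x04C11DB7   # common 32-bit generator polynomial (top bit implicit)
CRC_WIDTH = 32
_TOP_BIT = 1 << (CRC_WIDTH - 1)
_MASK = (1 << CRC_WIDTH) - 1


def crc_remainder(block: bytes) -> int:
    """Modulo-2 remainder of the block (shifted left by CRC_WIDTH bits)."""
    reg = 0
    for byte in block:
        reg ^= byte << (CRC_WIDTH - 8)      # bring in the next 8 message bits
        for _ in range(8):
            if reg & _TOP_BIT:
                reg = ((reg << 1) ^ CRC_POLY) & _MASK   # "subtract" divisor
            else:
                reg = (reg << 1) & _MASK
    return reg


def append_crc(block: bytes) -> bytes:
    """Generator side: append the check value as a 4-byte CRC field."""
    return block + crc_remainder(block).to_bytes(4, "big")


def check_crc(packet: bytes) -> bool:
    """Checker side: recompute over the block and compare with the field."""
    block, field = packet[:-4], packet[-4:]
    return crc_remainder(block) == int.from_bytes(field, "big")


message = b"\x00\x16\x85\x00\x0apayload"
packet = append_crc(message)
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]   # flip a single bit
```

Here `check_crc(packet)` succeeds while `check_crc(corrupted)` fails, since a single-bit error always changes the remainder for a divisor with more than one nonzero term.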
As seen in FIG. 2(d), each node in a ringlet (e.g., CPU 10) can be conceptualized as having a strip component 38 for receiving incoming packets and a send component 39 for handling the transmission of packets. Both the strip and send components require CRC hardware for checking CRCs on received packets and generating CRCs on transmitted packets. In fact, traditional design techniques would provide four CRC hardware units for each node, i.e., one unit in each of the strip and send components to check CRCs and one unit in each of the strip and send components to generate CRCs.
However, CRC checking/generating hardware is relatively expensive and requires a large number of gates. Currently, 32-bit CRC fields are typically used in these types of applications to provide error checking; however, it is likely that as the processing power of computing systems increases, so will the packet size and the size of the CRC fields. Accordingly, it would be desirable to have a more efficient scheme for handling CRC checking/generation that reduces the amount of hardware required for this functionality in each node.