The present invention relates generally to multi-processor computer systems. More specifically, the present invention provides techniques for sending signals between clusters in computer systems having a plurality of multi-processor clusters.
A relatively new approach to the design of multi-processor systems replaces broadcast communication such as bus or ring architectures among processors with a point-to-point data transfer mechanism in which the processors communicate similarly to network nodes in a tightly-coupled computing system. That is, the processors are interconnected via a plurality of communication links and requests are transferred among the processors over the communication links according to routing tables associated with each processor. The intent is to increase the amount of information transmitted within a multi-processor platform per unit time.
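By way of illustration only, the table-driven routing described above can be sketched as follows. The four-node topology, the table contents, and the names `ROUTING_TABLES` and `route` are hypothetical and are not taken from any particular system; the sketch merely shows a request being forwarded hop by hop according to a routing table associated with each node.

```python
from typing import Dict, List

# Hypothetical four-node topology with direct links 0-1, 0-2, 1-3 and 2-3.
# Each node's routing table maps a destination node to the neighboring node
# (next hop) to which a request should be forwarded.
ROUTING_TABLES: Dict[int, Dict[int, int]] = {
    0: {1: 1, 2: 2, 3: 1},
    1: {0: 0, 2: 0, 3: 3},
    2: {0: 0, 1: 0, 3: 3},
    3: {0: 1, 1: 1, 2: 2},
}


def route(source: int, destination: int) -> List[int]:
    """Return the sequence of nodes a request visits from source to destination."""
    path = [source]
    current = source
    while current != destination:
        # Look up the next hop in the routing table of the current node.
        current = ROUTING_TABLES[current][destination]
        path.append(current)
        # A valid path never visits more nodes than exist in the topology.
        if len(path) > len(ROUTING_TABLES):
            raise RuntimeError("routing loop detected")
    return path


if __name__ == "__main__":
    print(route(0, 3))  # nodes 0 and 3 are not directly linked in this sketch
```

In this sketch, nodes 0 and 3 share no direct link, so a request between them is relayed through an intermediate node selected by the tables.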
In some multi-processor systems, local nodes (including processors and an interconnection controller) are directly connected to each other through a plurality of point-to-point intra-cluster links to form a cluster of processors. Separate clusters of processors can be connected via point-to-point inter-cluster links. The point-to-point links significantly increase the bandwidth for coprocessing and multiprocessing functions. However, using a point-to-point architecture to connect multiple processors in a multiple cluster system presents its own problems.
One limitation associated with such an architecture is caused by the inter-cluster links used to transmit signals between clusters. Inter-cluster links are longer than intra-cluster links. Signals transmitted on inter-cluster links therefore pick up relatively more noise and tend to include more errors than signals transmitted on intra-cluster links. In addition, the extra length of inter-cluster links exacerbates the effect of skew between bit lanes. A normal intra-cluster link initialization sequence essentially indicates the device types at each end of a link and when each device will start sending data. Accordingly, this initialization sequence does not provide sufficient information to calibrate the linked devices for de-skewing. Consequently, skew and error detection and correction methods that may be acceptable for intra-cluster links are not always adequate for inter-cluster links.
In addition, a typical intra-cluster protocol causes a local transmitter module to generate “no operation” (“NOP”) packets to be sent on a link when no valid data or control packet needs to be sent. Some of these NOP packets accumulate in a buffer of packets awaiting transmission. Therefore, when a valid packet needs to be transmitted after NOP packets have accumulated in the buffer, transmission of the valid packet will be delayed until the NOP packets have been sent.
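The delay described above can be illustrated with a minimal sketch, assuming a simple first-in, first-out transmit buffer that sends one packet per link slot. The function name `slots_until_valid_sent` and the one-packet-per-slot model are assumptions made for illustration and do not describe any particular transmitter module.

```python
from collections import deque


def slots_until_valid_sent(nop_backlog: int) -> int:
    """Count the link slots used before a newly queued valid packet is sent.

    Assumes a FIFO buffer that already holds `nop_backlog` NOP packets and
    a link that transmits exactly one packet per slot.
    """
    buffer = deque(["NOP"] * nop_backlog)
    buffer.append("VALID")  # the valid packet queues behind the NOP packets
    slots = 0
    while True:
        packet = buffer.popleft()
        slots += 1
        if packet == "VALID":
            return slots


if __name__ == "__main__":
    for backlog in (0, 3, 8):
        print(backlog, slots_until_valid_sent(backlog))
```

Under these assumptions, each NOP packet already in the buffer adds one slot of latency to the valid packet that follows it.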
This problem may be exacerbated by differences in clock speed of various components within a node or a cluster. For example, packets may be sent within a node (e.g., between a protocol engine and the local transmitter module within a processor) according to an internal format, may be transmitted between nodes according to an intra-cluster format and may be transmitted between clusters according to an inter-cluster format. If the clock speed for devices communicating via the internal format (e.g., the transmitter module) is faster than the clock speed of devices communicating via the intra-cluster or inter-cluster protocol (e.g., of a macro for transmitting packets between nodes or clusters), then valid packets may tend to arrive in the buffer faster than they can be transmitted even during normal operation.
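The effect of such a clock speed mismatch can be approximated with a short sketch, assuming that packets arrive in the buffer at a fixed rate set by the faster internal clock and leave at a fixed rate set by the slower link clock. The function `backlog_after` and its parameters are hypothetical, and the fixed per-cycle rates are a simplification.

```python
def backlog_after(cycles: int, arrivals_per_cycle: int, departures_per_cycle: int) -> int:
    """Return the number of packets left in the buffer after `cycles` cycles.

    Assumes a constant number of packets arrives each cycle and that the
    link can send at most `departures_per_cycle` packets each cycle.
    """
    backlog = 0
    for _ in range(cycles):
        backlog += arrivals_per_cycle
        # The link cannot send more packets than the buffer currently holds.
        backlog -= min(backlog, departures_per_cycle)
    return backlog


if __name__ == "__main__":
    # Producer faster than the link: the backlog grows every cycle.
    print(backlog_after(10, 3, 2))
    # Matched rates: the buffer drains completely each cycle.
    print(backlog_after(10, 2, 2))
```

In this model the backlog grows without bound whenever the arrival rate exceeds the departure rate, which is the condition described above for valid packets arriving faster than they can be transmitted.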
Therefore, it would be desirable to provide techniques for improving reliability and reducing latency of inter-cluster communication in systems having multiple clusters of multiple processors connected using point-to-point links.