1. Field of the Invention
The present invention relates generally to computer communication protocols, and more specifically to a credit-based message protocol in a multi-processor computer system.
2. Discussion of Background Art
Multi-processor computer systems are made up of multiple processor nodes communicating over a high-speed interconnection network. Each processor node typically includes a processor and local Random Access Memory (RAM). Computational tasks are divided among processor nodes to maximize utilization of resources available at different processor nodes. Dividing a task among processor nodes can reduce the time needed to produce a result, but implies that one part of a task being processed by one node may depend on the results of another part of the task being processed by another processor node. The various sub-tasks must exchange information relevant to their processing, and synchronize their processing, via the network.
Different methods of communication exist. The shared-memory method of communication is very fast because each processor can simply read what has been written by other processors. However, in this method the critical memory areas used for communications by one sub-task are not protected from being overwritten by another sub-task. In a message-passing model, on the other hand, each processor can only access its own memory and can only communicate with other processors by explicitly composing a message and sending it to other processors. This model protects communications because one processor cannot write to another processor's memory.
In either of these methods of communication, when a first processor node sends a message to a second processor node, the first node waits for an acknowledgement from the second node. When the message reaches the second node, there are three possibilities: the message is accepted, the message is lost, or the message is blocked. If the receiving processor node is too busy or too full to process the message, the message may be lost. In that case the receiving processor may return a message indicating the loss, or it may remain silent. If the message is blocked, it stalls in the communication interface, and the resulting clogging propagates backward, congesting the network and/or harming the system.
An efficient way to receive messages allows incoming messages to be written to a receiving node memory buffer that is shared between all senders that may communicate with the receiver. If any particular sender continually sends messages to the receiver, for example, due to a software or hardware error in the sender, that sender can over-run the receiver and fill up the shared buffer. Then, additional incoming messages may be discarded or blocked in the network. In either case, the flood of erroneous messages would interfere with the processing of legitimate messages from other nodes.
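The over-run condition described above can be illustrated with a minimal sketch. The class name SharedBuffer, its capacity parameter, and the sender labels are hypothetical and chosen only for illustration; the sketch assumes that a full buffer discards incoming messages rather than blocking them.

```python
class SharedBuffer:
    """Hypothetical receive buffer shared by all senders, with no per-sender limit."""

    def __init__(self, capacity):
        self.capacity = capacity   # total message slots (assumed value)
        self.slots = []            # messages waiting to be processed

    def receive(self, sender, payload):
        # Any sender may take any free slot; a full buffer discards the message.
        if len(self.slots) >= self.capacity:
            return False
        self.slots.append((sender, payload))
        return True


buf = SharedBuffer(capacity=4)

# A faulty sender floods the receiver and consumes every slot.
faulty_results = [buf.receive("faulty", i) for i in range(6)]

# A legitimate message from another node now finds no room.
legit_ok = buf.receive("healthy", "request")
```

In this sketch the first four messages from the faulty sender fill the buffer, so the later message from the healthy sender is discarded even though that sender did nothing wrong.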
What is needed, therefore, is a message passing system that overcomes the above-discussed deficiencies.
The present invention provides a credit-based mechanism to limit the maximum number of packets a node can receive from another node in a multi-processor node computer system. The invention includes a buffer pool and a credit mechanism in each node wherein the buffer temporarily stores incoming packets sent by other nodes. The credit mechanism allocates a predetermined number of packets which a node can receive from another node so that no sending node can use more than its allocated share of the buffer pool, and thus assures that the buffer pool will not overflow. Even though a node can continue to transmit unwelcome packets, the packets are not written into the packet buffer pool, and are thus discarded. Because the packet buffer does not overflow, the receiving node can continue to communicate with other nodes.
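The credit mechanism summarized above may be sketched as follows. This is an illustrative model only, not the claimed implementation: the class CreditedReceiver, its method names, and the per-sender credit values are assumptions, and the pool is assumed to be sized as the sum of all allocated credits so that it cannot overflow.

```python
from collections import deque


class CreditedReceiver:
    """Hypothetical receiving node with a buffer pool and per-sender credits."""

    def __init__(self, credits_per_sender):
        # Maximum number of packets each sender may occupy in the pool (assumed values).
        self.limit = dict(credits_per_sender)
        # Packets from each sender currently held in the pool.
        self.in_use = {sender: 0 for sender in self.limit}
        # Assumption: pool size equals the sum of credits, so it can never overflow.
        self.pool_size = sum(self.limit.values())
        self.pool = deque()
        self.discarded = 0

    def receive(self, sender, payload):
        # A sender that has used its share (or has no credits) is not written to the pool.
        if self.in_use.get(sender, 0) >= self.limit.get(sender, 0):
            self.discarded += 1
            return False
        self.in_use[sender] += 1
        self.pool.append((sender, payload))
        return True

    def process_one(self):
        # Consuming a packet frees its slot and returns the credit to its sender.
        if not self.pool:
            return None
        sender, payload = self.pool.popleft()
        self.in_use[sender] -= 1
        return sender, payload


rx = CreditedReceiver({"A": 2, "B": 2})

# Node A floods the receiver; only its two credited packets are stored.
flood = [rx.receive("A", i) for i in range(5)]

# Node B is unaffected because A cannot consume B's share of the pool.
b_ok = rx.receive("B", "hello")

# Processing one of A's packets returns a credit, so A may send again.
first = rx.process_one()
a_again = rx.receive("A", "retry")
```

Under these assumptions, the excess packets from node A are discarded without being written to the pool, while node B can still deliver its packet, which is the isolation property the summary describes.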