One class of multi-processor data processing systems consists of a plurality of processor nodes connected by an interconnect fabric. Each processor node typically includes a processor and local RAM. A high-speed interconnect fabric supports communication between nodes. A computational problem may be divided between a plurality of the nodes to improve the efficiency of the computation by making use of special resources available at different nodes or to reduce the real time that must elapse before a result is available. Hence, a process running on one node may depend on computations being performed at other nodes in the system. The various processes communicate over the interconnect fabric to exchange information and to synchronize with one another.
The level of performance of such a system depends on the speed with which messages can be sent from a process running on a first node to a process running on a second node. Each node typically includes an interface circuit which supervises the transmission and reception of messages. In prior art communication systems, the node receiving the message controls whether or not the message is received. When a process on the first node wishes to send a message to a process on the second node, the interface circuit on the first node sends the message over the interconnect fabric and waits for an acknowledgment from the interface circuit at the second node.
When the message reaches the interface circuit on the second node, there are two possibilities: the message is accepted or the message is lost. If the node is too busy or too full to process the message, the message is lost. The recipient may return a message indicating the loss of the message or just remain silent. In either case, the sender must resend the message at some later time. Unfortunately, the sender has no method to determine the optimum time at which to attempt another transmission, since the optimum time requires a knowledge of the processes running on the recipient node, and these processes are not visible to the sender. As a result, the sender is typically programmed to wait for some predetermined time and then attempt another transmission. If the sender initiates the transmission too soon, the recipient may still be busy and the message will again be lost. Each time the message is sent and lost, the efficiency of usage of the interconnect fabric is reduced. This reduced efficiency may cause other processes running on other nodes to run more slowly because these other processes cannot access sufficient bandwidth in the interconnect fabric to run at maximum speed. If the sender waits too long before resending the message, then the processes on the sender may run at less than maximum speed because they are stalled waiting for a return message from the second node containing the results of a delegated task that was the subject of the first message.
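The trade-off described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not part of the prior art systems themselves: the recipient is busy until a single known time, the sender first transmits at time 0, and it retries at a fixed interval. The function name `fixed_wait_retry` and its parameters are hypothetical and chosen only for illustration.

```python
def fixed_wait_retry(busy_until, wait):
    """Model a sender that retransmits every `wait` time units.

    busy_until: time at which the recipient can first accept a message
                (unknown to a real sender; known here only to the model).
    wait:       the sender's predetermined retry interval.

    Returns (lost_transmissions, idle_delay), where idle_delay is how long
    the recipient was already free before the message finally arrived.
    """
    if wait <= 0:
        raise ValueError("wait must be positive")
    lost = 0
    t = 0
    # Every attempt made while the recipient is still busy is lost and
    # consumes interconnect bandwidth for nothing.
    while t < busy_until:
        lost += 1
        t += wait
    return lost, t - busy_until


# A short wait wastes fabric bandwidth; a long wait stalls the sender.
print(fixed_wait_retry(10, 1))   # many lost transmissions, no idle delay
print(fixed_wait_retry(10, 25))  # one lost transmission, long idle delay
```

In this model no fixed interval is good for every value of `busy_until`, which is the difficulty the paragraph above attributes to the sender's lack of visibility into the recipient.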
Even if the message is accepted by the second node, this prior art method of communication is still inefficient. When a message arrives at the recipient, the interface circuit at the recipient interrupts the processor to inform the processor of the need to deal with an incoming message. If the message is long, it will typically be broken into blocks. The arrival of each block typically generates a separate interrupt. Processing these interrupts can cause a significant reduction in the processor throughput in those cases in which the processor is busy with other tasks. In principle, the interrupts associated with a long message broken into blocks can be avoided by using interface circuits having buffers sufficient to store the largest message. However, such systems still interrupt the processor once per message. In addition, the recipient typically spends a significant amount of processor time determining where incoming messages should be placed in the recipient's local memory and in moving the messages to their destinations in the recipient's memory.
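As a rough illustration of the interrupt load described above, the following sketch counts interrupts per message under the two arrangements mentioned: one interrupt per block, or one interrupt per message when the interface circuit can buffer the whole message. The function name and the byte sizes used are assumptions made for the example, not figures from any particular system.

```python
def interrupts_per_message(message_bytes, block_bytes, buffers_whole_message=False):
    """Count processor interrupts caused by one incoming message.

    message_bytes:         length of the message (hypothetical value).
    block_bytes:           block size the message is broken into (hypothetical).
    buffers_whole_message: True if the interface circuit can store the
                           largest message, so only one interrupt is raised.
    """
    if message_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    if buffers_whole_message:
        return 1
    # Ceiling division: a partial final block still raises an interrupt.
    return -(-message_bytes // block_bytes)


print(interrupts_per_message(10_000, 1_024))        # one interrupt per block
print(interrupts_per_message(10_000, 1_024, True))  # one interrupt per message
```

Even in the buffered case the count never drops below one per message, which is the residual cost noted in the paragraph above.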
This type of prior art system is also inefficient from the sender's point of view. First, each time a process wishes to send a message, it must do so by invoking an operating system call. When multiple processes are running on a single node, there is always the possibility of one process corrupting resources used by another process. Such corruption is possible if one process has unrestricted access to the interface circuit and/or interconnect fabric. In addition, protection must be provided against a process on a first node "attacking" a process on a second node or dominating the interconnect fabric. In the prior art, these protections are typically built into the operating system. On the recipient side of the communication, the operating system tests for permission to write into the recipient's memory. On the sender side, the operating system stops one process from masquerading as another process. Unfortunately, operating system calls slow down the communication process.
Another problem with prior art systems is the vulnerability of a new processor node when it first comes on-line. When a new processor is added to the multiprocessor system, it starts in an unprotected state. During this time, it can become erroneously loaded and effectively disabled. No solution to this problem is available in the prior art.
Yet another problem with prior art interconnect systems is the lack of a method for providing low-latency synchronization of processes using the same interconnect fabric that carries regular messages. It is often important to synchronize processes running on different nodes. For example, there are many applications in which only one process at a time can be given permission to change a data value used by all of the processes. In prior art systems, synchronization is either provided through a separate interconnect fabric or via operating system calls. The former solution significantly increases the cost of interconnect fabric hardware, and the latter solution is too slow for many applications.
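The synchronization requirement in the example above, that only one process at a time may hold permission to change a shared value, can be sketched as follows. This is a single-threaded model for illustration only: the class `PermissionToken`, its methods, and the process names are hypothetical, and the sketch does not show how such a grant would be made atomic across real nodes or carried over an interconnect fabric.

```python
class PermissionToken:
    """Grants write permission to at most one holder at a time (illustrative)."""

    def __init__(self):
        self.holder = None  # name of the process currently allowed to write

    def try_acquire(self, process):
        """Grant permission only if no other process currently holds it."""
        if self.holder is None:
            self.holder = process
            return True
        return False

    def release(self, process):
        """Give up permission; only the current holder may do so."""
        if self.holder != process:
            raise PermissionError(f"{process} does not hold the token")
        self.holder = None


token = PermissionToken()
shared = {"value": 0}  # data value used by all of the processes

if token.try_acquire("process_on_node_1"):
    # A second process asking now is refused until the first releases.
    refused = not token.try_acquire("process_on_node_2")
    shared["value"] = 42
    token.release("process_on_node_1")

print(refused, shared["value"], token.holder)
```

Each acquire and release in such a scheme implies an exchange between nodes, so the latency of that exchange bounds how quickly permission can pass from one process to another.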
Broadly, it is the object of the present invention to provide an improved multi-computer interconnect system.
It is a further object of the present invention to provide an interconnect system in which the number of messages lost because the recipient is busy is significantly reduced compared to prior art systems.
It is a still further object of the present invention to provide an interconnect system in which the recipient is not interrupted by incoming messages when it is already busy.
It is yet another object of the present invention to provide an interconnect system in which operating system calls are not needed while sending messages to protect each process from accidental or malicious attacks from another process running on the system.
It is a still further object of the present invention to provide an interconnect system which reduces the amount of time a recipient spends determining where to put an incoming message in the recipient's memory.
It is yet another object of the present invention to provide an interconnect system that provides an integrated, protected, low-latency synchronization operation that uses the same interconnect fabric as used for regular message traffic.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.