The present invention relates to communications networks and, more particularly, to communications networks in which a plurality of processors each controls a plurality of tasks. Each task in the network is enabled to call upon any other task in the network for data or computation support.
Each processor, with its input/output buffers and tasks, is called a node. Nodes are connected together over a network such as, for example, a bus or star network. A task requesting data or computation is called an invoker task. A task to which a request is directed is called a receiver task. When receiver and invoker tasks are associated with different nodes, the request is transmitted over the local area network. The network, whose arrangement is not of concern to the present invention, is controlled by software giving all nodes and tasks equal or fair access to the network.
The type of system with which the present invention is particularly concerned employs sporadic communications from and to each processor over the local area network. That is, a typical transaction requires a short invoking message from the invoker to invoke a task and a short reply message from the receiver node to convey the response from the receiver task back to the invoker task. Such short messages are conventionally transmitted in short bursts or packets containing the data and a header identifying the invoker task and the destination task. In the ideal situation, a complete transaction would require only the two messages, invoke and reply, to be transmitted over the network. In order to accomplish this, the invoker node would broadcast the invoking message packet for interception and execution by the receiver node. Similarly, the receiver node would broadcast the reply message packet when the requested task was completed. It is recognized that messages transmitted over a network are subject to hardware and software errors and to noise which partially or totally corrupt the data contained in the messages. If no mechanism is provided to account for lost, misdirected or corrupted messages, the invoker task has no way to determine that there is a problem. Thus, the invoker task could continue with activities that are inconsistent with accommodating hardware or software faults or with the results, or non-results, of noise-corrupted data. Accordingly, simple broadcast of invoking and reply messages cannot provide satisfactory reliability except for messages whose loss or corruption would not affect important data or computation.
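By way of illustration only, the packet layout and the ideal two-message transaction described above may be sketched as follows. The class name `Packet`, its field names, and the function `ideal_transaction` are hypothetical and are chosen for this sketch; the text states only that a packet carries data and a header identifying the invoker task and the destination task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet layout: a two-field header plus the data."""
    invoker_task: str      # header: task that originated the transaction
    destination_task: str  # header: task to which this packet is addressed
    data: bytes            # payload (request arguments or result)


def ideal_transaction(invoker: str, receiver: str,
                      request: bytes, result: bytes) -> list[Packet]:
    """Return the two packets of an ideal, error-free transaction."""
    invoke = Packet(invoker_task=invoker, destination_task=receiver,
                    data=request)
    # Assumption: the reply is simply addressed back to the invoker task.
    # The text does not say how the reply header is populated.
    reply = Packet(invoker_task=invoker, destination_task=invoker,
                   data=result)
    return [invoke, reply]


packets = ideal_transaction("task_A", "task_B", b"compute", b"42")
```

Nothing in this sketch detects a lost or corrupted packet, which is the reliability gap identified above.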
One conventional technique, called a rendezvous session-level protocol, ensures proper reception of data packets on a local area network. The rendezvous protocol includes transmitting an acknowledge data packet (ACK) from the receiver node directed to the invoker node confirming the correct receipt of the invoking data packet and confirming the existence of a task whose task identifier corresponds with that contained in the invoking data packet. This acknowledge data packet is transmitted only after the receiving node places the data in the invoking data packet in the input queue of the receiver task and determines that the receiver task exists and is alive and well.
If the task requires a substantial time for completion by the receiver task, it is conventional for the invoking node to transmit a query data packet to the receiver node after a predetermined time delay to verify that the receiver task continues in good health and continues to work on the invoked task, or that the invoked task is properly in the receiver node's input queue or is being worked on but is not yet completed. In response to the query data packet, the receiver node, after checking the condition of the receiver task and the invoked task, transmits a query acknowledge data packet to the invoker node. In some instances, the time for the receiving task to complete the assigned work may result in the invoking node transmitting one or more query data packets before the invoked task is completed. Each query data packet is followed by an acknowledge data packet.
When the receiver task completes the invoked task, it transmits a reply data packet through the receiver node to the invoker node. The reply data packet contains the requested data or computation. In response to correct reception of the reply data packet, the invoker node transmits a reply acknowledge data packet directed to the receiver node to confirm correct receipt of the reply data packet.
It will be noted that each data packet between the invoker and receiver nodes is followed by an acknowledge data packet in the reverse direction. Thus, the transmission of acknowledge data packets doubles the number of messages on the local area network. This increased message overhead has the undesirable effect of increasing the amount of time consumed by the processors of the network in assembling and disassembling the extra data packets, as well as increasing the consumption of network bandwidth beyond that which would be required in a system capable of omitting the acknowledge data packets.
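The message sequence of the conventional rendezvous protocol described above, and the resulting doubling of network traffic, may be illustrated by the following minimal sketch. The function names and the packet-type labels (`INVOKE`, `ACK`, `QUERY`, `QUERY_ACK`, `REPLY`, `REPLY_ACK`) are hypothetical labels adopted for this illustration, and the sketch assumes a transaction in which no packet is lost.

```python
def rendezvous_trace(num_queries: int) -> list[tuple[str, str]]:
    """List (direction, packet type) for one successful transaction."""
    trace = [
        ("invoker->receiver", "INVOKE"),
        ("receiver->invoker", "ACK"),
    ]
    # Each query sent while the invoked task is still running is
    # answered by a query acknowledge in the reverse direction.
    for _ in range(num_queries):
        trace.append(("invoker->receiver", "QUERY"))
        trace.append(("receiver->invoker", "QUERY_ACK"))
    trace.append(("receiver->invoker", "REPLY"))
    trace.append(("invoker->receiver", "REPLY_ACK"))
    return trace


def count_acks(trace: list[tuple[str, str]]) -> int:
    """Count the acknowledge packets in a trace."""
    return sum(1 for _, kind in trace if kind.endswith("ACK"))


short_job = rendezvous_trace(num_queries=0)  # invoke and reply only
long_job = rendezvous_trace(num_queries=2)   # two queries before the reply
```

Under these assumptions, a transaction without queries places four packets on the network instead of the ideal two, and a transaction with two queries places eight, half of which are acknowledge packets in each case.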
In addition to these problems, the conventional rendezvous protocol has a built-in positive feedback effect wherein an overloaded condition at a receiver node results in a catatonic condition without means either to work off the overload or to apply back pressure on invoking nodes to reduce further requests for service. An overload condition frequently shows up as a quantity of data in an input buffer of the receiver node exceeding the buffer capacity. In this situation, some systems discard all invoking data packets from the input buffer of a catatonic node in order to permit resumption of activity. When this is done, the invoking nodes frequently have no way of knowing that their invoking requests have been discarded until the elapse of a relatively long delay time without receipt of a reply message leads them to infer that the invoking data packets were discarded. The relatively long delay retards system throughput.
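The overload behavior described above may be illustrated by the following simplified sketch. The class `ReceiverNode`, its attributes, and the rule that the node becomes catatonic when a packet arrives at a full buffer are assumptions made for this illustration and are not taken from any particular conventional system.

```python
from collections import deque


class ReceiverNode:
    """Hypothetical receiver node with a bounded input buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.input_buffer: deque[str] = deque()
        self.catatonic = False

    def receive(self, packet: str) -> None:
        """Accept an invoking packet; nothing is signalled to the invoker."""
        if self.catatonic:
            return  # packet is silently lost
        if len(self.input_buffer) >= self.capacity:
            self.catatonic = True  # overload: the node stops responding
            return
        self.input_buffer.append(packet)

    def recover(self) -> int:
        """Discard all queued invoking packets so activity can resume."""
        discarded = len(self.input_buffer)
        self.input_buffer.clear()
        self.catatonic = False
        # No notice is sent to the invoking nodes; each learns of the
        # discard only when its own reply timeout eventually expires.
        return discarded


node = ReceiverNode(capacity=2)
for request in ["req-1", "req-2", "req-3"]:
    node.receive(request)
was_catatonic = node.catatonic
discarded_count = node.recover()
```

In this sketch the count returned by `recover` is visible only at the receiver node, which reflects the absence of back pressure on the invoking nodes noted above.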