This Application relates to message processing systems. More particularly, this Application relates to methods and systems, including protocols and buffering, for facilitating the transmission of messages from a source node to a destination node in a message processing system.
Message processing systems, for example, the multiprocessor data processing system 10 depicted in FIG. 1, require reliable message communication paths between respective ones of the processors 12₁ . . . 12ⱼ. The exemplary system 10 of FIG. 1 employs an exemplary communication medium or switch network 20 commonly coupled to the processors 12. The processors may require respective communication adapters 14₁ . . . 14ⱼ to control communications between each processor 12 and the medium 20 via respective connections 16₁ . . . 16ⱼ. Communication between, for example, software applications executing on the processors 12 of system 10 can thus be provided via medium 20. Storage medium 22 may be employed in the system to hold the applications, associated data, etc.
Because respective processors may be supporting different, asynchronous application software partitions, asynchronous messaging becomes a useful form of communication between the processors. For example, messages may require transmission from a "source" node (e.g., processor 12₁) to a "destination" node (e.g., processor 12ⱼ).
Because individual messages sent from a source node to a destination node may experience random delays in medium 20, the destination node may receive messages in an order different from the order in which they were transmitted by the source node. To accommodate this, the destination node may provide buffers to hold incoming, unordered messages. The messages can then be retrieved from the buffers and processed in their proper order. This is illustrated in FIG. 2, which is a hybrid hardware/software diagram of a message processing system like that of FIG. 1 and which depicts a message source node 18₁ and a message destination node 18ⱼ. (The term "node" is used broadly herein to connote any identifiable combination of hardware and/or software to or from which messages are passed.) Source node 18₁ has allocated therein send message buffers 30, within which are placed messages M(1), M(2) and M(3), which, for application reasons, must be sent through send message processing 32, across medium 20, to destination node 18ⱼ.
As discussed above, random delays in medium 20 may cause messages M(1), M(2) and M(3) to arrive at destination node 18ⱼ out of order. To accommodate out-of-order receipt of messages, destination node 18ⱼ, in anticipation of the arrival of messages from various sources in the system, can allocate or post receive buffers 40. In the example of FIG. 2, buffer B1 holds the first arriving message M(2), buffer B2 holds the second arriving message M(1), and buffer B3 holds the third arriving message M(3); that is, message M(2) has arrived before message M(1). To restore the proper order, receive message processing 42 can simply remove message M(1) from buffer B2 first and then pass the messages in their proper order to receive processing 44 (e.g., the application software executing at the destination node).
Those skilled in the art will understand that message ordering in a system can be imposed by using a particular protocol, e.g., messages sent from a particular source to a particular destination may be sequentially numbered and the sequential numbers can be transmitted with the messages so that the destination node can properly reorder the messages.
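As a minimal sketch of such a sequence-numbering protocol, assuming that each source numbers the messages it sends to a given destination consecutively starting from 1, the destination-side reordering of FIG. 2 could look like the following. The class `ReorderingReceiver` and its method names are hypothetical, and its dictionary merely stands in for posted receive buffers 40.

```python
from dataclasses import dataclass, field


@dataclass
class ReorderingReceiver:
    """Hypothetical destination-side reorder stage for messages from one source.

    Messages carry consecutive sequence numbers starting at 1. An early
    arrival is parked until every lower-numbered message has arrived.
    """
    next_seq: int = 1                             # next sequence number owed to the application
    parked: dict = field(default_factory=dict)    # seq -> payload held out of order
    delivered: list = field(default_factory=list) # payloads passed on, in order

    def on_arrival(self, seq: int, payload: str) -> None:
        self.parked[seq] = payload
        # Drain every message that is now in order.
        while self.next_seq in self.parked:
            self.delivered.append(self.parked.pop(self.next_seq))
            self.next_seq += 1


rx = ReorderingReceiver()
for seq, payload in [(2, "M(2)"), (1, "M(1)"), (3, "M(3)")]:  # arrival order of FIG. 2
    rx.on_arrival(seq, payload)
print(rx.delivered)  # ['M(1)', 'M(2)', 'M(3)']
```

M(2) is parked on arrival and released only after M(1) arrives, mirroring how receive message processing 42 removes M(1) from buffer B2 first.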
The process of allocating or posting receive buffers 40 in destination node 18ⱼ is often a dynamic one, and if more messages are arriving than there are buffers posted, buffer overrun can occur. To avoid buffer overrun at the destination node, it is common to 1) adopt a convention wherein the destination node automatically discards packets assuming that the source node will retransmit them after a timeout, or 2) adopt a rendezvous protocol when the message lengths are larger than some threshold. A rendezvous protocol, as discussed further below, involves the transmission from the source node of a control information packet relating to a message to be sent from the source node to the destination node. The control information often includes an indication of the length of the entire data portion of the message to be sent. When a buffer of adequate length is allocated or posted at the destination node, an acknowledgement packet transmission (e.g., "READY TO RECEIVE") is sent from the destination node to the source node, and the source node can thereafter reliably send the entire message to the destination node. In conventional rendezvous protocols, this initial exchange of the control information and acknowledgement packets results in a loss of performance for messages longer than the threshold because two packets are now required to be exchanged between the source and destination nodes before any actual message data can be exchanged.
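The cost of the conventional threshold-based scheme can be made concrete with a small packet-counting model. The threshold value, the packet labels, and the function names below are assumptions for illustration only; the model ignores timing, buffer posting, and retransmission.

```python
THRESHOLD = 4096  # hypothetical rendezvous threshold, in bytes


def conventional_send(length: int, threshold: int = THRESHOLD) -> list:
    """Packets crossing the medium for one message, in order (illustrative model)."""
    if length <= threshold:
        # Short message: sent at once; the destination may discard it if no
        # buffer is posted, relying on retransmission after a timeout.
        return [("src->dst", "DATA")]
    # Long message: conventional rendezvous handshake before any data moves.
    return [
        ("src->dst", "CONTROL"),           # carries the total data length
        ("dst->src", "READY TO RECEIVE"),  # sent once an adequate buffer is posted
        ("src->dst", "DATA"),
    ]


def packets_before_data(trace: list) -> int:
    """Number of packets that cross the medium before the first data packet."""
    return next(i for i, (_, kind) in enumerate(trace) if kind == "DATA")


print(packets_before_data(conventional_send(100)))    # 0
print(packets_before_data(conventional_send(65536)))  # 2
```

The second result is the two dataless packets that every above-threshold message pays for in a standard rendezvous protocol.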
What is required, therefore, is a method, system, and associated program code and data structures, which prevent the performance degradation associated with packet retransmission after timeouts, or with standard rendezvous protocols in which an exchange of packets between source and destination nodes occurs before any actual message data is exchanged.
The shortcomings of the prior approaches are overcome by the present invention, which relates to methods, systems, protocols and buffering for facilitating the efficient transmission of messages from a source node to a destination node in a message processing system. An optimistic, eager rendezvous transmission mode is disclosed wherein first data portions of messages are transmitted from a source node to a destination node along with the initial control information packets. By employing early arrival buffering at the destination node, the source node can reliably send the first data portions of the messages to the destination node along with the control information, knowing that the first data portions will be reliably stored in either early arrival buffering or posted receive buffering.
In one particular aspect, the present invention is a method for transmitting at least one message (referred to below as the first message) from a source node to a destination node, the message including a first data portion and a second data portion. The method includes providing, at the destination node, first buffering, namely early arrival buffering, to reliably store the first data portion of the message. The first data portion of the message is transmitted, along with control information relating to the first message, from the source node to the destination node. The destination node stores the first data portion of the message in the provided early arrival buffering, and the source node thereafter waits for an acknowledgement pertaining to the first message from the destination node before transmitting any remaining data portions of the first message.
The destination node determines whether it can receive the remaining data portions of the first message, e.g., whether adequate receive buffering is posted, in response to receiving the control information relating to the first message. In response to an eventual determination that the destination node can receive the remaining data portions of the first message, the destination node transmits the acknowledgement pertaining to the first message to the source node, and the source node transmits the second data portion of the first message in response to receiving the acknowledgement.
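The eager rendezvous exchange described in the two preceding paragraphs can be sketched as follows, with direct method calls standing in for medium 20. This is a minimal sketch under stated assumptions: messages are matched to posted receive buffers by a hypothetical `msg_id`, the first data portion has a fixed, assumed size, the control information is reduced to that identifier, and every class and method name is illustrative rather than taken from any particular implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical destination node supporting eager rendezvous."""
    early_arrival: dict = field(default_factory=dict)  # msg_id -> parked first data portion
    posted: dict = field(default_factory=dict)         # msg_id -> posted receive buffer
    outbox: list = field(default_factory=list)         # acknowledgements queued for the source

    def on_eager_packet(self, msg_id: int, first_portion: bytes) -> None:
        """Control information and first data portion arrive in one packet."""
        if msg_id in self.posted:
            # Receive buffering already posted: store directly and acknowledge.
            self.posted[msg_id] += first_portion
            self.outbox.append(("ACK", msg_id))
        else:
            # Otherwise park the data reliably in early arrival buffering.
            self.early_arrival[msg_id] = first_portion

    def post_receive(self, msg_id: int) -> None:
        """The application posts a receive buffer for msg_id."""
        buf = bytearray()
        self.posted[msg_id] = buf
        if msg_id in self.early_arrival:
            # Move the parked first portion, freeing the early arrival slot.
            buf += self.early_arrival.pop(msg_id)
            self.outbox.append(("ACK", msg_id))

    def on_remainder(self, msg_id: int, rest: bytes) -> None:
        """The second data portion arrives after the acknowledgement."""
        self.posted[msg_id] += rest


@dataclass
class Source:
    """Hypothetical source node; first_portion_size is an assumed fixed size."""
    first_portion_size: int
    awaiting_ack: dict = field(default_factory=dict)  # msg_id -> unsent second portion

    def send(self, msg_id: int, data: bytes, dest: Destination) -> None:
        first = data[: self.first_portion_size]
        rest = data[self.first_portion_size :]
        self.awaiting_ack[msg_id] = rest
        dest.on_eager_packet(msg_id, first)  # control info + first data portion

    def on_ack(self, msg_id: int, dest: Destination) -> None:
        # Only after the acknowledgement is the remaining data transmitted.
        dest.on_remainder(msg_id, self.awaiting_ack.pop(msg_id))


def deliver_acks(src: Source, dst: Destination) -> None:
    """Stand-in for medium 20 carrying acknowledgements back to the source."""
    while dst.outbox:
        _, msg_id = dst.outbox.pop(0)
        src.on_ack(msg_id, dst)


src, dst = Source(first_portion_size=4), Destination()
src.send(7, b"HELLO, WORLD", dst)  # no receive posted yet
assert 7 in dst.early_arrival      # first portion parked in early arrival buffering
dst.post_receive(7)                # receive posted later; ACK queued
deliver_acks(src, dst)             # source then sends the second data portion
print(bytes(dst.posted[7]))        # b'HELLO, WORLD'
```

Whether the first data portion lands in early arrival buffering or directly in a posted receive buffer, the source in this sketch still waits for the acknowledgement before sending the rest, as the method requires.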
In another aspect, the present invention provides a combined rendezvous mode message transmission method for a message processing system, including alternating between rendezvous transmission modes as a function of the amount of free space in the early arrival buffering. In this aspect of the invention, a method for transmitting a plurality of messages from a source node to a destination node is provided, including providing, at the destination node, early arrival buffering of adequate size to hold respective first data portions of a given number "Q" of the plurality of messages. The method further includes alternating between using a first rendezvous transmission mode and a second rendezvous transmission mode as a function of the amount of free space in the early arrival buffering, wherein:
the first rendezvous transmission mode comprises transmitting first data portions and control information for respective first mode messages of the plurality of messages, and awaiting respective acknowledgements before sending any remaining data portions of the first mode messages, and
the second rendezvous transmission mode comprises transmitting control information for respective second mode messages of the plurality of messages, and awaiting respective acknowledgements before sending any respective data portions of the second mode messages.
The first rendezvous transmission mode is used when there is a sufficient amount of free space in the early arrival buffering to hold first data portions of the messages to be transmitted. The second rendezvous transmission mode is used when there is an insufficient amount of free space in the early arrival buffering to hold the first data portions of the messages to be transmitted.
The amount of free space in the early arrival buffering is determined at the source node based on the given number "Q" of the plurality of messages for which adequate space is provided at the destination node, and on the number of first mode messages for which first data portions and control information have been transmitted but for which acknowledgements have not yet been received from the destination node.
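The source-side accounting just described amounts to a simple credit counter. The following is a minimal sketch, assuming that the early arrival buffering is divided into Q equal slots, each large enough for one first data portion; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EagerCreditTracker:
    """Source-side view of one destination's early arrival buffering (hypothetical).

    q: number of first data portions the early arrival buffering can hold ("Q").
    outstanding: first mode messages sent with first data portions and control
                 information whose acknowledgements have not yet been received.
    """
    q: int
    outstanding: int = 0

    def free_slots(self) -> int:
        return self.q - self.outstanding

    def choose_mode(self) -> str:
        """Pick the rendezvous mode for the next message and account for it."""
        if self.free_slots() > 0:
            self.outstanding += 1
            return "first"   # send control info + first data portion, await ack
        return "second"      # send control info only, await ack before any data

    def on_ack(self, mode: str) -> None:
        """Only an acknowledgement for a first mode message frees a slot."""
        if mode == "first":
            self.outstanding -= 1


tracker = EagerCreditTracker(q=2)
modes = [tracker.choose_mode() for _ in range(3)]
print(modes)                  # ['first', 'first', 'second']
tracker.on_ack("first")       # destination acknowledged one first mode message
print(tracker.choose_mode())  # first
```

The count is conservative: a slot is treated as occupied until the acknowledgement arrives, even if the destination actually placed the first data portion straight into a posted receive buffer, because the source cannot observe that.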
In an enhanced embodiment, the system may include a plurality of message destination nodes, and the source node independently alternates between the first rendezvous transmission mode and the second rendezvous transmission mode for each destination node to which it transmits messages, as a function of the amount of early arrival buffering currently available at that destination node.
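Because the mode is chosen independently for each destination node, the source-side bookkeeping reduces to one outstanding-message count per destination. The sketch below assumes that the source knows each destination's Q in advance, supplied here as a dictionary keyed by a hypothetical destination identifier; the class and method names are also hypothetical.

```python
from collections import defaultdict


class MultiDestinationSender:
    """Hypothetical source-side bookkeeping with one independent count per destination."""

    def __init__(self, q_by_destination: dict):
        # Q for each destination: how many first data portions its early
        # arrival buffering can hold (assumed known to the source).
        self.q_by_destination = q_by_destination
        # Destination -> first mode messages sent but not yet acknowledged.
        self.outstanding = defaultdict(int)

    def choose_mode(self, dest: str) -> str:
        if self.outstanding[dest] < self.q_by_destination[dest]:
            self.outstanding[dest] += 1
            return "first"   # control info + first data portion
        return "second"      # control info only

    def on_first_mode_ack(self, dest: str) -> None:
        self.outstanding[dest] -= 1


sender = MultiDestinationSender({"A": 1, "B": 2})
print(sender.choose_mode("A"), sender.choose_mode("A"))  # first second
print(sender.choose_mode("B"))                            # first
```

Exhausting the early arrival credits for destination A has no effect on messages to B, which is the independence this embodiment calls for.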
The invention also includes a memory including the early arrival buffer structure to reliably hold the first data portions of the messages, and a system having the processing resources at the source node and the destination node, including receive message processing resources and the pre-allocated early arrival buffering at the destination node, to implement the transmission modes discussed above.
The optimistic eager rendezvous transmission mode of the present invention is recommended for systems that can operate under the general assumption that receive buffering is usually posted at the message destination node. When employed, it avoids both the delays associated with message packet retransmission after timeouts and the initial exchange of dataless packets characteristic of standard rendezvous modes. The enhanced, combined transmission mode provides the above-described advantages of the eager rendezvous mode while also allowing a reasonable limit to be imposed on the amount of early arrival buffering in the system.