The present invention is generally directed to methods and systems for communication in a data processing network in which data transmission demands between the nodes in the network can cause a reduction in capacity as a result of the retransmission of lost messages. More particularly, the present invention is directed to a system and method for adapting message transmission rates to more closely match the current network capacity. Even more particularly, the present invention employs a message queue together with a message driver which periodically reevaluates the capacity of the network based on a comparison of the number of messages sent versus the number of acknowledgments received.
Some communication methods, such as UDP (User Datagram Protocol), are generally considered to be “unreliable”. Unlike TCP (Transmission Control Protocol), which is a “reliable” protocol, UDP offers no guarantee that a message ever reaches its final destination: the message can be dropped by the source node or by intermediate nodes, or it can be lost anywhere along the communication path. The message can even be silently discarded at the destination node without any notification that one of the message packets is missing. (It is noted that the terms “unreliable protocol” and “reliable protocol” are relative terms employed herein to more particularly distinguish two different categories of transmission protocols; the use of these terms is not meant to suggest that one should not use so-called “unreliable protocols”. To the contrary, improvements provided herein make such “unreliable” protocols much more practical by eliminating many of their disadvantages, while still preserving the advantages associated with their lack of complexity and overhead.)
Because of the “unreliable” message delivery qualities associated with simpler protocols, application programs often must themselves implement many features of a transmission protocol (acknowledgment from the other end, time-out, retransmission, and so on) so that the application program can determine for itself whether the intended messages are ever delivered. However, simple retransmission often causes more communication traffic, which in turn drives the message drop rate even higher. The network is especially vulnerable at such times, since messages are most likely to be dropped, and therefore retransmitted, precisely when the communication channel is already saturated (that is, when it is near, at or beyond its capacity).
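By way of illustration only, the kind of application-level acknowledgment, time-out and retransmission logic described above can be sketched as follows. This is a minimal sketch and not a definitive implementation: the class name `NaiveRetransmitter`, the `send(msg_id, payload)` callable, the fixed `timeout` value and the explicit `now` argument are all assumptions introduced here for illustration and do not appear in the description.

```python
class NaiveRetransmitter:
    """Resend every unacknowledged message after a fixed time-out (illustrative only)."""

    def __init__(self, send, timeout=1.0):
        self.send = send          # hypothetical callable: send(msg_id, payload)
        self.timeout = timeout    # fixed time-out in seconds (assumed value)
        self.pending = {}         # msg_id -> (payload, time of last transmission)
        self.transmissions = 0    # total sends, including retransmissions

    def submit(self, msg_id, payload, now):
        """Send a new message and start tracking it."""
        self._transmit(msg_id, payload, now)

    def on_ack(self, msg_id):
        """Stop tracking a message once its acknowledgment arrives."""
        self.pending.pop(msg_id, None)

    def poll(self, now):
        """Retransmit every message whose time-out has expired, regardless of load."""
        for msg_id, (payload, last_sent) in list(self.pending.items()):
            if now - last_sent >= self.timeout:
                self._transmit(msg_id, payload, now)

    def _transmit(self, msg_id, payload, now):
        self.send(msg_id, payload)
        self.pending[msg_id] = (payload, now)
        self.transmissions += 1
```

Note that `poll` resends every expired message at once without regard to the state of the channel, which is exactly the behavior that adds traffic when the network is already saturated.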
This problem is greatly amplified when one considers an environment in which there are a large number of distributed data processing nodes. When a distributed application running on one node sends large messages to peer applications running on many different nodes using the UDP protocol, it is very likely that many messages are dropped, which means that they have to be retransmitted. Typically, this retransmission occurs only a short time thereafter, when the network is still saturated with messages. As a result, an application program running on one of the nodes can in some cases spend most of its time retransmitting messages rather than performing the tasks for which it was designed. As an example, on a heavily loaded large system with more than 500 nodes, if one node sends a large number of messages to the other nodes, it is quite possible that many of the messages will have to be retransmitted several times. Therefore, it is very important to control message flow. One way of accomplishing this, as presented herein, is by regulating the number and size of messages sent and by retransmitting the messages more intelligently.
In sum, there are several problems solved through the use of the present invention. For example, the present invention permits the transmission of bulk messages to many peers without significantly impacting the message drop rate and without causing significant numbers of message retransmissions. This is a particular problem since unintelligent message retransmission methods cause more communication traffic, increase the message drop rate, and slow application performance.
The present invention solves the above problems by providing a method for measuring the condition of the network on a real-time basis to determine how many messages can be delivered in a given period. This method preferably includes counting the number of acknowledgment (ACK) messages returned, especially in comparison to the number of messages sent. The use of this count provides a basis for automatically regulating the communication retransmission rate according to the condition of the communication channel (that is, the number of ACKs received) without requiring any foreknowledge about the communication channels or any knowledge concerning the behavior of any other running application.
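By way of illustration only, the comparison of messages sent against acknowledgments received can be sketched as a message queue served by a periodic driver. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name `AdaptiveSender`, the `send(msg_id, payload)` callable, the ratio thresholds (0.9 and 0.5), the increase-by-one and halving adjustments, and the budget limits are all hypothetical choices made here for the example.

```python
from collections import deque


class AdaptiveSender:
    """Queue messages and adapt the per-period send budget to the ACK count."""

    def __init__(self, send, initial_budget=8, min_budget=1, max_budget=64):
        self.send = send              # hypothetical callable: send(msg_id, payload)
        self.queue = deque()          # messages waiting to be sent
        self.unacked = {}             # msg_id -> payload, sent but not yet acknowledged
        self.budget = initial_budget  # messages allowed per period (assumed starting value)
        self.min_budget = min_budget
        self.max_budget = max_budget
        self.sent_this_period = 0
        self.acked_this_period = 0

    def enqueue(self, msg_id, payload):
        self.queue.append((msg_id, payload))

    def on_ack(self, msg_id):
        # Count only ACKs for messages sent in the current period.
        if self.unacked.pop(msg_id, None) is not None:
            self.acked_this_period += 1

    def tick(self):
        """Run once per period: re-estimate capacity, then send up to the budget."""
        if self.sent_this_period:
            ratio = self.acked_this_period / self.sent_this_period
            if ratio >= 0.9:          # assumed threshold: channel has headroom
                self.budget = min(self.max_budget, self.budget + 1)
            elif ratio < 0.5:         # assumed threshold: channel is saturated
                self.budget = max(self.min_budget, self.budget // 2)
        # Unacknowledged messages return to the front of the queue, oldest first.
        for msg_id, payload in reversed(list(self.unacked.items())):
            self.queue.appendleft((msg_id, payload))
        self.unacked.clear()
        self.sent_this_period = 0
        self.acked_this_period = 0
        while self.queue and self.sent_this_period < self.budget:
            msg_id, payload = self.queue.popleft()
            self.send(msg_id, payload)
            self.unacked[msg_id] = payload
            self.sent_this_period += 1
```

In this sketch retransmissions share the same budget as new messages, so a saturated channel slows retransmission as well as first transmission. As a simplification, an acknowledgment that arrives after its period has ended is ignored and the message is sent again.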
Accordingly, applications have several important advantages when the present invention is employed in a data processing network. For example, applications can now send messages over an unreliable communication channel with less overhead and with a reduction in the rate at which messages are dropped. The number of message retransmissions is thus also reduced, and the overall communication performance is enhanced. Message transmission is automatically and substantially continuously adapted to current network conditions. This also means that application programming can be made simpler, with message transmission now being handled more capably by external programming using simpler protocols, which relieves the application programs of the chores of acknowledgment monitoring, retry timing and message retransmission.