This application relates to message processing systems, and more particularly, to techniques for identifying and minimizing congestion in a tightly coupled message processing system interconnected by a blocking network.
Network traffic in a message processing system is frequently distributed unevenly, so that a significant amount of traffic passes through a small subset of the network. This imbalance in load is referred to as "network congestion." Traditionally, network components (such as routers) react to congestion by dropping packets. The communication protocol stacks react to dropped packets by reducing the traffic rate so that the load through the congested portion of the network drops to an acceptable level. Such a network is referred to as a packet-dropping network.
In high-speed networks (for example, certain switching networks), routers do not drop packets in the presence of congestion. Instead, back-pressure is applied to upstream routers of the network; in other words, routers block traffic instead of dropping packets. Blocking rather than dropping is chosen as the design point so that the communication protocol stacks can operate faster, since the stacks can assume that every injected packet reaches its destination with very high probability. These tightly coupled networks are referred to as blocking networks.
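To make the blocking-versus-dropping distinction concrete, here is a toy Python model of a single router buffer. The `Router` class, its `offer` method, and the `inject` helper are hypothetical names chosen for illustration; a real switch applies back-pressure in hardware, not through a method call. The sketch only shows where a packet ends up when the buffer is full.

```python
from collections import deque


class Router:
    """Toy router with a bounded output buffer (hypothetical model, not a real router API)."""

    def __init__(self, capacity: int, blocking: bool) -> None:
        self.buffer: deque = deque()
        self.capacity = capacity
        self.blocking = blocking
        self.dropped = 0

    def offer(self, packet) -> bool:
        """Offer a packet from upstream; return False only when applying back-pressure."""
        if len(self.buffer) < self.capacity:
            self.buffer.append(packet)
            return True
        if self.blocking:
            # Blocking network: refuse the packet; upstream must hold it and retry later.
            return False
        # Packet-dropping network: discard silently; the sender sees only a missing ack.
        self.dropped += 1
        return True


def inject(router: Router, packets) -> list:
    """Send packets into an undrained router; return those the upstream node must still hold."""
    held = []
    for p in packets:
        if not router.offer(p):
            held.append(p)
    return held


dropping = Router(capacity=2, blocking=False)
blocking = Router(capacity=2, blocking=True)
print(inject(dropping, range(5)), dropping.dropped)  # [] 3
print(inject(blocking, range(5)), blocking.dropped)  # [2, 3, 4] 0
```

In the dropping model the sender learns of congestion through lost packets; in the blocking model nothing is lost, and the only visible effect is that packets wait longer upstream, which is why transit time is the signal left to observe.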
Blocking networks pose a problem for congestion avoidance because the communication protocol never receives the usual indication of traffic congestion (i.e., lost packets). Instead, congestion manifests as increased transit times. Traditional network protocols like TCP/IP react to increased transit times by increasing the amount of network traffic, because they interpret an increased transit time as a longer network pipeline.
In blocking networks, the traditional approach to congestion has been either to simply live with the problem or to partition the network into non-overlapping sections so that congestion in one partition does not affect other partitions. Another scheme uses virtual channels to create the impression of multiple independent networks and to contain the congestion within each virtual channel plane. The partitioning method is difficult to apply when different applications need to communicate among the same set of nodes, or when the same application needs to communicate with different node sets. The virtual channel approach requires additional hardware resources in the routers and is therefore limited in scope.
It is desirable to avoid congestion in blocking networks because network congestion causes a significant drop in performance for all users of the network. Specifically, when congestion occurs, all network traffic that must transit the congested region can be slowed down drastically. If different traffic types share the same blocking network, congestion caused by one traffic type can affect the other traffic types, which is clearly undesirable.
Thus presented herein, in one aspect, is a method for monitoring congestion in a network, such as a blocking network. The method includes: sending a packet into the network, and associated therewith recording a first time stamp; receiving an acknowledgment back across the network responsive to the sending of the packet, and recording a second time stamp with receipt of the acknowledgment; determining a round-trip time of the packet and acknowledgment across the network using the first time stamp and the second time stamp, wherein the round-trip time is indicative of an amount of congestion in the network; and estimating the amount of congestion in the network using the determined round-trip time, wherein the estimating comprises comparing the determined round-trip time with a predetermined round-trip time representative of no network congestion or a known degree of network congestion.
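A minimal Python sketch of the monitoring method follows. It assumes that each packet already carries an identifier the sender can match against its acknowledgment, that times are kept as integer nanoseconds from `time.monotonic_ns`, and that the estimate is expressed as the ratio of the measured round-trip time to the predetermined baseline. The class name `RttMonitor`, the ratio metric, and the fake clock in the usage lines are assumptions made for illustration, not details from the source.

```python
import time
from typing import Callable, Dict


class RttMonitor:
    """Sender-side round-trip-time monitor (a sketch; all names are hypothetical)."""

    def __init__(self, baseline_rtt_ns: int,
                 clock: Callable[[], int] = time.monotonic_ns) -> None:
        # Predetermined RTT representing no congestion (or a known degree of it).
        self.baseline_rtt_ns = baseline_rtt_ns
        self.clock = clock
        self._sent_at: Dict[int, int] = {}  # packet id -> first time stamp

    def on_send(self, packet_id: int) -> None:
        self._sent_at[packet_id] = self.clock()  # first time stamp

    def on_ack(self, packet_id: int) -> int:
        """Record the second time stamp and return the round-trip time in ns."""
        second = self.clock()
        first = self._sent_at.pop(packet_id)
        return second - first

    def congestion_estimate(self, rtt_ns: int) -> float:
        """Compare the RTT with the baseline: 1.0 means uncongested, larger means more congested."""
        return rtt_ns / self.baseline_rtt_ns


ticks = iter([1_000, 41_000])  # fake clock readings, in ns
mon = RttMonitor(baseline_rtt_ns=10_000, clock=lambda: next(ticks))
mon.on_send(7)
rtt = mon.on_ack(7)
print(rtt, mon.congestion_estimate(rtt))  # 40000 4.0
```

Injecting the clock keeps the example deterministic; with the default clock the same code measures real intervals. A ratio is only one way to compare against the baseline; a lookup against several predetermined round-trip times, each tied to a known degree of congestion, would fit the text equally well.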
In another aspect, a method for ameliorating congestion in a network is provided. This method includes: identifying when the network is congested between a sender node and a destination node; subsequent to the identifying, ascertaining a round-trip time of a packet and a corresponding return acknowledgment sent between the sender node and the destination node across the network; and varying a number of flow control tokens at the sender node for the destination node using the ascertained round-trip time.
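The amelioration method can be sketched the same way. Here, running out of flow control tokens for a destination is taken as the signal that the path is congested, and the hypothetical `TokenPolicy.tokens_for` rule then scales the token allotment inversely with the measured round-trip time, clamped between `min_tokens` and `max_tokens`. The inverse-scaling rule, the parameter values, and all class and method names are assumptions; the source says only that the number of tokens is varied using the ascertained round-trip time.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenPolicy:
    """Hypothetical policy: scale the token allotment inversely with the measured RTT."""
    max_tokens: int
    min_tokens: int
    baseline_rtt_ns: int

    def tokens_for(self, rtt_ns: int) -> int:
        if rtt_ns <= self.baseline_rtt_ns:
            return self.max_tokens
        return max(self.min_tokens, self.max_tokens * self.baseline_rtt_ns // rtt_ns)


class Sender:
    """Per-destination flow control tokens at a sender node (sketch)."""

    def __init__(self, policy: TokenPolicy, destinations: List[str]) -> None:
        self.policy = policy
        self.limit: Dict[str, int] = {d: policy.max_tokens for d in destinations}
        self.outstanding: Dict[str, int] = {d: 0 for d in destinations}
        self.congested: Dict[str, bool] = {d: False for d in destinations}

    def try_send(self, dest: str) -> bool:
        if self.outstanding[dest] >= self.limit[dest]:
            # Out of tokens for this destination: treat it as the onset of congestion.
            self.congested[dest] = True
            return False
        self.outstanding[dest] += 1  # consume a token
        return True

    def on_ack(self, dest: str, rtt_ns: int) -> None:
        self.outstanding[dest] -= 1  # the acknowledgment returns the token
        if self.congested[dest]:
            # Only after congestion is identified does the RTT drive the token count.
            self.limit[dest] = self.policy.tokens_for(rtt_ns)
            if rtt_ns <= self.policy.baseline_rtt_ns:
                self.congested[dest] = False


policy = TokenPolicy(max_tokens=8, min_tokens=1, baseline_rtt_ns=10_000)
s = Sender(policy, ["B"])
sent = sum(s.try_send("B") for _ in range(9))  # 8 succeed, the 9th finds no token
s.on_ack("B", rtt_ns=40_000)                    # RTT is 4x the baseline
print(sent, s.congested["B"], s.limit["B"])     # 8 True 2
```

Shrinking the allotment as the round-trip time grows reduces the number of packets the sender can have in flight toward a congested destination; once the round-trip time returns to the baseline, the full allotment is restored.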
In still another aspect, a system for monitoring congestion in a network is provided. This system includes means for sending a packet into the network and for recording a first time stamp based thereon, as well as means for receiving an acknowledgment back across the network responsive to the packet, and for recording a second time stamp with receipt of the acknowledgment. The system further includes: means for determining a round-trip time of the packet and the acknowledgment across the network using the first time stamp and the second time stamp, wherein the round-trip time is indicative of an amount of congestion in the network; and means for estimating the amount of congestion in the network using the determined round-trip time, wherein the means for estimating comprises means for comparing the determined round-trip time with a predetermined round-trip time representative of no network congestion or a known amount of network congestion.
In yet another aspect, a system for ameliorating congestion in a network is provided. The system includes means for identifying when the network is congested between a sender node and a destination node, and means for ascertaining a round-trip time of a packet and a corresponding return acknowledgment sent between the sender node and the destination node across the network subsequent to identifying that the network is congested. The system further includes means for varying the number of flow control tokens at the sender node for the destination node using the ascertained round-trip time.
In a further aspect, the invention comprises at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform a method of monitoring congestion in a network. The method includes: sending a packet into the network, and recording a first time stamp based thereon; receiving an acknowledgment back across the network responsive to the packet, and recording a second time stamp with receipt of the acknowledgment; determining a round-trip time of the packet and acknowledgment across the network using the first time stamp and the second time stamp, wherein the round-trip time is indicative of an amount of congestion in the network; and estimating the amount of congestion in the network using the determined round-trip time, wherein the estimating comprises comparing the determined round-trip time with a predetermined round-trip time representative of no network congestion or a known degree of network congestion.
In a still further aspect, the invention includes at least one program storage device readable by a machine, tangibly embodying at least one program of instructions executable by the machine to perform a method of ameliorating congestion in a network. The method includes: identifying when the network is congested between a sender node and a destination node; subsequent to the identifying, ascertaining a round-trip time of a packet and corresponding return acknowledgment sent between the sender node and the destination node across the network; and varying the number of flow control tokens at the sender node for the destination node using the ascertained round-trip time.
To restate, presented herein is an approach for monitoring and minimizing congestion in a tightly coupled (i.e., blocking) network. The approach is protocol-based and can be readily implemented on existing blocking networks. Traditional congestion avoidance schemes based on lost packets are not applicable to blocking networks, since blocking networks do not drop packets in the presence of congestion. Rather, the approach presented herein relies upon an observed round-trip time to detect and avoid congestion. Thus, detecting, avoiding, or recovering from congestion incurs no overhead from lost packets or their retransmission.
Advantageously, the approach presented herein ameliorates network congestion without the need to partition the network into subsections. Further, when a blocking network is not partitioned and is shared by multiple applications, a single application congesting the network could adversely affect the other applications using the network at that time. The approach presented herein prevents this by keeping any one application from congesting the network for the others.
In the normal case, where there is no network congestion, the technique presented herein imposes no protocol overhead for taking time stamps and the like. Only when a sending processor runs out of flow control tokens (i.e., upon the onset of congestion) is the extra cost of taking time stamps and measuring the round-trip time incurred. The packets sent across the network need not carry any additional data to implement the congestion avoidance scheme of this invention. The only requirement is that the sender be able to match each packet with its acknowledgment, a capability already present in the sending-side state of most existing communication protocols. All time stamping and round-trip time measurements are local to the sending node, so the payload per packet is not affected. Further, there is no need to synchronize clocks between senders and receivers: the clocks can be out of sync by any amount, since time intervals are measured only against the single clock at the sender. The approach presented herein can be tuned to a wide variety of network sizes and network latencies by appropriately setting the time delay and token count parameters, and tuning is needed only once per closely coupled network configuration. The approach also requires no coordination among the applications sharing the network to detect and avoid congestion. The state needed to implement the scheme is local to each application, which makes the implementation simpler and more efficient.
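The no-overhead property described above can be illustrated with a probe that stays idle until the sender arms it on running out of tokens. It matches packets to acknowledgments using a sequence number assumed to exist already in the sender's protocol state, so nothing is added to the packet, and it reads only the sender's own clock. `LazyRttProbe`, `arm`, `disarm`, and `stamps_taken` are hypothetical names used for this sketch.

```python
import time
from typing import Callable, Dict, Optional


class LazyRttProbe:
    """Sender-local RTT probe that costs nothing until tokens run out (sketch)."""

    def __init__(self, clock: Callable[[], int] = time.monotonic_ns) -> None:
        self.clock = clock                 # the sender's own clock; never compared with the receiver's
        self.armed = False                 # set when the sender runs out of flow control tokens
        self.pending: Dict[int, int] = {}  # existing sequence number -> send time stamp
        self.stamps_taken = 0

    def arm(self) -> None:
        self.armed = True

    def disarm(self) -> None:
        self.armed = False
        self.pending.clear()

    def on_send(self, seq: int) -> None:
        if self.armed:  # normal case: no time stamp, no per-packet cost
            self.pending[seq] = self.clock()
            self.stamps_taken += 1

    def on_ack(self, seq: int) -> Optional[int]:
        sent = self.pending.pop(seq, None)
        if sent is None:
            return None  # packet was sent while uncongested; nothing to measure
        self.stamps_taken += 1
        return self.clock() - sent


readings = iter([5_000_000_100, 5_000_000_350])  # arbitrary epoch: only the difference matters
probe = LazyRttProbe(clock=lambda: next(readings))
probe.on_send(1)
print(probe.on_ack(1), probe.stamps_taken)  # None 0
probe.arm()
probe.on_send(2)
print(probe.on_ack(2), probe.stamps_taken)  # 250 2
```

The large starting clock value stands in for an arbitrary, unsynchronized epoch: because both time stamps come from the same clock, any offset relative to the receiver's clock cancels in the subtraction.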