The present invention relates to network communications and more particularly to congestion control in network communications.
In conventional networks, the rate of communications between network nodes, such as network switches, may be controlled so as to avoid congestion within the network which may reduce network throughput to unacceptable levels. Typically, congestion control mechanisms measure queue occupancy at a network node and utilize one or more occupancy thresholds which, if exceeded, can result in reducing the rate at which communications are received by the network node. Such congestion control mechanisms have typically reduced data rates into the node experiencing congestion by pausing communications on a link into the node. When the incoming communications for a link have been paused, the outgoing communications may continue so as to reduce the queue occupancy at the node and, thereby, remove the congestion condition.
For example, if a network switch having N inputs and at least one output receives communications for the output from three of the N inputs, then, if a congestion condition is detected for the output, one or more of the three links would typically be paused to alleviate the congestion condition. In a link level congestion control system, the link associated with the input port which received the communication which resulted in the congestion condition would typically be paused for a predefined time period. One such link level congestion control system is provided by the Institute of Electrical and Electronics Engineers (IEEE) 802.3x specification which provides flow control through a pause message.
While link level flow control may avoid performance-degrading congestion, pausing an entire link may restrict throughput more than is necessary to relieve the congestion. Accordingly, a need exists for improvements in congestion control in communications between network switches.
In light of the above discussion, the present invention may provide a network switch as well as methods, systems and computer program products for controlling congestion at a granularity of less than a link. Such finer granularity may be provided by pausing traffic at a source port level of a network switch. The network switch which transmitted a message which resulted in congestion being detected is notified of the congestion and pauses the communications from the source port of the message while maintaining communications over the link from other source ports. Such source port level congestion control may be provided by a network switch having a plurality of sub-queues for each of its output queues, where each sub-queue corresponds to an input port. Source port level pausing of transmissions may then be provided by pausing the sub-queue associated with a source port.
In a particular embodiment of the present invention, flow control through a first network switch having a plurality of input ports and a plurality of output ports may be provided by receiving a message for transmission from an input port of the first network switch to an output port of the first network switch and determining if the received message results in an indication of congestion associated with transmitting the received message onto the output port. If the received message results in an indication of congestion, the transmission of messages received at a first input port of a second network switch, namely the network switch which transmitted the message, may then be paused while the second network switch continues to transmit messages received at its input ports other than the first input port.
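The detection step described above can be illustrated with a minimal sketch. It assumes that congestion is indicated when the occupancy of an output queue exceeds a single threshold counted in messages; the names Message, PauseRequest and OutputPort are hypothetical and are not taken from the specification.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    source_address: str        # address of the station that originated the message
    destination_address: str
    payload: bytes = b""


@dataclass
class PauseRequest:
    """Hypothetical notification returned to the upstream (second) switch."""
    source_address: str


@dataclass
class OutputPort:
    threshold: int                           # assumed occupancy threshold, in messages
    queue: deque = field(default_factory=deque)

    def enqueue(self, message: Message) -> Optional[PauseRequest]:
        """Queue a message and report whether it caused a congestion indication."""
        self.queue.append(message)
        if len(self.queue) > self.threshold:
            # Name the source of the offending message so that the upstream
            # switch can pause only the matching input port, not the whole link.
            return PauseRequest(source_address=message.source_address)
        return None
```

In this sketch the caller would forward a returned PauseRequest to the switch that transmitted the message; a return value of None means no congestion was indicated.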
In a further embodiment of the present invention, the transmission of messages received at a first input port on a second network switch may be paused by transmitting a pause message from the first network switch to the second network switch which specifies a source and/or destination address of the received message. The transmission of messages from the input port of the second network switch associated with the source and/or destination address specified in the pause message may then be paused while continuing to transmit messages to the first network switch from input ports of the second network switch which are not associated with the source and/or destination address specified in the pause message.
In still another embodiment of the present invention, an output queue of the second network switch is associated with the output port over which the message received by the first network switch was transmitted. In such an embodiment, pausing transmission of messages from the input port of the second network switch associated with the source and/or destination address specified in the pause message may be accomplished by dividing the output queue of the second network switch into a plurality of sub-queues, wherein each sub-queue corresponds to one of a plurality of input ports which receive messages for transmission on the output port of the second network switch. Messages received for transmission on the output port of the second network switch are stored in one of the plurality of sub-queues based on the input port from which the message was received. The sub-queue corresponding to the source and/or destination address specified in the pause message may then be identified as a paused sub-queue, and transmission of messages in the paused sub-queue on the output port of the second network switch may be paused while continuing to transmit messages from sub-queues of the output port which are not paused.
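The sub-queue arrangement described above can be sketched as follows. The class name SubQueuedOutputQueue and the round-robin service order are assumptions made for illustration; the specification does not state how the sub-queues which are not paused are scheduled.

```python
from collections import deque


class SubQueuedOutputQueue:
    """Output queue divided into one sub-queue per input port (illustrative)."""

    def __init__(self, input_ports):
        self._order = list(input_ports)
        self._sub_queues = {port: deque() for port in self._order}
        self._paused = set()
        self._next = 0          # index of the sub-queue to try first

    def enqueue(self, input_port, message):
        # Messages are stored by the input port on which they arrived.
        # A paused sub-queue still accepts new messages.
        self._sub_queues[input_port].append(message)

    def pause(self, input_port):
        self._paused.add(input_port)

    def resume(self, input_port):
        self._paused.discard(input_port)

    def dequeue(self):
        """Return the next message from a sub-queue that is not paused.

        Sub-queues are visited round-robin (an assumed policy). Returns
        None when every sub-queue is either paused or empty.
        """
        count = len(self._order)
        for offset in range(count):
            index = (self._next + offset) % count
            port = self._order[index]
            if port not in self._paused and self._sub_queues[port]:
                self._next = (index + 1) % count
                return self._sub_queues[port].popleft()
        return None
```

With input port 1 paused, calls to dequeue() keep draining the sub-queues for the other input ports, while messages arriving on port 1 accumulate until resume(1) is called.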
In a still further embodiment of the present invention, the sub-queue corresponding to the source and/or destination address specified in the pause message may be determined by establishing a look-up table which relates source and/or destination addresses to source ports of the second network switch. The source and/or destination address associated with the input port of the second network switch to be paused may be extracted from the pause message and the input port to be paused identified utilizing the extracted source and/or destination address and the look-up table.
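A minimal sketch of such a look-up follows. It assumes that the table is populated as messages arrive, by recording the input port on which each address was observed, and that the pause message is represented as a dictionary with optional "source_address" and "destination_address" keys. The population method, the message representation and the name AddressPortTable are all hypothetical.

```python
from typing import Dict, Optional


class AddressPortTable:
    """Relates addresses to input ports of the switch (illustrative)."""

    def __init__(self) -> None:
        self._table: Dict[str, int] = {}

    def learn(self, address: str, input_port: int) -> None:
        # Assumed population method: remember the input port on which
        # a message carrying this address was received.
        self._table[address] = input_port

    def port_to_pause(self, pause_message: dict) -> Optional[int]:
        """Extract an address from the pause message and map it to a port."""
        address = (pause_message.get("source_address")
                   or pause_message.get("destination_address"))
        if address is None:
            return None
        # Unknown addresses yield None, so no sub-queue is paused.
        return self._table.get(address)
```

The returned port number would then select the sub-queue to pause; when the address is not in the table, this sketch pauses nothing.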
In specific embodiments of the present invention, the pause message specifies a duration during which messages from the input port of the second network switch are paused.
Alternatively, the pause message may specify that messages from the input port of the second network switch are paused until a resume message is received by the second network switch. In such an embodiment, it may be determined whether the congestion condition still exists. If the congestion condition no longer exists, a resume message may be sent from the first network switch to the second network switch so as to resume transmission of messages from the input port of the second network switch to the first network switch.
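The two alternatives, a timed pause and a pause which lasts until a resume message arrives, can be tracked with one small structure. The sketch below assumes that the caller supplies the current time as a number; the name PauseState and the use of an infinite expiry time for an open-ended pause are illustrative choices.

```python
import math


class PauseState:
    """Tracks which input ports are paused and until when (illustrative)."""

    def __init__(self):
        self._paused_until = {}     # input port -> time at which the pause expires

    def pause(self, port, now, duration=None):
        # With a duration, the pause expires on its own. Without one,
        # it lasts until resume() is called for the port.
        self._paused_until[port] = math.inf if duration is None else now + duration

    def resume(self, port):
        # Called when a resume message names this port.
        self._paused_until.pop(port, None)

    def is_paused(self, port, now):
        return now < self._paused_until.get(port, -math.inf)
```

Passing the time in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic.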
In a specific embodiment of the present invention, the pause message is an IEEE 802.3x link level flow control message. Furthermore, the link level flow control message may be a pause frame. In such an embodiment, the pad portion of the pause frame may include at least one of a source and a destination address.
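As an illustration, the sketch below builds a pause frame whose pad carries the addresses. The fixed fields follow the IEEE 802.3x pause frame format: the reserved multicast destination address 01-80-C2-00-00-01, the MAC control type 0x8808, the pause opcode 0x0001, a 16-bit pause time and a 42-byte pad. The placement of a flag byte and the two addresses within the pad is a hypothetical layout chosen for this example, and the frame check sequence is omitted.

```python
import struct

PAUSE_DESTINATION = bytes.fromhex("0180c2000001")   # reserved multicast address
MAC_CONTROL_TYPE = 0x8808
PAUSE_OPCODE = 0x0001
PAD_LENGTH = 42


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' to 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address must be 6 bytes: " + mac)
    return raw


def build_pause_frame(sender_mac, pause_time,
                      paused_source=None, paused_destination=None) -> bytes:
    """Return a 60-byte pause frame (no FCS) with addresses in the pad.

    Hypothetical pad layout: byte 0 holds flags (bit 0: source address
    present, bit 1: destination address present), bytes 1-6 hold the
    source address, bytes 7-12 hold the destination address, and the
    remaining bytes are zero.
    """
    pad = bytearray(PAD_LENGTH)
    if paused_source is not None:
        pad[0] |= 0x01
        pad[1:7] = mac_to_bytes(paused_source)
    if paused_destination is not None:
        pad[0] |= 0x02
        pad[7:13] = mac_to_bytes(paused_destination)
    header = struct.pack("!6s6sHHH", PAUSE_DESTINATION, mac_to_bytes(sender_mac),
                         MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_time)
    return header + bytes(pad)
```

A receiver that does not understand the pad contents would still see a well-formed pause frame, since the fields ahead of the pad are unchanged.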
A network switch according to the present invention includes a plurality of input ports and at least one output port. An output queue associated with the output port receives data from the plurality of input ports and provides data to the output port. A plurality of sub-queues of the output queue are also provided. Each of the plurality of sub-queues is associated with a respective one of the plurality of input ports so as to receive data from the associated respective one of the plurality of input ports.
The plurality of sub-queues may be configured so as to be separately paused so that one of the sub-queues may be prevented from providing data to the output port irrespective of whether others of the plurality of sub-queues provide data to the output port. The plurality of sub-queues may also be configured so as to receive data from their respective input ports irrespective of whether the sub-queue is paused. Furthermore, the output port may be configured so as to receive pause messages which specify an input port and further configured to pause a corresponding one of the sub-queues associated with an input port specified in a received pause message.
While the invention has been described above primarily with respect to the method aspects of the invention, systems and/or computer program products are also provided.