A node communicating with another node via a communication medium may use multiple transmit queues for buffering frames of data to be transmitted from an output port (for example, an input/output port or outlet) of the node to the other node. Generally, each frame of data is selected to be stored in one of the transmit queues based on some criteria such as type, class or quality of service associated with the frame, or data in the frame. Each transmit queue may receive frames from multiple higher layer virtual entities such as virtual circuits (VCs), virtual local area networks (VLANs), connections, or flows.
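By way of illustration only, the selection of a transmit queue for each frame may be sketched as follows. The frame fields (`vlan`, `priority`), the number of queues, and the mapping rule are assumptions made for this example and are not taken from any particular node or standard.

```python
from collections import deque

NUM_QUEUES = 4  # assumed number of transmit queues on the output port

def select_queue(frame, num_queues=NUM_QUEUES):
    """Map an assumed 0-7 priority value onto one of the transmit queues."""
    return frame["priority"] * num_queues // 8

transmit_queues = [deque() for _ in range(NUM_QUEUES)]

def enqueue(frame):
    """Store the frame in the queue chosen for it and return that queue's index."""
    index = select_queue(frame)
    transmit_queues[index].append(frame)
    return index

# Frames from three different VLANs (hypothetical values).
frames = [
    {"vlan": 10, "priority": 0, "payload": b"bulk"},
    {"vlan": 20, "priority": 7, "payload": b"voice"},
    {"vlan": 30, "priority": 1, "payload": b"bulk2"},
]
placements = [enqueue(f) for f in frames]
```

In this sketch the frames from VLAN 10 and VLAN 30 land in the same queue, which illustrates how a single transmit queue may carry frames from multiple higher layer virtual entities.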
In any case, if frames of data are generated or received at the node faster than the frames can be transmitted to the other node, the transmit queue(s) begin to fill up with frames. Generally, recently received frames wait in a queue while frames received ahead of them are transmitted first, resulting in “head of line” blocking, since frames at the head of a transmit queue block, at least temporarily, the other frames in the queue from being transmitted. In addition, frames may be queued at intermediate points in the communication path between the two nodes, such as at intermediate nodes or at stages in a switched interconnect, thereby encountering “head of line” blocking at multiple points between the two nodes. The period of time a frame remains in a queue at each node increases the overall period of time it takes for the frame to be transmitted between the nodes. This increase in the time taken to transmit a frame from one node to another node in a network setting is generally referred to as network latency.
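The effect of queueing on latency may be sketched as follows. This is a simplified model under stated assumptions: a single first-in, first-out queue per hop, a fixed transmit time per frame, arrival times given in non-decreasing order, and hypothetical per-hop waiting times. The function names are introduced for illustration only.

```python
def queueing_delays(arrival_times, transmit_time):
    """Return how long each frame waits in a FIFO queue before its transmission starts."""
    delays = []
    link_free_at = 0
    for arrival in arrival_times:
        # A frame cannot start until every frame ahead of it has been sent.
        start = max(arrival, link_free_at)
        delays.append(start - arrival)
        link_free_at = start + transmit_time
    return delays

def path_latency(per_hop_waits, transmit_time):
    """Sum the queueing wait and the transmit time over every hop on the path."""
    return sum(wait + transmit_time for wait in per_hop_waits)

# Four frames arrive together; each takes one time unit to transmit.
burst_delays = queueing_delays([0, 0, 0, 0], transmit_time=1)

# The same frame waits 3, 0 and 2 time units at three successive hops (assumed values).
last_frame_latency = path_latency([3, 0, 2], transmit_time=1)
```

Here the last frame of the burst waits three time units behind the frames ahead of it, and the waits accumulated at each hop add directly to the end-to-end latency.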
If a transmit queue in a node fills up and cannot accept any further frames, any additional frames received at the node may be discarded. Typically, an end station node, or a node at which the frames originate, need not discard the additional frames; rather, such nodes rely on upper layer protocols and application layer mechanisms to detect congestion and back off for a period of time before generating further frames of data for transmission. An intermediate node in an internetwork, such as a network layer (layer 3) router, however, may need to discard additional frames if a transmit queue therein cannot accept any further frames, since the intermediate node is merely receiving the frames from another node.
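The discarding behavior of an intermediate node may be sketched as a bounded queue that rejects frames once it is full. The class name, the capacity, and the drop counter are assumptions made for this example; actual nodes may apply other discard policies.

```python
from collections import deque

class BoundedTransmitQueue:
    """A transmit queue with a fixed capacity that discards frames when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = deque()
        self.dropped = 0

    def offer(self, frame):
        """Queue the frame if there is room; otherwise discard it and return False."""
        if len(self.frames) >= self.capacity:
            self.dropped += 1
            return False
        self.frames.append(frame)
        return True

queue = BoundedTransmitQueue(capacity=3)
results = [queue.offer(n) for n in range(5)]
```

With a capacity of three frames and five frames offered before any is transmitted, the last two frames are discarded.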
Applications executing on the respective nodes and communicating data with each other may time out, or hang, while waiting for data to arrive from the other node, or may detect the loss or absence of data that was discarded and request that the data be retransmitted. Latency and retransmission negatively affect throughput and bandwidth of the communication medium over which the nodes communicate.
The approaches discussed above generally do not provide enough transmit queues for non-blocking throughput in a node or a network. One approach is to provide separate transmit queues for related traffic transmitted by a node. A traffic flow may be defined as related frames of data transmitted between two nodes during a communication session between instances of respective applications executing on the nodes. Given there may be multiple instances of multiple applications executing on each node, and multiple sessions between these instances, the number of transmit queues needed for this approach is not easily determined, if not unbounded.
A simple form of controlling flow of frames (“flow control”) between nodes occurs when one or more transmit queues in a node fill with frames to the point that the node discards frames that would otherwise be transmitted to another node. Essentially, this type of flow control is binary in manner: either a frame is transmitted or it is not. Another form of flow control involves a node that is congested (“the receiving, or destination, node”) sending a message, for example, a pause frame, to another node (“the transmitting, or source, node”) from which it is receiving frames. The message instructs the transmitting node to stop transmitting frames to the receiving node for a selected short period of time, or until another message is sent from the receiving node instructing the transmitting node to begin transmitting frames again. If this type of flow control is used over each link, there is no need to discard frames within the switched interconnect.
The latter type of flow control is used, for example, between nodes in Ethernet Local Area Networks (LANs) adhering to the Institute of Electrical and Electronics Engineers (IEEE) 802.3 standard for the CSMA/CD (Carrier Sense Multiple Access with Collision Detection) protocol, including networks operating at Fast Ethernet (100 Mbps), Gigabit Ethernet (1000 Mbps), and 10 Gigabit Ethernet (10,000 Mbps) rates. See IEEE 802.3-2002: IEEE Standard for Information technology - Part 3: CSMA/CD Access Method and Physical Layer Specifications, and IEEE 802.3ae-2002: IEEE Standard for CSMA/CD Access Method and Physical Layer Specifications-Media Access Control (MAC) Parameters, Physical Layer and Management Parameters for 10 Gb/s Operation, for further information on flow control in Ethernet networks.
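The message-based form of flow control described above may be sketched as follows. This is a minimal model of the variant in which the receiving node sends one message to stop transmission and another to restart it; it does not reproduce the IEEE 802.3 pause frame format or its timer. The class name, thresholds, drain rate, and message strings are all assumptions made for this example.

```python
from collections import deque

class Receiver:
    """A receiving node that asks the sender to pause when its buffer fills."""

    def __init__(self, capacity, pause_threshold, resume_threshold):
        self.capacity = capacity
        self.pause_threshold = pause_threshold
        self.resume_threshold = resume_threshold
        self.buffer = deque()
        self.pause_sent = False
        self.dropped = 0

    def accept(self, frame):
        """Buffer a frame; return "PAUSE" when the sender should stop."""
        if len(self.buffer) >= self.capacity:
            self.dropped += 1  # only reached if the pause came too late
            return None
        self.buffer.append(frame)
        if not self.pause_sent and len(self.buffer) >= self.pause_threshold:
            self.pause_sent = True
            return "PAUSE"
        return None

    def drain_one(self):
        """Consume one frame; return "RESUME" once the buffer has drained enough."""
        if self.buffer:
            self.buffer.popleft()
        if self.pause_sent and len(self.buffer) <= self.resume_threshold:
            self.pause_sent = False
            return "RESUME"
        return None

def simulate(ticks, receiver):
    """The sender offers one frame per tick; the receiver drains one every second tick."""
    sender_paused = False
    sent = 0
    log = []
    for tick in range(ticks):
        if not sender_paused:
            message = receiver.accept(sent)
            sent += 1
            if message == "PAUSE":
                sender_paused = True
                log.append((tick, "PAUSE"))
        if tick % 2 == 1 and receiver.drain_one() == "RESUME":
            sender_paused = False
            log.append((tick, "RESUME"))
    return sent, log

receiver = Receiver(capacity=4, pause_threshold=3, resume_threshold=1)
sent, log = simulate(10, receiver)
```

In this run the sender is paused before the receiver's buffer overflows, so no frame is discarded, at the cost of the sender sitting idle while paused. The pause applies to everything the sender transmits on the link, regardless of which flow caused the congestion.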
These flow control techniques do not take into consideration the sources and destinations of flows of traffic that contribute to congestion within a switched interconnect and, therefore, do not specifically flow control only the traffic contributing to the congestion.