FIG. 1 shows a block diagram of a conventional Ethernet switching system. As shown, the Ethernet system comprises backplane switches 101 and 102, communicating with each other via a trunk line 103. The Ethernet system also comprises a plurality of line cards, including line cards 104, 105, and 106. Each of the line cards includes a switch, such as a switch 1041 in the line card 104, a switch 1051 in the line card 105, and a switch 1061 in the line card 106. Each of the switches communicates with a backplane switch (either of backplane switches 101 or 102). As a result, the line cards communicate with each other through the switches 1041, 1051 and 1061 and the backplane switches 101 and 102.
In the line card 104, CPUs 1042 and 1043 communicate with each other via a network interface 1045, the switch 1041, and a network interface 1044. In the line card 105, CPUs 1052 and 1053 communicate with each other via a network interface 1055, the switch 1051, and a network interface 1054. In the line card 106, CPUs 1062 and 1063 communicate with each other via a network interface 1065, the switch 1061, and a network interface 1064. A CPU and a network interface may be connected over a bus (e.g. a PCI Express bus), while other lines in the system are Ethernet connections.
It should be noted that the network interface functionality within blocks 1044, 1045, 1054, 1055, 1064 and 1065 may be implemented in any number of ways, whether as a chip, a portion of a chip, a card, or a portion of a card.
An Ethernet switch has information about its own ports, so that the switch can receive a packet and switch it over to the right port by examining the content of the packet and component information inside the switch.
A traffic flow may, for example, proceed from the CPU 1063 in the line card 106 to the CPU 1053 in the line card 105 via the switch 1061, the backplane switches 101 and 102, and the switch 1051. Other traffic flow may proceed from the CPU 1052 in the line card 105 to the CPU 1053 in the same line card via the switch 1051. If these two traffic flows try to exit the same egress port of the switch 1051, congestion can occur.
In the conventional Ethernet system, the only information passed between the network interface 1054 and the switch 1051 is the traffic flow itself. No information is exchanged indicating that there is congestion on a port or on a specific receive queue of the network interface, or that certain packets are going to be dropped by the network interface because of the congestion. If there is congestion, a switch usually just drops the packets. Recovery from the packet drops is then handled by higher level software running on both sides of the network, i.e., the transmitter and the receiver, which detect dropped frames and request retransmission. The protocol that is usually used for this purpose is TCP/IP. The only standard way of avoiding drops is to employ IEEE 802.3x flow control. However, that flow control causes blocking in the network. As a result, the slowest link degrades the performance of the entire network.
Usually, a switch uses several priority queues on the ingress side of a network interface, employing a classification mechanism to decide how to classify packets on the link and which priority queue a packet should go to. The packet is then received by the network interface, which employs an independent classification mechanism to assign the packet to a certain queue inside the CPU memory. The CPU provides the network interface with resources in the CPU memory. The network interface usually supports several DMA queues that take the packets received from the network, classify them into receive (RX) DMA queues, and put them in the CPU memory. Each DMA queue is provided by the CPU with a certain amount of buffer memory, which is managed dynamically by the CPU and the DMA as packets are received and then consumed by the CPU. The CPU allocates CPU time between the queues according to a predetermined policy. For example, queues of control data may have high priority. As a result, lower priority queues may get congested, their RX DMAs will run out of buffer capacity, and they will be forced to drop packets that keep coming from the network (i.e., from the switch). The switch does not know what the network interface and the CPU are going to do with the traffic flow from the switch.
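The buffer exhaustion described above can be illustrated with a minimal sketch. It is not taken from the system of FIG. 1: the class name `RxQueue`, the helper `serve_strict_priority`, the buffer count, the CPU budget, and the arrival pattern are all hypothetical values chosen for illustration, and strict priority is assumed as the "predetermined policy".

```python
from collections import deque


class RxQueue:
    """Hypothetical receive DMA queue backed by a fixed number of buffers."""

    def __init__(self, name, buffers):
        self.name = name
        self.buffers = buffers      # buffers the CPU has posted for this queue
        self.packets = deque()      # packets waiting to be consumed by the CPU
        self.dropped = 0            # packets discarded for lack of a buffer

    def receive(self, packet):
        """Store a packet if a buffer is free; otherwise drop it."""
        if len(self.packets) >= self.buffers:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True


def serve_strict_priority(queues, budget):
    """Consume up to `budget` packets, draining earlier queues first."""
    served = []
    for q in queues:
        while budget > 0 and q.packets:
            served.append((q.name, q.packets.popleft()))
            budget -= 1
    return served


def simulate(ticks=10, buffers=4, budget=2):
    """Assumed load: two control and two data packets arrive per tick."""
    control = RxQueue("control", buffers)
    data = RxQueue("data", buffers)
    for t in range(ticks):
        for i in range(2):
            control.receive((t, i))
            data.receive((t, i))
        # The CPU budget is fully used by the high priority control queue.
        serve_strict_priority([control, data], budget)
    return control.dropped, data.dropped
```

Under these assumed numbers the control queue never drops, while the data queue fills its four buffers within two ticks and then discards every later arrival. The sender has no indication that this is happening.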
For example, the switch 1051 has two input traffic flows: the first one is the one from the CPU 1063, and the second one is the one from the CPU 1052. As an example, the switch 1051 may send to a destination, e.g., the CPU 1053, a flow of data comprising 50% of the first traffic flow, and 50% of the second traffic flow under certain circumstances.
The destination of the packets has an internal queuing mechanism. For example, there are two queues from the network interface 1054 to the CPU 1053: the first queue for the first traffic flow and the second queue for the second traffic flow. If the CPU 1053 cannot serve the first queue, the first queue fills up. Once the network interface 1054 detects that the first queue is full, it drops the next packet destined for the first queue.
In this case, the link between the switch 1051 and the network interface 1054 is used inefficiently because the switch does not know the status of the network interface queue. The switch 1051 continues to send 50% of the first traffic flow, although the network interface 1054 will just drop the packets anyway. At the same time, although the CPU 1053 can serve the second queue, the switch 1051 only sends 50% of the second traffic flow.
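The inefficiency can be made concrete with a small calculation. The sketch below is illustrative only: the function name `link_usage`, the 1000-unit link rate, and the drain rates are assumptions, since the text gives only the 50%/50% split.

```python
def link_usage(link_rate, shares_percent, drain_rates):
    """Split the traffic offered on a link into useful and wasted parts.

    link_rate       -- capacity of the switch-to-interface link (any unit)
    shares_percent  -- percentage of the link the switch gives each flow
    drain_rates     -- rate at which the CPU empties each flow's queue
    """
    useful = 0
    wasted = 0
    for share, drain in zip(shares_percent, drain_rates):
        offered = link_rate * share // 100
        accepted = min(offered, drain)   # the rest is dropped at the interface
        useful += accepted
        wasted += offered - accepted
    return useful, wasted


# First queue is not served at all; second queue could absorb the whole link.
useful, wasted = link_usage(1000, [50, 50], [0, 1000])
print(useful, wasted)  # 500 500
```

With these assumed figures, half of the link carries packets that the network interface discards, while the second flow, which the CPU could serve at the full link rate, is held to the other half.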
However, if the switch 1051 had known about the congestion, it could have sent more packets from the CPU 1052, and fewer packets from the CPU 1063. In addition, if the switch 1051 had informed the switch 1061 about the congestion, the switch 1061 could have employed a packet discard mechanism to remove the packets from the CPU 1063 at the outset, thus reducing the load on the entire switching system, and allowing traffic flow from the CPU 1052 to pass through with higher bandwidth.
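As a rough sketch of what a switch could do with such knowledge, the function below shifts link share away from flows whose receive queue is reported as congested. The function name `reallocate_shares`, the flow labels, and the `keep_percent` parameter are hypothetical, and how the congestion report would reach the switch is deliberately left out.

```python
def reallocate_shares(shares, congested, keep_percent=0):
    """Move link share from congested flows to the remaining flows.

    shares        -- dict mapping a flow label to its share of the link (%)
    congested     -- set of flow labels whose receive queue is full
    keep_percent  -- fraction of its share a congested flow retains (assumed)
    """
    free = [flow for flow in shares if flow not in congested]
    if not free:
        return dict(shares)          # nowhere to move the share, leave as is

    new_shares = {}
    reclaimed = 0
    for flow, share in shares.items():
        if flow in congested:
            kept = share * keep_percent // 100
            new_shares[flow] = kept
            reclaimed += share - kept
        else:
            new_shares[flow] = share

    # Hand the reclaimed share to the uncongested flows as evenly as possible.
    base, extra = divmod(reclaimed, len(free))
    for i, flow in enumerate(free):
        new_shares[flow] += base + (1 if i < extra else 0)
    return new_shares


shares = {"from_cpu_1063": 50, "from_cpu_1052": 50}
print(reallocate_shares(shares, {"from_cpu_1063"}))
# {'from_cpu_1063': 0, 'from_cpu_1052': 100}
```

Setting `keep_percent` above zero models the milder case in which the congested flow is reduced rather than stopped. The same report forwarded to the upstream switch would let that switch discard the excess packets before they cross the backplane.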
Conventional network interfaces, however, do not communicate their queue status to their attached switches. In addition, conventional Ethernet switches that are connected to each other via standard Ethernet ports do not communicate congestion information over the Ethernet link. The only such known mechanism is the disadvantageous IEEE 802.3x flow control mechanism. The prior solution has been to use a separate link 110 to communicate congestion information. However, that information had no relation to priority queues.
Therefore, it would be desirable to provide a method and apparatus for communicating the queue status of a network interface to its attached switch, and for communicating the queue status between switches.