This invention relates to switches for use in packet-based data communication systems.
The invention is intended to be applicable to switches in a general sense for data packets. Although more specific terms, such as ‘bridge’ or ‘router’ or ‘gateway’ or other terms are used to denote particular devices, the invention is intended to be applicable in general to a device which can receive data at any of a multiplicity of ports and direct data packets from at least one selected port in accordance with address data, such as media access control data or network address data within the packets. The invention is not intended to be applicable to hubs wherein data received at any port is transmitted without any selection.
Although a distinction is made in the following between ‘input’ ports and ‘output’ or transmitting ports, it will be understood that the relevant qualifications refer to the function performed at the time by a given port and the invention is applicable to devices wherein ports may serve to both receive data from and transmit data to a respective network segment, or in other words the ports are bidirectional and whether a port receives or transmits depends on the control exerted by the processing and control functions within the switch.
In a switch employed in a packet-based or packet-switched communication system, incoming packets are processed to determine the port for which that packet is destined. The packets are, pending transmission from the respective ‘output’ port, temporarily stored, usually in a dynamic random access memory organised into buffers which are under the control of software ‘pointers’ which define for each ‘output’ port a queue of data packets waiting to be transmitted from that port. That is not the only manner of organisation of output queues; for example a plurality of FIFO stores may be used for basically the same purpose. In any event, the switch has means for storing data packets prior to transmission from the output ports and for identifying respective queues of data packets for the output ports.
When a packet is transmitted from any particular port, it is removed from the respective queue, so that there is an additional space for a new packet in the queue. In any practical device there is necessarily a limit to the amount of storage space which can be allotted to packets awaiting transmission from a device.
Frequently, the packets which are to be transmitted from a given port have been received by a multiplicity of ports on the device. Where, as is commonplace, the rate at which data packets are received on a port is of the same order as the rate at which data can be transmitted from a port, it is quite possible, when data to be transmitted from a port comes from a multiplicity of ports, for the transmit queue to become excessively long such that the allotted storage space is completely used and no more entries can be made. If this occurs, then a processor for determining which ports are to transmit received packets becomes jammed and is unable to forward packets to any of the other ports. This known phenomenon is termed ‘head of line blocking’.
The present invention is primarily intended to reduce the incidence of ‘head of line blocking’.
It is known, for example from WO94/14266, to determine, before forwarding a data packet across a switch from an input buffer to an output buffer, whether the output buffer is ‘full’ and, if so, to prevent the forwarding of the packet across the switch. It is also known, for example from WO99/00949, to allot different priorities to different input queues and to reduce the priority of a particular input queue if the contribution of that queue to a particular output queue is excessive. Both schemes may assist in avoiding congestion at an output buffer, though generally at the cost of requiring potentially large input buffer space. Moreover, though such proposals may temporarily relieve congestion, they are not adapted to avoid longer term congestion, which often arises because a particular source is sending traffic to a switch at an average rate which is greater than the rate at which the switch can forward packets to the destination or multiplicity of destinations to which the packets must be sent.
One aspect of the invention concerns the reduction of the flow of data packets to a port which is identified as a principal contributor to an output queue. Input traffic cannot be stopped merely by preventing the forwarding of packets across the switch or the allocation of different priorities to input queues, as discussed in the aforementioned references.
It is known for Ethernet systems and particularly systems conforming to IEEE Standard 802.3 (1998 Edition) to employ MAC control frames. As described in Annex 31B of that Standard, flow control frames may be used to inhibit transmission of data frames for a specified period of time. Such a frame includes a special multicast address, a ‘pause’ operation code and a request operand which indicates the length of time for which data frame transmission should be inhibited. A link partner, coupled to the switch which generates such a control frame, will on receipt of such a control frame cease sending data frames or packets to the switch for the period of time specified in the control frame. The present invention utilises such control frames or their equivalent, that is to say some frame having the nature of a packet and which will cause the recipient to cease temporarily the sending of packets.
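By way of illustration only, the layout of such a flow control frame may be sketched as follows. The function name `build_pause_frame` and the constant names are hypothetical; the field values (the reserved multicast destination 01-80-C2-00-00-01, the MAC control type 0x8808, the pause opcode 0x0001 and a 16-bit pause time counted in units of 512 bit times) are those of the pause operation in the Standard. The sketch omits the frame check sequence, which is assumed to be appended by the transmitting hardware.

```python
import struct

# Field values of the pause operation defined in IEEE 802.3 Annex 31B.
PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_WITHOUT_FCS = 60  # minimum Ethernet frame is 64 octets including the 4-octet FCS


def build_pause_frame(source_mac: bytes, pause_quanta: int) -> bytes:
    """Return a pause frame (without FCS) asking the link partner to stop
    sending data frames for `pause_quanta` units of 512 bit times."""
    if len(source_mac) != 6:
        raise ValueError("source_mac must be 6 octets")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_quanta must fit in 16 bits")
    frame = PAUSE_DEST_MAC + source_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, pause_quanta
    )
    # Pad with zeros up to the minimum frame size.
    return frame.ljust(MIN_FRAME_WITHOUT_FCS, b"\x00")
```

A pause time of zero is understood by the link partner as permission to resume transmission at once.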
In order to initiate the generation of a flow control frame, the present invention includes some means of relating the generation of control frames to the input traffic flow at a port.
This may be done by incrementing a counter with a value related to the size of incoming data packets, such as by counting octets therein, and decrementing the counter at some controllable rate. Such a counter is known as a ‘leaky bucket’ and is described for example in our GB patent application number 9807264.8 filed Apr. 3, 1998. By comparing the net content of the counter with a threshold, which may be adjustable, one can provide the mechanism of a remote throttle, which can be used to restrict, by means of the dispatch of ‘flow control’ frames, the traffic of packets being sent to a particular port. In what follows, a device which measures the input traffic to a port against a threshold to generate a signal which can be used to restrict traffic to that port is termed a throttle and the relevant process is termed throttling. It is emphasized that such a throttle does not prevent the flow of packets within a switch; it signals to a provider of packets to that switch.
It is customary for a switch to include some means for determining, by an examination of all incoming packets or at least regular samples thereof, the port-to-port traffic flow within a switch. As indicated hereinafter, this may be achieved by means of host matrix statistics which give the number of packets for each of the conversations, defined by destination and source MAC (media access control) addresses, and by means of a customary look-up table which relates media access control addresses with the ‘input’ and ‘output’ port for each packet. The port-to-port traffic flow data may alternatively be obtained using other statistics, such as RMON (remote monitoring) statistics.
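A leaky bucket throttle of this general kind may be sketched, purely as an illustration, as follows. The class name, the parameter names and the use of an explicit timestamp are assumptions made for the sketch and are not taken from the application referred to above.

```python
class LeakyBucketThrottle:
    """Counter incremented by packet size and drained at a controllable rate."""

    def __init__(self, leak_rate_octets_per_s: float, threshold_octets: float):
        self.leak_rate = leak_rate_octets_per_s  # controllable decrement rate
        self.threshold = threshold_octets        # adjustable threshold
        self.level = 0.0                         # net content of the counter
        self.last_update = 0.0                   # time of the last update, in seconds

    def on_packet(self, size_octets: int, now: float) -> bool:
        """Account for one received packet; return True when the net content
        exceeds the threshold, i.e. when a flow control frame should be sent."""
        elapsed = now - self.last_update
        # Drain the bucket for the elapsed time, never below empty.
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_update = now
        self.level += size_octets
        return self.level > self.threshold
```

Lowering `leak_rate` or `threshold` for a given port makes the throttle act sooner, which is one way in which the traffic to that port could be reduced progressively.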
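The derivation of port-to-port traffic flow from such statistics may be illustrated by the following sketch. The data layout is an assumption: the host matrix is taken to be a mapping from (source address, destination address) pairs to packet counts, and the look-up table a mapping from address to port number; the function name is hypothetical.

```python
from collections import Counter


def port_to_port_traffic(host_matrix, mac_to_port):
    """Aggregate per-conversation packet counts into per-(input port,
    output port) counts using the address look-up table."""
    flows = Counter()
    for (src_mac, dst_mac), packets in host_matrix.items():
        in_port = mac_to_port.get(src_mac)
        out_port = mac_to_port.get(dst_mac)
        if in_port is None or out_port is None:
            continue  # address not in the table; ignored in this sketch
        flows[(in_port, out_port)] += packets
    return flows
```

For example, two conversations from different stations on port 1 to a station on port 4 are summed into a single entry for the pair (1, 4).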
The present invention is partly based on employing data, which is either available or can be readily made available, defining port-to-port traffic flow within the switch, and on responding to a condition indicating that a transmit queue for a particular port exceeds a threshold by throttling traffic which comes by way of at least one of the ports to such a queue.
One feature of the invention is that by using several thresholds, or by making repeated determinations of the port by port contributions to an output queue, one or more input traffic flows can be progressively reduced.
Another aspect of the invention is the process by which traffic arriving at a receiving port is selected for throttling in accordance with the statistical data on the port-to-port traffic flow and an indication that a transmit queue exceeds a threshold or set limit. Control for the process may readily be obtained by setting a threshold either in software or hardware, depending on the nature of the transmit buffer. How the control is exerted is to some extent a matter of choice. For example, it may be desirable to ensure that the port which is selected for flow control in response to the detection of an over-long transmit queue is the port which is identified as the greatest provider of traffic to the transmit queue. An examination of the port-to-port traffic statistics enables an identification of the most prolific input port in relation to the output queue and the data traffic received at that port can be reduced, preferably progressively. The criteria for controlling the traffic flows to the ports may be such that if the port which has been identified as the most prolific has had its traffic reduced, yet the transmit queue is still excessive, then a second most prolific provider of traffic may have its bandwidth reduced and so on, the process being carried on until an equilibrium is reached where the transmit queue for the output port does not grow.
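One possible form of such a selection process is sketched below, as an illustration only. The halving step, the minimum rate, and the representation of each port's permitted rate as a fraction of its unrestricted rate are assumptions chosen for the sketch; the function and parameter names are hypothetical.

```python
def throttle_step(queue_length, queue_threshold, flows, output_port,
                  rate_limits, step=0.5, min_rate=0.25):
    """If the transmit queue of `output_port` exceeds its threshold, reduce
    the permitted rate of the most prolific contributing input port that can
    still be reduced. Returns the port throttled, or None."""
    if queue_length <= queue_threshold:
        return None
    # Input ports contributing to this output queue, most prolific first.
    contributors = [(in_port, count)
                    for (in_port, out_port), count in flows.items()
                    if out_port == output_port and count > 0]
    contributors.sort(key=lambda item: item[1], reverse=True)
    for in_port, _count in contributors:
        current = rate_limits.get(in_port, 1.0)  # 1.0 means unrestricted
        if current > min_rate:
            rate_limits[in_port] = max(min_rate, current * step)
            return in_port
    return None  # every contributor is already at the minimum rate
```

Called repeatedly while the queue remains over its threshold, the sketch first reduces the most prolific port step by step and then moves on to the second most prolific, and so on. The resulting value in `rate_limits` would be used to set the throttle for the port concerned.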
The main advantage of the present invention is that the reduction of traffic to an input port or ports may be made to occur well before a transmit queue is completely full. This assists in avoiding the blocking of a switch and normally will enable ports that are not communicating to the potentially congested output port to continue their respective port-to-port traffic unaffected.
The invention may be extended throughout a network. If for example an input port is connected to an end node, then the device at that node can be forced to send less traffic. If the input port of a first switch is connected to another switch, then the relevant transmit queue of that other switch will increase to the point where a similar mechanism to that described in relation to the first switch can be used to reduce the traffic arriving at a port or ports in the other switch. The invention contemplates thereby a dynamic bandwidth control which enables a whole network to balance itself to traffic flow rates without any direct management from a network administrator.
A possible feature of the invention comprises excluding a specified port or ports from the process of determining which port or ports should be subject to traffic reduction in response to the detection of an excessive transmit queue. This feature enables a specified port or ports to be excluded from any traffic reduction and may be used to allow specific ports to function as nodes that must unconditionally be guaranteed as much traffic as they require. The feature enables important or high priority traffic to continue unimpeded while less important or low priority traffic is restricted. For example, the control process can examine types of packets so that ports which carry large amounts of ‘web-browsing’ packets may be penalised more than a port carrying mostly network management traffic.
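The exclusion of specified ports may be illustrated by the following sketch, in which the candidates for traffic reduction are ranked after the protected ports have been removed. The function name and the representation of the traffic data as a mapping from (input port, output port) pairs to packet counts are assumptions made for the illustration.

```python
def eligible_contributors(flows, output_port, protected_ports):
    """Return the input ports contributing to `output_port`, most prolific
    first, omitting any port that is exempt from traffic reduction."""
    totals = {}
    for (in_port, out_port), count in flows.items():
        if out_port == output_port and in_port not in protected_ports:
            totals[in_port] = totals.get(in_port, 0) + count
    return sorted(totals, key=totals.get, reverse=True)
```

With port 1 protected, a heavily loaded port 1 is never selected, and the reduction falls on the remaining contributors in order of their contribution.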