The present invention relates to network switches for packet-based communication systems such as Ethernet networks and to an improved method of operating such a network switch. The term ‘switch’ is intended to refer broadly to a device which receives data packets containing address data and which can internally switch those packets in response to that address data or modified forms of such data. The invention is intended to be applicable to a variety of different switch architectures, as indicated hereinafter.
(a) Traffic Queues
It is well known to form traffic queues of data packets in network switches. Their formation is necessary to provide temporal buffering of a packet between the time it is received at a network switch and the time at which it can be transmitted from the switch. In most forms of network switch, the switch has a multiplicity of ports, and data packets received at the ports may, after appropriate processing including look-ups in relation to destination and source addresses in the packets, be directed to a port or ports in accordance with that address data. Switches employing either media access control addresses (such as in bridges) or network addresses (such as in routers) are of course well known in the art. In such switches it is customary to provide temporal buffering both when the packets are received, in what are known as ‘receive queues’, and when they are assigned to transmit ports, in what are known as ‘transmit queues’. In general, the transmission of packets from a transmit queue may depend on a variety of considerations, including possible congestion in a device to which the respective port is connected.
It is known to form queues of data packets in a variety of ways, including comparatively simple FIFOs established in hardware. More usually in modern switches queues may be formed in random access memory employing read and write pointers under the control of a memory controller. If static random access memory is employed, a particular traffic queue may be allotted a defined memory space and packets may be written into that memory space under the control of a write pointer which progresses from one location to another until it reaches the ‘end’ of the allotted memory space, whereupon it recycles to the beginning of the memory space (on the assumption that the space is not fully occupied). A read pointer progresses through the memory space in a similar manner. In such systems the fullness of a memory space, or thresholds representing some fraction of fullness, need to be expressed in terms of the effective distance, in memory locations, between the read and write pointers.
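By way of illustration only, a queue in a statically allotted memory space might be sketched as follows. The class name RingQueue and its methods are hypothetical and are not taken from any particular switch. The sketch keeps an explicit count of occupied locations, which equals the distance from the read pointer to the write pointer and also distinguishes a full space from an empty one.

```python
class RingQueue:
    """Illustrative fixed-size queue in a statically allotted memory space."""

    def __init__(self, size):
        self.slots = [None] * size   # the allotted memory space
        self.size = size
        self.write_ptr = 0           # next location to be written
        self.read_ptr = 0            # next location to be read
        self.count = 0               # occupied locations (read-to-write distance)

    def above_threshold(self, fraction):
        """True when occupancy has reached the given fraction of the space."""
        return self.count >= fraction * self.size

    def write(self, packet):
        if self.count == self.size:
            return False             # space fully occupied; caller must hold or drop
        self.slots[self.write_ptr] = packet
        # Recycle to the beginning on reaching the end of the space.
        self.write_ptr = (self.write_ptr + 1) % self.size
        self.count += 1
        return True

    def read(self):
        if self.count == 0:
            return None              # nothing queued
        packet = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.size
        self.count -= 1
        return packet
```

A threshold such as three-quarters of the space would then be tested with above_threshold(0.75).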
Another system is a dynamic memory comprising a plurality of identifiable buffers which can be allotted to a specific traffic queue under the control of a Free Pool Controller and Transmit (Tx) Pointer Manager, termed for convenience herein ‘memory controller’. In such a system, any particular traffic queue may have initially some small number, such as two, of buffers allotted to it. If a queue requires more traffic space, then the memory controller can allot additional buffers to the queue. It is, as indicated for the previous example, possible to limit the available memory space by a limitation on the number of buffers employed for any particular queue, though it is known, and preferable in a variety of circumstances, to allow some traffic queues more space than others by imposing a different limit on the maximum number of buffers which can be used for that queue. In buffer systems, data may be written into the buffers using a write pointer and read out from the relevant buffers using a read pointer. In general, the size of each buffer is substantially more than that of a single packet. Packets are normally stored in such buffers in the form of a status word (which would normally be read first), including some control data and also an indication of the size of the packet, followed by address data and message data. An interface which reads a packet from such a buffer store will, in a reading cycle, commence reading the status word and proceed to read the packet until the next status word is reached.
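The allotment of buffers from a free pool, with a different ceiling for each queue, might be sketched as follows. This is a minimal illustration under assumed names (FreePool, BufferedQueue, grow, release); it models only the bookkeeping of buffer identifiers and not the storage of packets or status words.

```python
class FreePool:
    """Illustrative pool of identifiable buffers not yet allotted to any queue."""

    def __init__(self, total_buffers):
        self.free = list(range(total_buffers))

    def take(self):
        # Hand out a free buffer identifier, or None when the pool is exhausted.
        return self.free.pop() if self.free else None

    def give_back(self, buf_id):
        self.free.append(buf_id)


class BufferedQueue:
    """Traffic queue that starts with a few buffers and may grow up to a limit."""

    def __init__(self, pool, max_buffers, initial=2):
        self.pool = pool
        self.max_buffers = max_buffers   # per-queue ceiling; may differ between queues
        self.buffers = []
        for _ in range(min(initial, max_buffers)):
            self.grow()

    def grow(self):
        """Request one more buffer; refused at the ceiling or when the pool is empty."""
        if len(self.buffers) >= self.max_buffers:
            return False
        buf = self.pool.take()
        if buf is None:
            return False
        self.buffers.append(buf)
        return True

    def release(self):
        """Return the most recently allotted buffer to the pool."""
        if self.buffers:
            self.pool.give_back(self.buffers.pop())
```

Giving one queue a ceiling of three buffers and another a ceiling of five is one simple way of allowing some traffic queues more space than others.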
It should be understood that a traffic queue both in general and in relation to the present invention may be constituted indirectly, that is to say not by the packets that are in the queue but by respective pointers each of which points to a location containing the respective packet in the relevant memory space. In a scheme such as this, the receive and transmit queues are constituted by lists of pointers in respective memory space. The length of each queue may simply be determined by the number of pointers in the respective queue. When a pointer reaches the ‘top’ or ‘head’ of the queue, then, assuming the conditions for forwarding the respective packet have been met, the pointer is employed by the switching engine to retrieve the respective packet from the relevant memory location. In the present invention it is broadly of no consequence whether the traffic queues are constituted directly by the packets or by queues of pointers.
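A queue constituted by pointers rather than by packets might be sketched as follows. The names used (PointerQueues, receive, forward_head, transmit) are hypothetical, a dictionary stands in for packet memory, and the freeing of a memory location once every copy has been transmitted is omitted for brevity.

```python
from collections import deque


class PointerQueues:
    """Illustrative receive and transmit queues holding pointers, not packets."""

    def __init__(self, num_ports):
        self.memory = {}                  # location -> stored packet
        self.next_location = 0
        self.rx = deque()                 # receive queue of pointers
        self.tx = [deque() for _ in range(num_ports)]  # one transmit queue per port

    def receive(self, packet):
        """Store the packet once and queue a pointer to its location."""
        loc = self.next_location
        self.next_location += 1
        self.memory[loc] = packet
        self.rx.append(loc)
        return loc

    def forward_head(self, ports):
        """Move the pointer at the head of the receive queue to each target port."""
        loc = self.rx.popleft()
        for port in ports:
            self.tx[port].append(loc)
        return loc

    def transmit(self, port):
        """Use the head pointer of a transmit queue to retrieve the packet."""
        loc = self.tx[port].popleft()
        return self.memory[loc]
```

The length of any queue is simply len() of the corresponding deque, and a packet destined for several ports is stored once while its pointer appears in several transmit queues.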
(b) Address Look-Ups
It is customary in most forms of network switch, in the broad sense used herein, to provide a forwarding table or database which contains entries relating address data in a packet to forwarding data enabling a switching engine to determine, usually by means of a port mask, the port or ports from which a packet should be forwarded. Forwarding databases may be established for media access control addresses (otherwise known as layer 2 addresses) or network addresses (layer 3 addresses) or both. In the specific example described hereinafter it will be assumed that the database employs media access control addresses but this is by way of illustration not limitation.
When a packet is received by a switch, in the particular example selected, it is customary to perform two look-ups. The look-ups may be performed while the packet is in a receive queue associated with the particular port by which the packet has been received. One look-up is in respect of the source address (SA) in the packet. The object of this look-up is to build up entries in the data table relating media access control addresses to the forwarding data (such as a port number). If the source address exists in the forwarding database no action need be required. In some switches it is customary to ‘age’ entries so that the database is not cluttered by addresses which are no longer in active use. In circumstances such as these even though a source address may exist in the database the entry may be updated.
The other look-up is in respect of the destination address (DA) in the packet. If the address exists in the database, the look-up retrieves the forwarding information (such as the port number) associated with that address so that the switching or forwarding engine can determine the port from which the packet should be forwarded and therefore direct the packet (or establish the relevant pointer) to the transmit queue for that particular port.
Although it is not directly relevant to the present invention, if the destination address look-up fails to find a match with an entry in the forwarding database, it is normally necessary to broadcast the packet to all possible ports. Further, although it is again not directly relevant to the present invention, a given packet may be destined, as in the case of a multicast transmission, for more than one port and therefore a packet in a given receive queue may ultimately produce entries in more than one transmit queue.
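The two look-ups described above might be sketched as follows, purely by way of illustration. The names (ForwardingDatabase, source_lookup, destination_lookup, age_out) are hypothetical, a dictionary stands in for the look-up table, time is passed in explicitly, and the exclusion of the receiving port when a packet is flooded is an assumption of the sketch.

```python
class ForwardingDatabase:
    """Illustrative forwarding database keyed by media access control address."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}   # address -> (port, time last seen)

    def source_lookup(self, src_mac, rx_port, now):
        """Learn a new source address, or refresh the age of a known one."""
        is_new = src_mac not in self.table
        self.table[src_mac] = (rx_port, now)
        return is_new

    def destination_lookup(self, dst_mac, rx_port):
        """Return the ports from which the packet should be forwarded."""
        entry = self.table.get(dst_mac)
        if entry is None:
            # No match: flood to every port except the receiving one (assumption).
            return [p for p in range(self.num_ports) if p != rx_port]
        return [entry[0]]

    def age_out(self, now, max_age):
        """Remove entries not refreshed within max_age; return how many were removed."""
        stale = [mac for mac, (_, seen) in self.table.items() if now - seen > max_age]
        for mac in stale:
            del self.table[mac]
        return len(stale)
```

In this sketch an aged-out address is treated exactly like one never learned, so the next packet addressed to it is flooded until its source address is seen again.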
In most systems the learning process is performed by software, because the placing of a new address in a look-up table requires manipulation of the table that is difficult to perform in hardware. The difficulty partly arises because various techniques are employed to save memory space or to render destination address look-ups more rapid. For example, hashing of addresses may be employed so as to collapse 48-bit media access control addresses to 16-bit addresses in a pointer table, the pointers in such a table pointing to a linked list of entries in a look-up table. Hashing is described in, for example, U.S. Pat. No. 5,708,659 and in British Patent Application Publication No. GB-2337659.
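A hashed look-up of this general kind might be sketched as follows. The hash shown, an exclusive-OR of the three 16-bit words of the address, is an assumption chosen for simplicity and is not asserted to be the function used in either cited document. A dictionary stands in for the 65,536-entry pointer table and a Python list for each linked list of entries; the names hash16 and HashedTable are hypothetical.

```python
def hash16(mac):
    """Fold a 48-bit address into 16 bits by XOR of its three 16-bit words."""
    return (mac ^ (mac >> 16) ^ (mac >> 32)) & 0xFFFF


class HashedTable:
    """Illustrative pointer table whose entries lead to chains of look-up entries."""

    def __init__(self):
        self.pointers = {}   # 16-bit hash -> chain of [address, port] entries

    def insert(self, mac, port):
        chain = self.pointers.setdefault(hash16(mac), [])
        for entry in chain:
            if entry[0] == mac:
                entry[1] = port      # address already present: update in place
                return
        chain.append([mac, port])    # new address: extend the chain

    def lookup(self, mac):
        # Walk the chain, comparing the full 48-bit address at each entry.
        for entry_mac, port in self.pointers.get(hash16(mac), []):
            if entry_mac == mac:
                return port
        return None
```

Because distinct addresses can share a hash value, insertion has to search and possibly extend a chain, which gives some indication of why learning is more awkward to perform in hardware than a plain look-up.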
In any event, the rate of learning of new addresses in a high performance switch is very slow compared with the number of packets that pass through, or should pass through, the switch. Typically, learning rates tend to be limited to hundreds or thousands of addresses per second while the switch is handling literally millions of packets per second. As a general rule the rate of learning is at least one order of magnitude and typically several orders of magnitude less than the rate of packet throughput for which the switch is designed.
It would be possible to reduce the number of look-ups in the switch by performing a source address look-up for only some of the time, on an arbitrary basis. This would cause a situation wherein new source addresses may be missed because every packet is not being checked against the forwarding database. It is therefore desirable to provide a mechanism which reduces the likelihood of a new source address being missed while at the same time preventing what is known as a capture effect. Such an effect can arise wherein each time a packet with a given source address is seen the state machine which controls the look-up engine is in a ‘don't check the source address’ mode and so that address is never learned.
The present invention is based on a selective reduction in the performance of source address look-ups in a forwarding database. The principle is that the length of the receive (Rx) queue for a given port is used to determine if a source address (SA) look-up should be done. Thus in a lightly loaded system the Rx queues will never fill up and so an SA look-up can be done for every packet, while in a heavily loaded system the Rx queues will fill up and so the SA look-up can be inhibited.
In an optimal system the bandwidth of the ports should be such that there are 50% more ports' worth of bandwidth than the look-up can handle doing both DA and SA look-ups. Thus if a look-up engine were capable of doing DA and SA look-ups for ten 1-Gigabit ports then by using this technique it could support fifteen 1-Gigabit ports. This enables sufficient bandwidth to enable the Rx ports to recover when oversubscribed, because the look-up with only DA searches would have a bandwidth of twenty ports.
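The arithmetic of this example can be checked with a short sketch. It assumes, as the figures above imply, that an SA look-up and a DA look-up cost the same, and it measures look-up capacity in ports' worth of look-ups. The function name lookup_budget and the derived fraction of packets that can still receive an SA look-up at full load are illustrative and are not figures stated in the text.

```python
from fractions import Fraction


def lookup_budget(full_ports, actual_ports):
    """full_ports: ports the engine can serve doing both DA and SA look-ups.

    Returns (ports supportable with DA-only look-ups,
             fraction of packets that can also get an SA look-up when
             actual_ports are all running at full rate).
    """
    capacity = 2 * full_ports            # two look-ups per packet for full_ports ports
    da_only_ports = capacity             # one look-up per packet
    spare = capacity - actual_ports      # capacity left after one DA look-up per packet
    sa_fraction = min(Fraction(1), max(Fraction(0), Fraction(spare, actual_ports)))
    return da_only_ports, sa_fraction
```

For the figures given, lookup_budget(10, 15) yields a DA-only capacity of twenty ports, and under these assumptions about one packet in three could still receive an SA look-up with all fifteen ports saturated.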
The system preferably takes into account other factors that also cause the Rx queues to fill up. If a switch is configured for lossless mode, then if a Tx (transmit) port fills up no more packets can be placed on the Tx queue and so head of line blocking occurs. This will in turn prevent packets being removed from the Rx queue and so cause the Rx queue to fill up. This situation is quite easy to detect and so in this scenario the SA look-ups would still be carried out even though the Rx queue is filling up.
As the number of packets in the queues is dependent on the speed of the look-up, ignoring head of line blocking, and as any port can switch to DA only look-up at any time, thus increasing the look-up bandwidth, it would be difficult for a capture effect to occur. The chances of capture effect can be reduced further by implementing hysteresis in the Rx queue such that DA only look-ups start when an upper watermark is reached but the DA and SA look-ups only start again when the Rx queue pointer reaches the lower watermark.
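The decision described above, including the hysteresis and the exception for head of line blocking, might be sketched as follows. The class name SaLookupPolicy, the parameter names and the representation of queue length as a simple integer are assumptions made for illustration; an actual switch would implement the equivalent logic in its look-up state machine.

```python
class SaLookupPolicy:
    """Illustrative per-port decision on whether to perform the SA look-up."""

    def __init__(self, upper, lower):
        if lower >= upper:
            raise ValueError("lower watermark must be below upper watermark")
        self.upper = upper       # Rx queue length at which SA look-ups stop
        self.lower = lower       # Rx queue length at which SA look-ups resume
        self.da_only = False     # current mode for this port

    def should_do_sa_lookup(self, rx_fill, tx_blocked=False):
        if tx_blocked:
            # The Rx queue is filling because of head of line blocking, not
            # because of look-up load, so SA look-ups continue.
            return True
        if rx_fill >= self.upper:
            self.da_only = True
        elif rx_fill <= self.lower:
            self.da_only = False
        # Between the watermarks the previous mode is retained (hysteresis).
        return not self.da_only
```

With watermarks of 8 and 3, a queue that rises to 8 and then drains keeps DA only look-ups at lengths 6 and 4 and resumes SA look-ups only on reaching 3.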