The present invention relates to network switches for packet-based communication systems such as Ethernet networks and to an improved method of operating such a network switch. The term ‘switch’ is intended to refer broadly to a device which receives addressed data packets and which can internally switch those packets in response to that address data or modified forms of such data. The invention is intended to be applicable to a variety of different switch architectures, as indicated hereinafter.
(a) Traffic Queues
It is well known to form traffic queues of data packets in network switches. Their formation is necessary to provide temporal buffering of a packet between the time it is received at a network switch and the time at which it can be transmitted from the switch. In most forms of network switch, the switch has a multiplicity of ports, and data packets received at the ports may, after appropriate processing including look-ups in relation to destination and source addresses in the packets, be directed to a port or ports in accordance with that address data. Switches employing either media access control addresses (such as in bridges) or network addresses (such as in routers) are of course well known in the art. In such switches it is customary to provide temporal buffering both when the packets are received, in what are known as ‘receive queues’, and when they are assigned to transmit ports, in what are known as ‘transmit queues’. In general, the transmission of packets from a transmit queue may depend on a variety of considerations, including possible congestion in a device to which the respective port is connected.
It is known to form queues of data packets in a variety of ways, including comparatively simple FIFOs established in hardware. More usually in modern switches queues may be formed in random access memory employing read and write pointers under the control of a memory controller. If static random access memory is employed, a particular traffic queue may be allotted a defined memory space and packets may be written into that memory space under the control of a write pointer which progresses from one location to another until it reaches the ‘end’ of the allotted memory space, whereupon it recycles to the beginning of the memory space (on the assumption that the space is not fully occupied). A read pointer progresses through the memory space in a similar manner. In such systems the fullness of a memory space, or thresholds representing some fraction of fullness, need to be expressed in terms of the effective distance, in memory locations, between the read and write pointers.
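By way of illustration only, the pointer arrangement just described might be sketched as follows. The class name `RingQueue`, its methods and the separate `count` field are assumptions made for this sketch and are not taken from any particular switch; the occupancy is expressed as the distance, in locations, from the read pointer to the write pointer.

```python
class RingQueue:
    """Fixed-size circular queue driven by a write pointer and a read pointer.

    Hypothetical sketch: names and layout are illustrative assumptions.
    """

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.write_ptr = 0  # next location to be written
        self.read_ptr = 0   # next location to be read
        self.count = 0      # tells 'empty' from 'full' when the pointers coincide

    def occupancy(self):
        """Effective distance, in locations, from read pointer to write pointer."""
        if self.count == self.size:
            return self.size
        return (self.write_ptr - self.read_ptr) % self.size

    def put(self, item):
        """Write one item; return False if the allotted space is fully occupied."""
        if self.count == self.size:
            return False
        self.slots[self.write_ptr] = item
        self.write_ptr = (self.write_ptr + 1) % self.size  # recycle at the 'end'
        self.count += 1
        return True

    def get(self):
        """Read the oldest item, or return None if the queue is empty."""
        if self.count == 0:
            return None
        item = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.size
        self.count -= 1
        return item
```

A threshold representing some fraction of fullness would then be a comparison of `occupancy()` against a chosen number of locations.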
Another system is a dynamic memory comprising a plurality of identifiable buffers which can be allotted to a specific traffic queue under the control of a Free Pool Controller and Transmit (Tx) Pointer Manager, termed for convenience herein ‘memory controller’. In such a system, any particular traffic queue may initially have some small number of buffers, such as two, allotted to it. If a queue requires more traffic space, then the memory controller can allot additional buffers to the queue. It is, as indicated for the previous example, possible to limit the available memory space by a limitation on the number of buffers employed for any particular queue, though it is known, and preferable in a variety of circumstances, to allow some traffic queues more space than others by imposing a different limit on the maximum number of buffers which can be used for that queue. In buffer systems, data may be written into the buffers using a write pointer and read out from the relevant buffers using a read pointer. In general, the size of each buffer is substantially more than that of a single packet. Packets are normally stored in such buffers in the form of a status word (which would normally be read first), including some control data and also an indication of the size of the packet, followed by address data and message data. An interface which reads a packet from such a buffer store will, in a reading cycle, commence reading the status word and proceed to read the packet until the next status word is reached.
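The allotment of buffers from a free pool, with a different maximum for each queue, might be sketched as below. This is a minimal sketch under stated assumptions: `BufferPool`, `open_queue`, `grow` and `release` are hypothetical names, buffers are represented only by integer identifiers, and the default figures are arbitrary.

```python
class BufferPool:
    """Hypothetical free pool that allots buffers to traffic queues on demand."""

    def __init__(self, total_buffers):
        self.free = list(range(total_buffers))  # identifiers of unallotted buffers
        self.owned = {}                         # queue id -> buffer ids held
        self.limit = {}                         # queue id -> maximum buffers allowed

    def open_queue(self, queue_id, initial=2, max_buffers=8):
        """Give a new queue its small initial allotment and its own limit."""
        if initial > max_buffers or initial > len(self.free):
            raise ValueError("cannot satisfy initial allotment")
        self.limit[queue_id] = max_buffers
        self.owned[queue_id] = [self.free.pop() for _ in range(initial)]

    def grow(self, queue_id):
        """Allot one more buffer, or return None at the limit or when the pool is empty."""
        held = self.owned[queue_id]
        if len(held) >= self.limit[queue_id] or not self.free:
            return None
        buf = self.free.pop()
        held.append(buf)
        return buf

    def release(self, queue_id, buf):
        """Return a drained buffer to the free pool."""
        self.owned[queue_id].remove(buf)
        self.free.append(buf)
```

Giving one queue a larger `max_buffers` than another corresponds to allowing some traffic queues more space than others.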
It is also possible, and preferred in the specific embodiment of this invention, to form a traffic queue indirectly, that is to say not by the packets that are in the queue but by respective pointers each of which points to a location containing the respective packet in the relevant memory space. In a scheme such as this, the receive and transmit queues are constituted by lists of pointers in respective memory space. The length of each queue may simply be determined by the number of entries (i.e. pointers) in the respective queue. When a pointer reaches the ‘top’ or ‘front’ of the queue, then, assuming the conditions for forwarding the respective packet have been met, the pointer is employed by the switching engine to retrieve the respective packet from the relevant memory location.
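A queue formed indirectly in this way might, purely as an illustration, look like the following. The name `PointerQueue`, the use of a dictionary to stand for packet memory and the `conditions_met` argument are all assumptions of the sketch.

```python
from collections import deque


class PointerQueue:
    """Hypothetical queue that holds pointers (locations), not the packets themselves."""

    def __init__(self):
        self.packet_memory = {}  # location -> stored packet (stands in for real memory)
        self.pointers = deque()  # the queue proper: a list of locations

    def enqueue(self, location, packet):
        """Store the packet once and append only its pointer to the queue."""
        self.packet_memory[location] = packet
        self.pointers.append(location)

    def __len__(self):
        """Queue length is simply the number of pointer entries."""
        return len(self.pointers)

    def forward_head(self, conditions_met=True):
        """Use the pointer at the front of the queue to retrieve its packet."""
        if not self.pointers or not conditions_met:
            return None
        location = self.pointers.popleft()
        return self.packet_memory.pop(location)
```

For example, after two calls to `enqueue` the length is 2, and `forward_head()` returns the first packet stored while leaving one pointer in the queue.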
(b) Transfer of Packets Across a Switch
There exists a variety of mechanisms and architectures for determining how a packet should be forwarded across a switch and in particular from a ‘receive queue’ to a ‘transmit queue’. Basically, they all have in common a look-up process by means of which the destination of a packet, for example defined by a destination media access control address, is determined with the aid of a forwarding database. On the discovery of a match between the destination of the packet and an entry in the database, the database yields forwarding data which determines the port or (in the case of a multicast packet) the multiplicity of ports from which the packet has to be forwarded. The compilation and organisation of forwarding databases and the use of ancillary features such as link tables, port masks and the like are too well known to warrant further description here.
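In its simplest form the look-up just described reduces to a keyed search, as in the sketch below. The function name `lookup_ports`, the dictionary layout and the example addresses are hypothetical; the behaviour when no match is found is not discussed here, so the sketch merely returns an empty set as a placeholder.

```python
def lookup_ports(forwarding_db, dest_mac):
    """Return the set of ports from which a packet for dest_mac is to be forwarded.

    forwarding_db maps a destination MAC address to a set of port numbers:
    one port for a unicast entry, several for a multicast entry.
    An empty set is a placeholder for the no-match case (an assumption).
    """
    return set(forwarding_db.get(dest_mac, ()))


# Illustrative database with one unicast and one multicast entry.
example_db = {
    "00:11:22:33:44:55": {3},
    "01:00:5e:00:00:01": {1, 2, 5},
}
```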
(c) Discard of Packets within a Switch
It is a frequently occurring phenomenon in data communication networks that, owing to variations in loading or data transmission rates and other circumstances, the rate at which packets (or their pointers) are written to a transmit queue is greater than the rate at which packets (or their pointers) are removed from the queue by virtue of the forwarding of the packets from the respective port. For example, a device at the other end of a link to which the port is connected may itself be congested and may exert ‘flow control’, a term conventionally used to denote the sending of a control frame that prescribes a pause in the forwarding of packets from that port over the link for some time specified in the control frame. In any event, in any physical switch the memory space which can be allotted to a transmit queue is necessarily limited and there is always the possibility that the transmit queue becomes full. ‘Fullness’ is normally indicated when the length of the queue exceeds some predetermined value, called herein ‘high watermark’. The high watermark may correspond to the maximum physical capacity allotted to the transmit queue, though that is not essential; it is within the scope of the present invention for the high watermark to define some predetermined length which is less than the maximum physical capacity allotted to the queue.
It is customary when a transmit queue is ‘full’, however in practice this may be defined, for a look-up arbiter forming part of the forwarding engine not to forward a packet at the head of a receive queue to the transmit queue for which that packet is destined; instead the look-up arbiter causes discard of the packet. One reason for doing this, apart from the fact that the transmit queue can no longer accept any fresh packet, is to avoid ‘head of line blocking’. It will be understood that if a packet which is at the head of a receive queue and intended for a particular transmit queue cannot be forwarded to that transmit queue, then packets behind it in the same receive queue can be blocked even though they may be intended for ports other than the port whose transmit queue is full.
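The ‘discard on full’ behaviour and its effect on head of line blocking might be sketched as follows. The function `service_head`, the tuple layout of receive-queue entries and the returned status strings are assumptions of this sketch; fullness is taken, as above, to mean a queue length exceeding the high watermark.

```python
from collections import deque


def service_head(rx_queue, tx_queues, high_watermark):
    """Move the head packet of a receive queue to its transmit queue, or discard it.

    rx_queue holds (dest_port, packet) tuples; tx_queues maps port -> deque.
    Discarding rather than waiting lets later packets for other ports proceed.
    """
    if not rx_queue:
        return None
    dest_port, packet = rx_queue.popleft()
    tx = tx_queues[dest_port]
    if len(tx) > high_watermark:  # destination transmit queue is 'full'
        return ("discarded", dest_port)
    tx.append(packet)
    return ("forwarded", dest_port)
```

With port 1 full and port 2 empty, a receive queue holding a packet for port 1 followed by a packet for port 2 loses the first packet but delivers the second, which would otherwise have been blocked.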
(d) Capture Effect
Whether ‘discard on full’ is implemented in a switch or not, a multi-port switch is susceptible to what is known as a ‘capture effect’ arising from the fact that some ports are more likely to direct packets to a particular transmit queue than other ports. This is particularly apparent when some ports of a switch are coupled to low speed links whereas other ports are coupled to higher speed links.
Once a transmit queue is full it takes, in general, the same length of time to forward a packet of a given size as it takes to receive a packet of the same size. Thus in a switch where all the ports are asynchronous the last port to provide a packet to a transmit queue and thus fill it may be requesting the forwarding of a new packet to that port when the transmit port has transmitted its packet. Thus the most likely packet to be placed on the transmit queue is a packet pending from the port that previously provided a packet to the transmit port.
If the ‘discard on full’ mode is in operation, all other ports that have, in their receive queues, packets for a transmit port of which the queue is full will discard the ‘head’ packet because the transmit queue would still be full when transfer of the packet from their respective receive queue to the (full) transmit queue should occur. As soon as the transmit queue has taken one packet then all subsequent requests would be ignored because the transmit queue is now full again.
It is possible to employ ‘round robin’ systems wherein an interface which services transmit queues, that is to say organises the transfer across a switch of packets from receive queues to transmit queues, is so arranged that a transmit queue can accept packets only in turn from the various receive queues in a cyclic or ‘round robin’ sequence. However, such a system, particularly for a large number of ports, tends to be both complex and inflexible.
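For comparison, the selection step of such a cyclic scheme might be sketched as below. The function `round_robin_pick` and its arguments are hypothetical and show only the choice of the next receive port; a practical arbiter would need this state for every transmit queue, which suggests why the approach scales poorly with port count.

```python
def round_robin_pick(requesting, last_served, num_ports):
    """Pick the next requesting receive port after last_served, in cyclic order.

    requesting is the set of port numbers with a packet pending for this
    transmit queue. Returns None when no port is requesting.
    """
    for offset in range(1, num_ports + 1):
        candidate = (last_served + offset) % num_ports
        if candidate in requesting:
            return candidate
    return None
```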
The present invention is based on the provision of hysteresis in the production of a signal which denotes that a transmit queue is full. More particularly, a transmit queue ‘full’ flag is set when the queue is full (i.e., its length is greater than that denoted by the high watermark) but is not ‘released’ until the transmit queue can accept a multiplicity of packets, this multiplicity being preferably at least equal to one packet for each of the ports that can provide packets for the respective transmit queue. Thus when the ‘full’ flag is released every port with a pending request for transfer of a packet from its respective receive queue to the previously full transmit queue can now be serviced.
It is therefore convenient to define, in accordance with the invention, a low watermark that corresponds to a length of transmit queue shorter than the length associated with the high watermark by the aforementioned multiplicity of packets. The low watermark will be of no significance until the ‘full’ flag for the queue is asserted; thereafter the full flag will only be released when the transmit queue has diminished to below the low watermark.
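The hysteresis between the two watermarks might be sketched as follows. This is an illustrative sketch only: the class `TransmitQueueFlag`, its `update` method and the choice of one packet per source port as the multiplicity are assumptions, and queue length is counted in packets (or pointers).

```python
class TransmitQueueFlag:
    """Hypothetical 'full' flag with hysteresis between two watermarks."""

    def __init__(self, high_watermark, num_source_ports):
        self.high = high_watermark
        # Low watermark sits below the high watermark by one packet per
        # port able to supply this transmit queue (assumed multiplicity).
        self.low = high_watermark - num_source_ports
        self.full = False

    def update(self, length):
        """Re-evaluate the flag for the current queue length and return it."""
        if not self.full and length > self.high:
            self.full = True   # set once the length exceeds the high watermark
        elif self.full and length < self.low:
            self.full = False  # release only once below the low watermark
        return self.full
```

With a high watermark of 16 and four source ports, the low watermark is 12: the flag is set when the length reaches 17, stays set as the queue drains through 14 and 12, and is released at 11. A length of 14 reached from below, with the flag clear, leaves it clear, which is the sense in which the low watermark matters only after the flag has been asserted.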