The present invention generally relates to a synchronization mechanism for reassertion of connection requests within a data network switch. In particular, the present invention is concerned with a mechanism enabling synchronized reassertion of requests for connection across a component, such as a network switch, after initial requests have been rejected due to network congestion.
Data communication networks are normally constructed using a plurality of component elements such as network switches. Switching elements come in a variety of types and sizes. Larger networks usually require a number of switching elements linked together to form a multi-stage network. The performance of such data networks can be measured using a large number of parameters which include, amongst other things, bandwidth, latency, addressing and standards compliance.
Multi-stage networks can suffer from network data packet congestion which occurs when a number of different data packets are directed into a single switching element, with each data packet instructed to leave that switching element using the same network link as the other data packets have been instructed to use. The network link is unable to process all of the data packets at the same time, and so congestion occurs. Switching elements typically contain some buffering that allows congested data packets to be stored until the congested exit link from the switch is free to transmit the backlog of data packets. While a data packet is waiting for a congested exit link to become available, the delivery latency of that data packet is increasing.
Network latency is unavoidable when the traffic patterns of data travelling across a network are not predictable or are apparently random. However, the latency of individual data packets or data packet streams can to some extent be controlled. For example, if each switching element is faced with a number of data packet sources each trying to reach a particular egress link from the switch, the switch can make a choice as to which packet to next route to that exit link.
Sometimes a network protocol includes priority and each data packet is provided with a priority status whereby higher priority packets should be delivered first. Some switching elements operate a fair delivery system for the data packets arriving from different input links based upon the priority assigned to the packets. Whilst such a switching element can prioritize the packets based on their priority, the switching element cannot take into consideration how any given data packet came to arrive at this switching element. In some cases, if the switching element has network links connected directly to the ingress edge of the network, the switching element may be the first switching element at which the data packet has arrived. However, other data packets arriving at the same switching element on a link connected into the internal switch fabric of the network may already have traversed many other switching elements, through many stages of the network, before arriving at this particular switching element. Furthermore, these data packets may have endured network congestion at each switching element stage of their journey. True network fairness would involve a switching element giving the network traversing data packets priority over the data packets new to the network. One way to improve the total network fairness in such a situation is to give each data packet an age, or date of birth, as it enters the network. Switch elements can then use the packet age in their prioritization of data packets, giving older packets that have been in the network for longer priority over younger packets that have only just entered the network.
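By way of illustration only, the age-and-priority prioritization described above can be sketched as follows; the `Packet` record and `select_next` function are hypothetical names, not part of any particular protocol or of the described switching element:

```python
class Packet:
    """Hypothetical packet record carrying a priority and an ingress timestamp."""
    def __init__(self, payload, priority, birth):
        self.payload = payload
        self.priority = priority   # higher value = more urgent
        self.birth = birth         # "date of birth" stamped as the packet enters the network

def select_next(candidates):
    """Pick the highest-priority packet; among equal priorities, the oldest
    (smallest birth timestamp) wins, so long-travelled packets are favoured."""
    return max(candidates, key=lambda p: (p.priority, -p.birth))
```

A packet that entered the network earliest wins any tie on priority, which approximates the notion of network fairness discussed above.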
Many network protocols include flow control to prevent network buffers overflowing. Any buffer has a finite size. A buffer associated with an ingress link is used to store packets until an egress link becomes available for connection and transmission of a data packet. If data packets continue to be received for a particular egress link, and the egress port associated with that egress link is overcommitted with other traffic from other links, or the egress port is unable to transmit at the rate the data is being received, then eventually the input buffer will become full. At this point either packets that are received on this link have to be discarded or a method to stop them arriving at the buffer must be used.
Flow control is a commonly used method of preventing buffer overflow. When a data packet is received after transmission across a network, a token is returned in the opposite direction to the received packet to instruct the link partner to stop transmitting. This token can be in the form of a control packet, particular link encoding or even sideband additional wires. Sometimes the token is a signal instructing the transmitter to stop transmitting until told to re-start. In other networks, the token is an instruction to the transmitter to stop transmitting for a period of time and, in some cases of congestion, the instruction to stop transmitting may need to be repeated later. In communication protocols this type of flow control signal is often called an Xoff.
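A minimal sketch, assuming the timed variant of this flow control described above, in which an Xoff token stops the transmitter for a stated period and the link partner must repeat the token if congestion persists (the class and method names are illustrative only):

```python
class Transmitter:
    """Illustrative transmitter honouring timed Xoff flow-control tokens."""
    def __init__(self):
        self.stopped_until = 0.0   # time before which transmission is inhibited

    def receive_xoff(self, now, pause):
        # An Xoff token carrying a pause duration; repeated tokens extend
        # the stop window, modelling the case where the instruction to stop
        # transmitting has to be repeated while congestion persists.
        self.stopped_until = max(self.stopped_until, now + pause)

    def may_transmit(self, now):
        return now >= self.stopped_until
```

When the pause window expires without a further Xoff, transmission resumes automatically, in contrast to the stop-until-restart variant also mentioned above.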
Some network protocols layer multiple data streams over the same link. When this occurs it is possible for the flow control, if it exists, to be independent for each of the data streams. If multiple data flows are using an egress output port, one of the data streams could be heavily congested, with the flow control indicating to the link partner for that data stream to stop transmission, while other data streams are either idle or flowing freely. The other data packet flows should be able to continue to deliver packets to the egress port while the blocked flow is waiting for the flow control to release the channel.
When flow control is used, an egress port can be blocked for a significant period of time. When the egress port becomes unblocked it is especially important that the most appropriate packet is selected to be transmitted, because an output port that has been blocked is likely to become blocked again while a backlog of data packets to be transmitted is still likely to be present.
There are many different implementations of switching elements. In some cases, large, multi-ported, memory systems are used. These have a number of independent read and write ports and the memory performs both the switching function and the input buffering function. For high performance networks with high bandwidth links the design of the read/write interface to and from such a memory system can be challenging. Other switching element implementations use a crossbar switch with separate buffers for each of the links. It is also possible to have a switching element implementation comprising a combination of multi-ported memories and a crossbar switch where the multi-ported memories will support a small number of links and the crossbar switch is used to link the multiple memories together. All these methods of implementation can be faced with the same problem of arbitrating many separate packet streams being delivered onto a single egress port.
It is usually desirable to maximize the number of links an individual switch is able to support. This has the effect of reducing the total number of switching elements needed in the multi-stage network and in so doing reduces the total cost of the network. It also reduces the total number of links, improves the overall reliability and again reduces the cost of the network. Reducing the total number of links and switching elements within a network can also significantly reduce the power consumption of the network.
However, increasing the number of links on a switch increases the complexity of the switch. Such complex switch devices tend to require more logic gates to implement and this can lead to relatively large silicon chips. The problems associated with performing an accurate arbitration decision against many ingress ports for each output port are increased in larger chips. This is especially true if the packet priority and/or age values are to be correctly honored and the output ports have multiple data flows, channels or data lanes. Furthermore, the input ports will be physically distant, in silicon terms, from the outputs they are trying to connect with.
An example of the issues which can arise will be described with reference to FIG. 1 which shows a schematic diagram of a crossbar switch 10. The crossbar switch 10 is a block of logic used to switch data concurrently from a number of input data streams to a number of output data streams. The crossbar switch 10 is provided with eight inputs 12a-12h, shown as rows, and eight outputs 14a-14h, shown as columns. It will be appreciated that much bigger crossbars can be constructed with many more inputs and outputs, and wide data paths can be used to increase the bandwidth, increasing the size of the crossbar structure. Switch points 16aa-16hh are located at the respective intersections of the rows 12a-h and columns 14a-h. When a switch point, in this case switch point 16fg, is closed, a connection is made from input 12f to output 14g as shown by the black dot in FIG. 1. Only one switch point is allowed to be connected on a given output column at any one time; however, providing this is the case, multiple outputs can be connected to corresponding inputs at any one time.
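The connection rule just described, namely that at most one switch point may be closed on a given output column while many input-output pairs may be connected concurrently, can be sketched as a toy model; this is illustrative only and not a description of the hardware:

```python
class Crossbar:
    """Toy model of a crossbar: each output column may carry at most one
    closed switch point at a time, but different outputs connect freely."""
    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.owner = [None] * n_outputs   # which input, if any, holds each output

    def connect(self, inp, out):
        """Close switch point (inp, out); fail if the column is in use."""
        if self.owner[out] is not None:
            return False
        self.owner[out] = inp
        return True

    def disconnect(self, out):
        """Open whatever switch point currently drives this output column."""
        self.owner[out] = None
```

In this model, connecting input 12f to output 14g corresponds to `connect(5, 6)` succeeding, after which any further request for column 6 is refused until `disconnect(6)` is called.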
In some cases, a single input to a crossbar switch makes a request to connect to a particular output; however, in the case of floods, broadcasts and multicasts the single input will request connection to a set of outputs. As well as specifying the output port, a connection request can also specify the output data flow channel at the output port on which the packet should be transmitted. The input connection request will be provided with a priority and/or age value and the output port will use an arbitration method to select the highest priority/oldest packet to connect with.
If flow control is being used on the output's egress port then the output may have to signal to inputs requesting connection that a connection cannot be made because the output flow has been stopped by a full buffer on the egress port's link partner's ingress port. However, this may only be occurring on one of the output's data flows and the other data flows may still be able to accept more data packets.
One recognized way of dealing with this type of problem is to reject the failing connection with a blocked response signal that indicates another attempt at connection should be made later. The input making the connection request would then inhibit the request for a period of time, during which that input's connection to the crossbar is free to request a connection for another data packet from a different data flow, perhaps to a different output. After a period of time the input will retry the original connection request and will either be successful or be made to wait again.
The problem with this approach is that, for the input making the request for connection, winning the connection to the congested output's data flow becomes a lottery. There may be many inputs all trying to send data packets to this output, and the first input to make a connection request after the output's data flow status has transitioned from stopped to running will be the input that manages to make the connection and send a packet across the data flow, with no regard given to packet priority or age in deciding which input the output connects to.
In FIG. 2 there is a schematic representation of an input and output status timeline, associated with connection requests in a network switch such as that shown in FIG. 1, which illustrates this problem. As can be seen, inputs 12a, 12b and 12c are each trying to connect to an output 14d. In this example input 12a has a medium priority request, input 12b has a high priority request and input 12c has a low priority request. The requests are made when new data packets arrive at the respective inputs 12a-c and need to be routed to output 14d. In this example, input 12a's connection request is asserted first. Input 12a makes the request to the output 14d only to find that a blocked signal is being asserted by output 14d. This forces input 12a to de-assert its connection request, as there may be other work input 12a could progress, such as transmitting other data packets to another output. Input 12b then asserts its high priority request but again is forced to de-assert because the output 14d is still asserting a blocked signal. The output 14d then receives a signal from its link partner (not shown) that it is able to transmit some more data; the timing of this signal's arrival is completely unpredictable, as it depends on how the packet data is being accepted by the link partner of output 14d. Upon receipt of this signal, output 14d de-asserts the blocked status signal. The low priority request from input 12c then gets lucky and happens to be asserted just after the blocked status is removed from output 14d. Input 12c therefore successfully connects to output 14d despite being the lowest priority requestor. Inputs 12a and 12b subsequently retry their connection requests but now find they cannot connect to output 14d because input 12c is already connected.
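The outcome in FIG. 2 can be reduced to a simple hypothetical model: when retries are scheduled independently, the winner is whichever request happens to be asserted first after the output unblocks, regardless of priority. The function and the timing values below are purely illustrative:

```python
def first_retry_after_unblock(retry_times, unblock_time):
    """Return the requester whose retry lands earliest at or after the
    moment the output de-asserts its blocked status; retries arriving
    before that moment are rejected again and play no part."""
    eligible = {inp: t for inp, t in retry_times.items() if t >= unblock_time}
    return min(eligible, key=eligible.get) if eligible else None

# Inputs 12a (medium), 12b (high) and 12c (low) retry at arbitrary times;
# the low priority request from 12c happens to land first after unblocking.
winner = first_retry_after_unblock(
    {"12a_medium": 11.0, "12b_high": 12.0, "12c_low": 10.1},
    unblock_time=10.0,
)
```

Because arbitration here depends only on retry timing, the high priority request from 12b loses to 12c, exactly the unfairness the FIG. 2 timeline illustrates.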
As the output 14d did not have all the relevant requests available at the same time in order to perform a comparison of priority and/or age of the data packets, these factors could not be taken into account in assigning the next connection. With requests arriving at various times, then being rejected and made to wait for a period of time, it is unlikely that all the requests will ever be re-asserted together.
However, ensuring that the highest priority/oldest packet is chosen for transmission across a switch is especially important when output data flows become blocked, because these are the exact conditions under which old packets are generated within a network. In networks where maximum latency is an important parameter, and if priority and/or age are included in the network protocol, then it is critical that blocked resources are managed carefully and the highest priority request is always the next to be selected.
It can therefore be seen that there is a need for a mechanism that can synchronize reassertion of connection requests within a network switch. It would be convenient if such a mechanism could also maintain optimum connection functionality when no reassertion of requests is required.