High speed routers and data switching systems share a fundamental architecture in the way they are built, the way they operate, and the way they handle the data passing through them. FIG. 1 describes a generic architecture for such a high speed data switching system. While different systems may vary in implementation, most of the fundamental elements described below can be found in all systems, albeit each with its own implementation flavor.
The following is a description of the main elements of a generic system (referring to FIG. 1). High speed routers and data communication switching systems are composed of a set of N line cards 10. Each of the line cards interfaces to the data network (not shown) for receiving information from, and sending information to, the network. In a typical packet switching network, the basic information element is a packet and, hence, each line card receives packets from the network and sends packets to the network. All the line cards in a given system are connected to each other via an internal interconnect 12. The common way to implement such an interconnect in a high speed, high bandwidth system is a switch fabric that allows information to be sent efficiently from a set of line cards 10, acting as source line cards, to a set of line cards 10′, acting as destination line cards.
As a packet arrives at the line card 10 from the network, it is absorbed by an input buffer 20 and then handed to a set of elements 22 that perform various kinds of processing and handling of the packet. In a typical router, this includes elements for processing of the layer 2 headers (e.g., processing of the Ethernet header, in case the interface is Ethernet), and a network processor in the card, e.g., of EZchip Semiconductor, Ltd. of Yokneam, Israel, that performs the destination resolution (based on any of IP address look up, MPLS label look up, and ACL based forwarding using any other field in the packet header, as well as any combination of fields), which determines the line card to which the packet should be sent. In addition, any ingress features that were configured are applied at this stage. Some examples of such features are filtering, policing, statistics updates, header field updates, such as TOS/EXP, TTL, etc., or searches in other fields of the packet, all as per the specific configuration. After all the required operations are completed, the packet, whose destination egress path is now known, is handed to the line card switch fabric interface 24 and is held in a buffer until its turn comes to be sent over the switch fabric (interconnect) 12 to the destination line card.
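The destination resolution step can be illustrated with a minimal sketch of an IP address look up using longest-prefix matching. The forwarding table contents, the name `FORWARDING_TABLE`, and the function `resolve_destination_card` are hypothetical and chosen only for illustration; an actual network processor would perform this look up in dedicated hardware and could also use MPLS labels or ACL fields.

```python
import ipaddress

# Hypothetical forwarding table: IP prefix -> index of the destination line card.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): 0,    # default route
    ipaddress.ip_network("10.0.0.0/8"): 1,
    ipaddress.ip_network("10.1.0.0/16"): 2,
}


def resolve_destination_card(dst_ip: str) -> int:
    """Return the line card of the longest prefix that contains dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    # Collect every prefix that covers the address, then keep the most specific.
    matches = [net for net in FORWARDING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]
```

In this sketch, a packet addressed to 10.1.2.3 matches all three prefixes and is resolved to the line card of the most specific one.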
There are many different types of switch fabric architectures. All implement an efficient interconnect between N line cards, where each line card may need to send information to any of the other line cards either in unicast or multicast, and the switch fabric algorithm optimizes the usage of the interconnect. This implies, of course, that the switch fabric is a congestion point, since if, for example, at a given point in time, all the line cards need to send packets to the same subset of line cards, some of the packets will have to wait for their turn, since the switch fabric interconnect is shared. The role of the switch fabric is to look at all the offered load across all of the line cards and optimize the sending of traffic among all the line cards at any point. Clearly, some packets may need to wait until their turn comes before being sent to the destination line card. Hence, buffering at the ingress is required and is common across all the systems that implement switch fabrics. This buffer is usually arranged into multiple queues, each getting different handling, so that the different types of traffic can be differentiated; for example, traffic that is more sensitive to delay can be sent first.
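The arrangement of the ingress buffer into multiple queues can be sketched as follows. This is a minimal illustration that assumes strict priority service, where priority 0 is the most delay sensitive; the class name `IngressBuffer` and its methods are hypothetical, and real systems may use other service disciplines.

```python
from collections import deque


class IngressBuffer:
    """Hypothetical ingress buffer with one FIFO queue per priority level."""

    def __init__(self, num_priorities: int = 2):
        # Index 0 is assumed to be the most delay-sensitive class.
        self.queues = [deque() for _ in range(num_priorities)]

    def enqueue(self, packet, priority: int) -> None:
        self.queues[priority].append(packet)

    def dequeue(self):
        """Return the next packet to offer to the switch fabric, or None."""
        for queue in self.queues:  # scan from highest to lowest priority
            if queue:
                return queue.popleft()
        return None
```

With this sketch, a delay sensitive packet that arrives after a bulk packet is still offered to the switch fabric first.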
As the packet traverses the switch fabric and arrives at the destination line card 10′ through which it will egress the system, it is placed in a buffer 26 that receives the information from the switch fabric 12. From there, it is handed to a set of processing elements 28 that handle the outgoing traffic. These may include a network processor that may apply any feature that was configured to be applied at this egress path for this particular type of packet. Examples of such features are policing the outgoing traffic rate, applying filtering for various security measures, updating various statistics, and others. Next, the packet is processed for a layer 2 header and then handed over to the egress buffer 29 before it is sent out. The egress buffer 29 is the place where differentiation among different types of data and destinations can be made, so that the router can provide the service level that is required for each type of traffic. For instance, if there are several customers connected to a line card and some have paid for more bandwidth than others, their traffic needs to be prioritized ahead of the traffic of lower paying customers. Hence, in the egress buffer there usually is a queuing system that can queue packets, giving each type of data different handling by way of priority, shaping, amount of bandwidth, etc. This egress queuing is in addition to the ingress queuing which is required due to the switch fabric congestion. This architecture is called combined input-output queuing and is typical of most high speed switch fabric based systems.
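One way the egress queuing system could give different customers different amounts of bandwidth is weighted round robin, sketched below. The source does not name a specific scheduling algorithm, so the choice of weighted round robin, the function name, and the queue names are all assumptions made for illustration.

```python
from collections import deque


def weighted_round_robin(queues: dict, weights: dict, rounds: int) -> list:
    """Serve up to weights[name] packets from each queue in every round.

    `queues` maps a hypothetical customer class to a deque of packets;
    `weights` maps the same class to its per-round packet allowance.
    """
    sent = []
    for _round in range(rounds):
        for name, weight in weights.items():
            queue = queues[name]
            for _slot in range(weight):
                if not queue:
                    break  # this class has nothing more to send in this round
                sent.append(queue.popleft())
    return sent
```

In this sketch, a class with weight 2 receives roughly twice the share of the output link of a class with weight 1 while both have packets queued.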
Combined input-output queuing architecture, while efficient, is limited in its ability to scale to very high line card bandwidth. This results from the need for a high speed up. When packets from multiple inputs are destined toward a certain output, ideally one would want to send all the arriving information to that output as soon as it is ready to be sent. At the output, one can then observe all of the offered load at any given point in time. This allows prioritizing the traffic based on the actual offered load and delivering Quality of Service accurately, with some packet streams deprioritized relative to other, higher priority streams. However, in order to achieve that, one needs to be able to receive information simultaneously at the output from all the inputs, in order to cater for the extreme case in which all the inputs want to send packets to the same output during the same window of time. This, in turn, requires a very high bandwidth into the receiving element at the output side. This amount of bandwidth is typically measured as a multiple of the line card output speed and is called speed up. Hence, if N line cards of the same speed are all sending to a certain line card, and that output card can absorb the information from all the inputs simultaneously, it is said to have a speed up of N.
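The definition of speed up translates into simple arithmetic, sketched below. The function name and the example figures (16 line cards at 100 Gb/s) are hypothetical and are used only to show the order of magnitude involved.

```python
def required_ingest_gbps(line_rate_gbps: int, speed_up: int) -> int:
    """Bandwidth the output-side receiving element must absorb.

    Speed up is expressed as a multiple of the line card output speed,
    so the required ingest bandwidth is simply the product of the two.
    """
    return line_rate_gbps * speed_up


# Hypothetical example: 16 line cards of 100 Gb/s each, all sending to one output.
# A speed up of N = 16 requires the output to absorb 16 times its own line rate.
full_speed_up = required_ingest_gbps(line_rate_gbps=100, speed_up=16)
```

Under these assumed figures, the output side would need to absorb 1600 Gb/s in order to serve a 100 Gb/s port.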
When the line card bandwidth is very high, achieving a speed up of N is not practical, since it is not practical to receive more than a certain amount of information into a traffic manager ASIC or into a memory, which are the typical receiving devices on the output side. Hence, a compromise is deployed in which a lower speed up, commonly of 2 or 3, is implemented. As a result, in certain temporary cases in which more than 2 or 3 inputs must send information to the same output, some of the information will have to be buffered in the inputs, as no more than 2 or 3 streams (depending on the implemented speed up) can be sent simultaneously. Hence, a combined input-output queuing architecture results. While this approach provides a reasonably efficient solution, it poses a challenge when the bandwidth of the line card further increases. In this case, achieving even a speed up of 2 may become a challenge for the same reason mentioned above, namely the technical difficulty of receiving very high bandwidth into an ASIC or a memory device. As a result, the number of packets accumulated in the input queues will increase, which, in turn, will increase the overall delay of packets in the system. If, among those packets, there are streams that require low delay due to the nature of the traffic they carry, those streams may not receive the desired handling. This accumulation in the input queues causes further inaccuracy in delivering the required Quality of Service for the overall traffic streams, since the output can now observe only a smaller fraction of the traffic, as most of it is queued in the inputs. As a result, the decision of which packet stream to send and which to delay is not optimal. Furthermore, since each input operates independently and has no information on the offered load at other inputs, none can make the optimal decision across all of the traffic.
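The accumulation of packets in the input queues under a limited speed up can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: time is divided into slots, every input has one new packet for the same output in every slot, and the output can absorb at most `speed_up` packets per slot. The function name and the model itself are hypothetical.

```python
def input_backlog_per_slot(num_inputs: int, speed_up: int, slots: int) -> list:
    """Total number of packets waiting in the input queues after each slot."""
    backlog = 0
    history = []
    for _ in range(slots):
        backlog += num_inputs              # one arrival per input toward the same output
        backlog -= min(backlog, speed_up)  # the output absorbs at most speed_up packets
        history.append(backlog)
    return history
```

In this model, with 4 inputs and a speed up of 2, the input backlog grows by 2 packets in every slot for as long as the overload lasts, whereas a speed up of 4 keeps the input queues empty.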
Lastly, both the switch fabric and the input buffers can overflow, which leads to packet drops. The result is that, instead of observing the entire offered load and making the optimal decision, the decision of which traffic to send is divided across multiple nodes, each operating independently and based on partial information. The more the bandwidth of each line card increases, the more this challenge manifests itself. Hence, an input-output queuing architecture presents a bandwidth scalability challenge.
Additionally, even at bandwidths for which a sufficient speed up can be achieved, the input-output queuing architecture presents a configuration challenge. This is because, in order to achieve a certain desired discrimination among the outgoing packet streams of a certain output, one must configure the priorities and behavior of streams in the output, as well as in all the inputs, since queues can build up in any input toward a certain output. This makes the configuration more complex.
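The configuration burden can be made concrete by counting the places where the stream priorities for a single output must be configured. The sketch below assumes that all N line cards can act as inputs toward that output; the function name and the counting model are hypothetical simplifications.

```python
def qos_config_points(num_line_cards: int, input_queuing: bool) -> int:
    """Count the locations where QoS behavior for ONE output must be configured."""
    output_points = 1  # the output line card itself
    # With input queuing, every input can build a queue toward this output,
    # so each input needs a matching configuration (assumption: all N cards).
    input_points = num_line_cards if input_queuing else 0
    return output_points + input_points
```

Under this assumption, a system of 16 line cards needs 17 configuration points per output with combined input-output queuing, compared with a single point if queuing took place at the output only.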
Accordingly, there is a long felt need for a packet switching system that could move all the input streams to the output without queuing at the input, irrespective of the line card speed, hence achieving a speed up of N. This would allow configuration to be performed at the output line card only, leading to a simpler configuration, more accurate Quality of Service behavior and a smaller system, as it would also allow removal of the associated input buffers. It would also be desirable to be able to continue scaling the bandwidth of a line card upward without degrading traffic behavior.