DWDM (Dense Wavelength Division Multiplexing), by merging many wavelengths onto a single optical fiber, is making available long-haul fiber-optic data communications links of huge aggregate capacity. Each wavelength is an independent communications channel which typically operates at OC48c, i.e., 2.5 Gigabits per second (Gbps, one Gigabit being 10^9 bits), at OC192c (10 Gbps) and, in some systems, at OC768c (40 Gbps). These formats and rates are part of a family available for use in optical interfaces, generally referred to as SONET, a standard defined by the American National Standards Institute (ANSI), for which there exists a mostly compatible European counterpart known as SDH (Synchronous Digital Hierarchy). At each node of a network, the data packets or cells carried on each DWDM channel must be switched, or routed, by packet-switches that process and then switch packets between different channels so as to forward them towards their final destination. Although it would ideally be desirable to keep the processing of packets in the optical domain, without conversion to electronic form, this is still not really feasible today, mainly because all packet-switches need buffering that is not yet available in an optical form. So packet-switches will continue to use electronic switching technology and buffer memories for some time to come.
However, because of the data rates quoted above for individual DWDM channels (up to 40 Gbps) and the possibility of merging tens, if not hundreds, of such channels onto a single fiber, the throughput to handle at each network node can become enormous, i.e., in a multi-Terabit per second (Tbps, one Terabit being 10^12 bits) range, making buffering and switching in the electronic domain an extremely challenging task. Constant, significant progress has been sustained for decades in the integration of ever more logic gates and memory bits on a single ASIC (Application Specific Integrated Circuit), allowing implementation of the complex functions required to handle the data packets flowing into a node according to QoS (Quality of Service) rules. Unfortunately, the progress in speed and performance of the logic devices over time is comparatively slow, and is now gated by the power one can afford to dissipate in a module to achieve it. In particular, the time to perform a random access into an affordable memory, e.g., an embedded RAM (Random Access Memory) in a standard CMOS (Complementary MOS) ASIC, is decreasing only slowly with time, while switch ports need to interface channels whose speed quadruples at each new generation, i.e., from OC48c to OC192c and to OC768c, respectively from 2.5 to 10 and 40 Gbps. For example, if a memory is 512-bit wide, allowing storing or fetching, in a single write or read operation, a typical fixed-size 64-byte (8-bit byte) packet of the kind handled by a switch, this must be achieved in less than 10 nanoseconds (ns, one nanosecond being 10^-9 second) for a 40 Gbps channel, and in practice in a few ns only, in order to provide the speed overhead needed to sustain the specified nominal channel performance, since at least one store and one fetch, i.e., two operations, are always necessary per packet movement.
This represents, nowadays, the upper limit at which memories and CMOS technology can be cycled, making the design of a multi-Tbps-class switch extremely difficult with a cost-performance state-of-the-art technology such as CMOS, since it can only be operated at a speed comparable to the data rate of the channels it has to process.
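The timing budget quoted above can be checked with a few lines of arithmetic. The following is only a rough sketch: the rates and the 64-byte packet size come from the text, while the helper name `packet_time_ns` is illustrative, and the even split of the packet slot between one store and one fetch is an assumption that ignores the speed overhead mentioned above.

```python
# Back-of-the-envelope memory cycle budget for a 512-bit-wide packet buffer.
# Rates and packet size are taken from the text; names are illustrative only.

PACKET_BITS = 64 * 8  # fixed-size 64-byte packet = 512 bits

RATES_GBPS = {"OC48c": 2.5, "OC192c": 10.0, "OC768c": 40.0}


def packet_time_ns(rate_gbps: float, packet_bits: int = PACKET_BITS) -> float:
    """Time one packet occupies the line, in ns (1 Gbps is 1 bit per ns)."""
    return packet_bits / rate_gbps


for name, rate in RATES_GBPS.items():
    slot = packet_time_ns(rate)
    # Assumption: one store plus one fetch share the packet slot equally.
    print(f"{name}: packet slot {slot:.1f} ns, per-operation budget {slot / 2:.1f} ns")
```

Under these assumptions an OC768c port leaves a 12.8 ns slot per packet, hence roughly 6.4 ns per memory operation before any overhead is accounted for, which is in line with the "few ns" figure given above.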
Hence, to design and implement a high capacity packet-switch (i.e., one having a multi-Tbps aggregate throughput) switching from/to OC768c (40 Gbps) ports, a practical architecture, often considered to overcome the above mentioned technology limitation, is a parallel packet switch (PPS) architecture. As shown in FIG. 1, it is comprised of multiple identical lower-speed packet-switches (100) operating independently and in parallel. Generally speaking, in each ingress adapter such as (110), an incoming flow of packets (120) is spread packet-by-packet by a load balancer (130) across the slower packet-switches, then recombined by a multiplexor (140) in the egress adapter, e.g., (150). As seen by an arriving packet, a PPS is a single-stage packet-switch whose planes need to have only a fraction of the performance necessary to sustain a PPS port data rate (125). If four planes (100) are used, as shown in FIG. 1, their input ports (102) and output ports (104) need only have one fourth of the performance that would otherwise be required to handle a full port data rate. More specifically, four independent switches, designed with OC192c ports, can be associated to offer OC768c port speed, provided that ingress and egress port-adapters (110, 150) are able to load balance and recombine the packets. This approach is well known in the art and is sometimes referred to as 'Inverse Multiplexing' or 'load balancing'. Among many publications on the subject one may, e.g., refer to a paper published in Proc. ICC '92, 311.1.1-311.1.5, 1992, by T. ARAMAKI et al., entitled 'Parallel "ATOM" Switch Architecture for High-Speed ATM Networks', which discusses the kind of architecture considered here.
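The packet-by-packet spreading performed by a load balancer can be sketched as follows. This is a minimal illustration, not the arrangement of any cited paper: the text only says that packets are spread across the planes, so the strict round-robin policy and the function name `spread` are assumptions made for the example.

```python
from itertools import cycle


def spread(packets, n_planes=4):
    """Spread packets one by one over n_planes queues (round-robin, assumed)."""
    planes = [[] for _ in range(n_planes)]
    for plane, pkt in zip(cycle(range(n_planes)), packets):
        planes[plane].append(pkt)
    return planes


PORT_RATE_GBPS = 40.0            # OC768c port, from the text
planes = spread(list(range(8)))  # eight packets numbered 0..7
per_plane_rate = PORT_RATE_GBPS / len(planes)

print(planes)          # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(per_plane_rate)  # 10.0, i.e. an OC192c-class plane port
```

With four planes, each plane carries one packet out of four, which is why OC192c-class plane ports suffice for an OC768c adapter port.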
The above scheme is also attractive because of its inherent capability to support redundancy. By installing more planes than strictly necessary, it is possible to hot-replace a defective plane without having to stop traffic. When a plane is detected as being, or becoming, defective, the ingress adapter load balancers can be instructed to skip it. Once all the traffic has been drained from the defective plane, it can be removed and replaced by a new one, and the load balancers set back to their previous mode of operation.
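The skip-and-restore behavior described above might look like the following sketch. The class and method names (`LoadBalancer`, `skip`, `restore`, `pick`) are hypothetical, the round-robin policy is an assumption, and draining of the defective plane is not modeled.

```python
class LoadBalancer:
    """Round-robin plane selection that can skip planes marked defective."""

    def __init__(self, n_planes):
        self.n_planes = n_planes
        self.disabled = set()  # planes currently skipped
        self._next = 0         # next plane to try

    def skip(self, plane):
        self.disabled.add(plane)

    def restore(self, plane):
        self.disabled.discard(plane)

    def pick(self):
        """Return the next usable plane, trying each plane at most once."""
        for _ in range(self.n_planes):
            plane = self._next
            self._next = (self._next + 1) % self.n_planes
            if plane not in self.disabled:
                return plane
        raise RuntimeError("no plane available")


lb = LoadBalancer(5)   # five planes installed, one more than the four needed
lb.skip(2)             # plane 2 reported defective
picks = [lb.pick() for _ in range(8)]
print(picks)           # [0, 1, 3, 4, 0, 1, 3, 4]
lb.restore(2)          # replacement plane inserted
```

Traffic keeps flowing over the four remaining planes while plane 2 is out of service.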
Thus, although a PPS is really attractive for supporting multi-Gbps channel speeds, and more particularly OC768c switch ports, this approach introduces the problem of packet re-sequencing in the egress adapter. Packets from an input port (110) may arrive out of sequence in a target egress adapter (150) because the various switching paths, comprised of four planes (100) in the example of FIG. 1, do not have the same transfer delay: since they run independently, they can have different buffering delays. A discussion of this problem, and proposed solutions to it, can be found, for example, in a paper by Y. C. JUNG et al., 'Analysis of out-of-sequence problem and preventive schemes in parallel switch architecture for high-speed ATM network', published in IEE Proc.-Commun., Vol. 141, No. 1, Feb. 1994.
However, this paper does not consider the practical case where the switching planes also have to handle packets on a priority basis so as to support a Class of Service (CoS) mode of operation. This is a mandatory feature in all recent switches, which are assumed to be capable of handling simultaneously all sorts of traffic at nodes of a single ubiquitous network carrying carrier-class voice traffic as well as video distribution or plain data file transfer. Hence, packets are processed differently by the switching planes depending on the priority tags they carry. This no longer complies with the simple FCFS (First-Come-First-Served) rule assumed by the above referenced paper, and forces egress adapters to read out packets as soon as they are ready to be delivered by the switching planes, after which they can be re-sequenced on a per priority basis.
Also, the above paper implicitly assumes the use of a true Time Stamp (TS), which means in practice that all port-adapters are synchronized so that packets from different sources are stamped from a common time reference, a requirement that is difficult and expensive to meet.
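The out-of-sequence effect caused by unequal plane delays, discussed above, is easy to reproduce in a toy model. In this sketch the per-plane delays are arbitrary, fixed, hypothetical values (real buffering delays vary with load), packets are assumed to leave the source one per time unit, and spreading is assumed to be round-robin.

```python
# Hypothetical fixed transfer delay of each of four planes (arbitrary time units).
PLANE_DELAY = [3, 1, 4, 2]


def egress_arrival_order(n_packets, delays=PLANE_DELAY):
    """Return source sequence numbers in the order the egress adapter sees them."""
    arrivals = []
    for seq in range(n_packets):
        plane = seq % len(delays)                    # round-robin spreading (assumed)
        arrivals.append((seq + delays[plane], seq))  # packet seq departs at time seq
    return [seq for _, seq in sorted(arrivals)]


order = egress_arrival_order(8)
print(order)  # [1, 0, 3, 2, 5, 4, 7, 6]
```

Even with constant delays the packets reach the egress adapter out of order, so some form of re-sequencing is needed before delivery.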
Another difficulty with a PPS architecture stems from the fact that networks must support not only unicast traffic (one source to one destination) but also multicast traffic, that is, traffic in which a source may have to send the same flow of packets to more than one destination. Video distribution and network management traffic are of this latter kind (e.g., the IP suite of protocols assumes that some control packets must be broadcast). In practice, this prevents the use of a simple numbering of packets in each source, on a per destination and per priority basis, which would otherwise allow the implementation of a straightforward and inexpensive re-sequencing in each egress adapter on a per flow basis. For example, with a 64-port switch there are only 64 unicast flows (times the number of priorities) for each source since there are only 64 possible destinations, a number that is easily manageable. However, there are possibly 2^64 - 65 (times the number of priorities) possible multicast flows from the same source, i.e., all combinations of the 64 destinations except the empty one and the 64 single-destination ones. Each flow would have to be numbered separately to keep coherency in the packet numbers received by the egress adapters (n, n+1, n+2, etc.). However, 2^64 is an impossible number to deal with as far as the implementation of resources is concerned.
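The flow counts above follow from counting subsets of destinations, as the short sketch below shows. The function names are illustrative only; the brute-force check on a small, hypothetical 4-port switch is included simply to confirm the closed form.

```python
from itertools import combinations


def multicast_flow_count(n_ports):
    """Destination sets with at least two members: 2^n - n - 1."""
    return 2 ** n_ports - n_ports - 1


def brute_force_count(n_ports):
    """Enumerate every destination set of size >= 2 (small n only)."""
    return sum(
        1
        for size in range(2, n_ports + 1)
        for _ in combinations(range(n_ports), size)
    )


assert multicast_flow_count(4) == brute_force_count(4) == 11

print(64)                        # unicast flows per source and per priority
print(multicast_flow_count(64))  # 18446744073709551551, i.e. 2^64 - 65
```

Keeping one sequence counter per flow would therefore require on the order of 10^19 counters per source and per priority, against 64 for unicast traffic alone.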
Therefore, the numbering of packets sent from a source can only be envisaged if it ignores the destination of the packets (so that unicast and multicast traffic can be processed identically in the egress adapters). In other words, packets must be marked at the source either with a true TS (Time Stamp) or, if not strictly with a TS, with the value of a common counter (or of a counter per priority) held in each ingress adapter, the counter(s) being incremented with each departing packet irrespective of its destination(s). The second solution is obviously preferred from a cost viewpoint since it does not assume any form of synchronization between the ingress port-adapters of a switch. As stated in JUNG's paper quoted above (in section 4.1), the packet re-sequencing function is complex to implement as a result of using time stamps, since it assumes that egress adapters are able to restore sequences of packets marked with numbers in ascending order, i.e., n, nx, ny, etc., where the only assumption that holds is that n<nx<ny, since each source, using a TS or a common counter, is free to interleave the sending of packets to any combination of destinations.
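The destination-agnostic marking described above, and the gaps it leaves at each egress adapter, can be illustrated with a small sketch. The class `IngressAdapter`, the packet dictionary layout and the traffic pattern are all hypothetical; only the rule itself (one counter per priority, incremented for every departing packet whatever its destinations) comes from the text. No re-sequencing method is shown here.

```python
from collections import defaultdict


class IngressAdapter:
    """Marks departing packets from one counter per priority."""

    def __init__(self):
        self.counters = defaultdict(int)  # priority -> next sequence number

    def mark(self, priority, destinations):
        seq = self.counters[priority]
        self.counters[priority] += 1      # incremented whatever the destinations
        return {"seq": seq, "priority": priority, "dests": frozenset(destinations)}


src = IngressAdapter()
# Hypothetical traffic: (priority, destination set); {1, 3} is a multicast packet.
traffic = [(0, {1}), (0, {2}), (0, {1, 3}), (0, {2}), (0, {1})]
sent = [src.mark(prio, dests) for prio, dests in traffic]

seen_by_1 = [pkt["seq"] for pkt in sent if 1 in pkt["dests"]]
seen_by_2 = [pkt["seq"] for pkt in sent if 2 in pkt["dests"]]
print(seen_by_1)  # [0, 2, 4]
print(seen_by_2)  # [1, 3]
```

Each egress adapter receives an ascending but non-contiguous series of numbers, so a missing number does not tell it whether a packet is still in transit or was simply sent elsewhere.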
Thus, there is a need for a resequencing arrangement that overcomes the difficulties mentioned above, in order to make feasible a PPS architecture in which variable delays can be experienced in the individual switching planes while supporting priority classes of unicast and multicast traffic, in view of the implementation of a multi-Tbps switch.
The present invention offers such a complete approach and solution.