DWDM, which stands for Dense Wavelength Division Multiplexing, by merging onto a single optical fiber many wavelengths, is making available long-haul fiber-optic data communications links of huge aggregate capacity. Each wavelength is an independent communications channel which typically operates at OC48c i.e.: 2.5 Giga or 109 bits per Second (Gbps), OC192c (10 Gbps) and in some systems at OC768c (40 Gbps). These rates are part of a family of rates and formats available for use in optical interfaces, generally referred to as SONET, which is a standard defined by the American National Standards Institute (ANSI) of which there exists an European counterpart, mostly compatible, known as SDH (Synchronous Digital Hierarchy). Thus, at each node of a network, the data packets or cells carried on each DWDM channel must be switched, or routed, by packet-switches that process and then switch packets between different channels so as to forward them towards their final destination. Ideally, it would be desirable to keep the processing of packets in the optical domain, without conversion to electronic form, this is still not really feasible today mainly because all packet-switches need buffering that is not yet available in an optical form. So packet-switches will continue to use electronic switching technology and buffer memories for some time to come.
Because of the data rates as quoted above for individual DWDM channels (up to 40 Gbps) and the possibility of merging tenths, if not hundredths, of such channels onto a single fiber the throughput to handle at each network node can become enormous i.e., in a multi-Tera or 1012 bits per second range (Tbps) making buffering and switching, in the electronic domain, an extremely challenging task. Constant significant progress has been sustained, for decades, in the integration of more logic gates and memory bits on a single ASIC (Application Specific Integrated Circuit), allowing implementation of complex functions required to handle the data packets flowing into a node according to QoS (Quality of Service) rules. Unfortunately, the progress in speed and performance of the logic devices over time is comparatively slow, and now gated by the power one can afford to dissipate in a module to achieve it. The time to perform a random access into an affordable memory e.g., an imbedded RAM (Random Access Memory) in a standard CMOS (Complementary MOS) ASIC, is, decreasing only slowly with time while switch ports need to interface channels having their speed quadrupling at each new generation i.e., from OC48c to OC192c and to OC768c respectively from 2.5 to 10 and 40 Gbps. For example, if a memory is 512-bit wide allowing to store or fetch, in a single write or read operation, a typical fixed-size 64-byte (8-bit byte) packet of the kind handled by a packet-switch, this must be achieved in less than 10 Nano or 10−9 second (Ns) for a 40 Gbps channel and in practice in a few Ns only in order to take care of the necessary speed overhead needed to sustain the specified nominal channel performance while at least one store and one fetch i.e., two operations, are always necessary per packet movement. This represents, nowadays, the upper limit at which memories and CMOS technology can be cycled making the design of multi-Tbps-class switch extremely difficult with a cost-performance state-of-the-art technology such as CMOS, since it can only be operated at a speed comparable to the data rate of the channel they have to process.
To overcome the above mentioned technology limitation, a parallel packet switch (PPS) architecture is used. It is comprised of multiple identical lower-speed packet-switches e.g., (100) operating independently and in parallel, as sketched in FIG. 1. In each ingress port-adapter, such as (110), an incoming flow of packets (120) is spread (130), packet-by-packet, by a load balancer across the slower packet-switches, then recombined by a multiplexor (140) in the egress part of each port-adapter e.g., (150). As seen by an arriving packet, a PPS is a single-stage packet-switch that needs to have only a fraction of the performance necessary to sustain a PPS port data rate (125). If four planes are used, as shown on FIG. 1 (planes A,B,C,D), they need only to have one fourth of the performance that would otherwise be required to handle a full port data rate. More specifically, four independent switches, designed with OC192c ports, can be associated to offer OC768c port speed, provided that ingress and egress port-adapters (110, 150) are able to load balance and recombine the packets. This approach is well known in the art and sometimes referred to as ‘Inverse Multiplexing’ or ‘load balancing’. Among many publications on the subject one may e.g., refer to a paper published in Proc. ICC′92, 311.1.1-311.1.5, 1992, by T. ARAMAKI et al., untitled ‘Parallel “ATOM” Switch Architecture for High-Speed ATM Networks’ which discusses the kind of architecture considered here.
The above scheme is also attractive because of its inherent capability to support redundancy. By placing more planes than what is strictly necessary it is possible to hot replace a defective plane without having to stop traffic. When a plane is detected as being or becoming defective ingress adapter load balancers can be instructed to skip the defective plane. When all the traffic from the defective plane has been drained out it can be removed and replaced by a new one and load balancers set back to their previous mode of operation.
Thus, if PPS is really attractive to support multi-Gbps channel speeds and more particularly OC768c switch ports it remains that this approach introduces the problem of packet re-sequencing in the egress adapter. Packets from an input port (110) may possibly arrive out of sequence in a target egress adapter (150) because the various switching paths, comprised of four planes (100, 102, 104 and 106) in the example of FIG. 1, do not have the same transfer delay since they run independently and can have different buffering delays. A discussion and proposed solutions to this problem can be found, for example, in a paper by Y. C. JUNG et al., ‘Analysis of out-of-sequence problem and preventive schemes in parallel switch architecture for high-speed ATM network’, published in IEEE Proc.-Commun., Vol. 141, No. 1, February 1994. However, this paper does not consider the practical case where the switching planes have, also, to handle packets on a priority basis so as to support a Class of Service (CoS) mode of operation, a mandatory feature in all recent switches which are assumed to be capable of handling simultaneously all sorts of traffic at nodes of a single ubiquitous network handling carrier-class voice traffic as well as video distribution or just straight data file transfer. Hence, packets are processed differently by the switching planes depending on the priority tags they carry. This does no longer comply with the simple FCFS (First-Come-First-Served) rule assumed by the above referenced paper and forces egress adapters to readout packets as soon as they are ready to be delivered by the switching planes after which they can be re-sequenced on a per priority basis. Also, the above paper implicitly assumes the use of a true time stamp (TS) which means in practice that all port-adapters are synchronized so as packets from different sources are stamped from a common time reference which is a difficult and expensive requirement to meet.
Another difficulty with a PPS architecture stems from the fact that networks must not only support UC (unicast) traffic (one source to one destination) but also MC (multicast) traffic that is, traffic in which a source may have to dispatch a same incoming flow of packets to more than one destination. Video distribution and network management traffic are of this latter case (e.g., the IP suite of protocols assumes that some control packets must be broadcast). To allow a straightforward re-sequencing in each egress adapter, the simplest solution is to perform, in each ingress adapter, a numbering of the packets on the basis of their destination and priority. In which case, each egress adapter needs only to restore a continuous e.g., ascending, sequence of numbers i.e.: n, n+1, n+2, etc. from each source and for each priority. This is easily feasible for unicast traffic where there is only one destination per incoming packet i.e., one egress adapter, for each packet entering a switch. For example, if one considers a 64-port switch handling 8 priorities there are only 64 sources times 8 priorities=512 flows thus, 512 independent sequences of numbers to handle by each egress adapter, since the invention also assumes that ingress adapters need not to be synchronized.
However, in this example of a 64-port switch, there are 264-65 different combinations, times the number of priorities, of possible multicast flows from a same source. Even though not all may exist simultaneously it remains that each flow would have to be numbered separately, in sources, to keep coherency in the packet numbers received by the egress adapters. However, 264 is an impossible number to deal with as far as the implementation of the corresponding resources is concerned. Therefore, the numbering of packets on a per flow basis, is not of an easy implementation due to the huge number of possible flows of data packets to handle.
Thus there is a need for a simple mechanism in egress adapters to re-order sequences of data packets, numbered on a per flow basis in the ingress adapters, which avoids the drawback of a complex implementation.