The communications industry is changing rapidly to adapt to emerging technologies and ever-increasing customer demand. This customer demand for new applications and for increased performance of existing applications is driving communications network and system providers to employ networks and systems having greater speed and capacity (e.g., greater bandwidth). In trying to achieve these goals, a common approach taken by many communications providers is to use packet switching technology. Service consumers and providers demand increased performance and high availability, and thus the high-performance switches and routers used to provide these communications services often face the dual demands of being fault-tolerant and of handling worst-case traffic patterns effectively.
One conventional approach to providing a fault-tolerant system is to provision systems with duplicate resources, such as a replicated backplane or crossbar and duplicate line cards, and then rely on a mechanism to switch the traffic from a component to its duplicate or backup component when failure of the first is detected. In one configuration, two instances of a line card are provided, with the optical connection to a port being optically split and connected to a corresponding port on the second line card. A simple binary ingress port filter on the second line card blocks all traffic arriving at that port until the first line card is determined to have failed. At that time, the filter is changed from block-all to block-none, allowing the second line card to quickly take over and forward the packet traffic. Ideally, the filter on the first line card is also set to block-all at that time, if that card remains operational enough to accept the change, so that the packet traffic is not forwarded twice. This fault-tolerant configuration typically handles failures well, but it requires fully duplicated line cards that provide no benefit except in the failure case, and each card must be designed to accommodate the worst-case traffic patterns that can occur.
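The block-all/block-none failover described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not a line-card implementation: the names `FilterMode`, `LineCard`, `fail_over`, and `deliver`, and the `failed`/`configurable` flags, are all hypothetical and exist only to illustrate the sequence of filter changes.

```python
from enum import Enum


class FilterMode(Enum):
    BLOCK_ALL = "block-all"    # drop every packet arriving at the port
    BLOCK_NONE = "block-none"  # accept every packet arriving at the port


class LineCard:
    """Hypothetical model of one line card behind an optically split port."""

    def __init__(self, name: str, mode: FilterMode) -> None:
        self.name = name
        self.mode = mode
        self.failed = False        # assumption: card can no longer forward traffic
        self.configurable = True   # assumption: card can still accept a filter change
        self.forwarded: list[str] = []

    def receive(self, packet: str) -> None:
        # Both cards see every packet; only a healthy card whose
        # ingress filter is block-none forwards it.
        if not self.failed and self.mode is FilterMode.BLOCK_NONE:
            self.forwarded.append(packet)


def fail_over(primary: LineCard, backup: LineCard) -> None:
    """Move forwarding from the primary card to the backup card."""
    backup.mode = FilterMode.BLOCK_NONE
    # Best effort: also block the primary, if it is still able to accept
    # the change, so that no packet is forwarded by both cards.
    if primary.configurable:
        primary.mode = FilterMode.BLOCK_ALL


def deliver(packet: str, cards: list[LineCard]) -> None:
    """Model the optical splitter: every card receives a copy."""
    for card in cards:
        card.receive(packet)


primary = LineCard("primary", FilterMode.BLOCK_NONE)
backup = LineCard("backup", FilterMode.BLOCK_ALL)
deliver("p1", [primary, backup])  # only the primary forwards p1
primary.failed = True             # failure of the primary is detected
fail_over(primary, backup)
deliver("p2", [primary, backup])  # only the backup forwards p2
```

The model makes the cost noted above concrete: the backup card receives every packet but contributes nothing until `fail_over` runs.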
Designing a packet switch to handle worst-case traffic can be expensive, and impractical in some settings. For example, with regard to multicast information streams, bursts of packets to addresses with large fan-out can require the switch to perform a large number of packet replications for each packet, one for each output transmission required by the multicast. Moreover, this replication is required for every multicast packet arriving at wire rate. In particular, with a fan-out of 200 for an address and a packet arriving every microsecond on a port (one million packets per second), a line card may need to produce 200,000,000 packet copies per second, exceeding the capabilities of most hardware. Sustained rates at this level are not generally needed, but bursts of this nature can occur, and failure to handle them generally leads to indiscriminate ingress packet drops, often harming flows that contribute little to the burst and limiting any quality-of-service guarantees that can be made. Large-scale switches especially suffer from this problem. The approach of providing multicast support in the crossbar increases the cost and complexity of the interconnect and often compromises flow control within the switch, interfering with QoS properties under load, which is precisely when those properties matter most. The approach of passing multicast packets to a "replication" server card connected to the switch fabric does not scale well, does not handle the failure case well, and consumes extra fabric bandwidth.
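The replication figure above follows from multiplying the fan-out by the packet arrival rate. The short sketch below reproduces that arithmetic and also derives the average time available per copy; the function names are hypothetical, and the model assumes every arriving packet is a multicast packet with the same fan-out.

```python
def replications_per_second(fan_out: int, packet_interval_ns: int) -> int:
    """Packet copies a line card must create per second at wire rate.

    Assumes every arriving packet is multicast with the given fan-out
    (one copy per output transmission). Integer division is exact when
    the interval divides one second evenly.
    """
    packets_per_second = 1_000_000_000 // packet_interval_ns
    return fan_out * packets_per_second


def time_budget_per_copy_ns(fan_out: int, packet_interval_ns: int) -> float:
    """Average time available to produce each copy before the next packet arrives."""
    return packet_interval_ns / fan_out


# Figures from the text: fan-out of 200, one packet every microsecond.
rate = replications_per_second(fan_out=200, packet_interval_ns=1_000)
budget = time_budget_per_copy_ns(fan_out=200, packet_interval_ns=1_000)
print(f"{rate:,} copies/s, {budget} ns per copy")
# prints: 200,000,000 copies/s, 5.0 ns per copy
```

A budget of roughly 5 ns per copy, sustained for the length of a burst, illustrates why such bursts exceed what most line-card hardware can absorb.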
As a second example, minimum-sized packets and packets slightly longer than the internal transfer unit (often called the "cell size") can require memory bandwidths and transfer rates far in excess of the average bandwidth. For example, with a 64-byte cell size, a 65-byte packet occupies two cells (128 bytes), so such packets require almost twice the bandwidth to handle at full rate compared to average 300-byte packets.
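The cell-padding effect can be checked with a small calculation. This sketch is a simplification: it counts only the bytes added by rounding each packet up to whole cells and ignores per-packet overheads (headers, descriptors, inter-packet gaps), which are what mainly burden minimum-sized packets. The function names are hypothetical.

```python
def cells_needed(packet_bytes: int, cell_bytes: int = 64) -> int:
    """Number of fixed-size cells a packet occupies (ceiling division)."""
    return -(-packet_bytes // cell_bytes)


def internal_expansion(packet_bytes: int, cell_bytes: int = 64) -> float:
    """Bytes moved internally per byte received on the line.

    Counts only cell padding; per-packet overheads are ignored in this
    simplified model.
    """
    return cells_needed(packet_bytes, cell_bytes) * cell_bytes / packet_bytes


for size in (64, 65, 300):
    print(size, cells_needed(size), round(internal_expansion(size), 2))
# 64 -> 1 cell, 1.0; 65 -> 2 cells, 1.97; 300 -> 5 cells, 1.07

# Internal bandwidth for 65-byte packets relative to 300-byte packets.
ratio = internal_expansion(65) / internal_expansion(300)
print(round(ratio, 2))  # about 1.85, i.e. almost twice
```

A 65-byte packet moves 128 bytes internally (about 1.97 times its length), while a 300-byte packet moves 320 bytes (about 1.07 times), which is where the "almost twice" comparison comes from.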
With the demand for ever-higher link speeds (now moving to multiple gigabits per second), the corresponding demand for more ports per switch, and the potential increase in the use of multicast, the difficulty of handling worst-case behavior is expected to remain significant. Moreover, as individual switches handle large amounts of traffic and the network becomes ever more important to enterprises and to commerce in general, fault tolerance of these devices is critical. Needed are new methods and apparatus for providing fault-tolerant packet switching systems.