This invention relates to network systems, and more particularly to multicast of packets in a mesh-based packet switch.
Ever-increasing demand for telephone and data communications has led to the development of higher-capacity media such as fiber optic cables. Standards have been developed to aggregate many separate telephone calls (DS0 lines) onto high-speed data backbones. One widely-used optical standard originally developed to aggregate phone calls is the Synchronous Optical NETwork (SONET) standard.
Traditional telephone switches for SONET have included Time-Division-Multiplexed (TDM) circuit switches. More recently, packet-based switches have been used to emulate such TDM switches. See the related application for "Adaptive Fault-Tolerant Switching Network With Random Initial Routing and Random Routing Around Faults", U.S. Ser. No. 09/470,144, assigned to Corvia Networks of Sunnyvale, Calif., which solves the problem of packet blocking and localized congestion by initially routing packets to a randomly-selected node within the network fabric. SONET data received by a switch is divided into packets which are sent through the switch fabric to the output (egress) port. At the egress port, the packets are re-assembled in order to form the outgoing SONET data. The switch fabric consists of store-and-forward nodes that receive packets and send them toward their destination, the egress port. The nodes are connected together in a mesh to provide many alternate routes, ensuring that node failures can be routed around.
SONET data is arranged into frames that are sent every 125 microseconds (μsec). Since one SONET frame is divided into several packets that may be sent through the switch fabric over different routes, the latency through the switch can vary. Routing algorithms used by the nodes must be carefully selected to ensure that, statistically, such latencies do not exceed the 125-μsec frame period.
While most network traffic is point-to-point (unicast), some special applications require multicast functionality. For example, video distribution requires that the packets containing the video data be replicated and sent to several different users, often through different egress ports. Other applications that use multicast include port monitoring or mirroring, protection-path routing, in-service rollover, and drop-and-continue. Port mirroring/monitoring is used for diagnostic purposes, to observe the data at another port without interfering with its forwarding. Protection-path routing is used to send duplicate data over an alternative route for enhanced reliability. In-service rollover temporarily routes data over a duplicate new path to its destination in preparation for a permanent switchover to the new path. Drop-and-continue is a method of multicasting over a continuing network interconnection, such as a ring, in which the data is dropped off at an intermediate node but also continues on to another destination.
FIG. 1 shows multicast by replicating a packet at an ingress port. One simple approach to implement multicast is to replicate packets as they are inserted into the switch fabric at the input (ingress) port. Packet 12 is a packet received by the network switch and inserted into the switch fabric by ingress port 10. Ingress port 10 formats packet 12 for transmission through switch fabric 28, and makes several duplicate copies 14 of the re-formatted packet 12.
Each of the duplicate copies 14 contains a destination address for a different egress port 20-25. Thus the duplicate copies 14 are not exact duplicates, but do contain the same data payload as packet 12. Of course, packet 12 can itself be a portion of a larger data group, such as a row in a SONET frame. Each of the duplicate copies 14 is routed toward its destination egress port 20-25 over a different path through switch fabric 28. For example, one of the duplicate copies 14 is routed from ingress port 10 through nodes 30, 31, 32 to egress port 25, while another of the duplicate copies 14 is routed from ingress port 10 through node 35 to egress port 24. Other routes include node 33 to egress port 20, node 34 to egress port 21, and node 35 to egress ports 22, 23.
Egress ports 20-25 each receive one of the duplicate copies 14 and generate packet 16 containing the same data as packet 12. One packet 12 input to ingress port 10 is used to generate six packets 16 to six egress ports 20-25. This is known as parallel multicast, since the duplicate copies 14 pass through switch fabric 28 in parallel to each other, at about the same time.
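The ingress-side replication described above can be sketched as follows. The `Packet` class, its `dest` field, and the use of plain port numbers as addresses are illustrative assumptions rather than a packet format defined here; the sketch only shows that every copy keeps the payload of packet 12 and differs in its destination address.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    dest: Optional[int]  # hypothetical destination (egress port) address
    payload: bytes       # data carried unchanged by every copy


def replicate_at_ingress(packet: Packet, egress_ports: List[int]) -> List[Packet]:
    """Parallel multicast: the ingress port makes one copy per egress port.

    The copies are not exact duplicates; each gets its own destination
    address but keeps the original payload.
    """
    return [replace(packet, dest=port) for port in egress_ports]


packet_12 = Packet(dest=None, payload=b"one row of a SONET frame")
copies_14 = replicate_at_ingress(packet_12, [20, 21, 22, 23, 24, 25])
print(len(copies_14))  # 6 packets injected by ingress port 10
```

All six copies leave ingress port 10 at about the same time, which is the source of the traffic multiplication described in the following paragraph.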
While such parallel multicast is useful, replication of the packet at the ingress port multiplies packet traffic within switch fabric 28. In this example, six times the traffic of a single packet is produced at ingress port 10 and at neighboring nodes, creating a routing "hot spot" of congestion. Such heavy traffic can slow the switch, since several nodes must route the additional packet traffic. Other packets passing through switch fabric 28 from other ingress ports 18 can be slowed by the multicast traffic. Failures such as dropped packets can occur when packets are delayed.
Some nodes in switch fabric 28 can become congested from the multicast traffic. For example, node 35 receives three of the duplicate copies 14 from ingress port 10. Node 35 can become congested by the sudden arrival of several multicast packets. Ingress port 10 may also be locally congested, having to transmit all the duplicate copies 14.
FIG. 2 shows serial multicast by packet duplication at egress ports. Traffic from multicast can be reduced by using a serial or drop-and-continue method. Packet 12 received by ingress port 10 is not duplicated. Instead, packet 12 is sent to egress port 20 through node 33 in switch fabric 28. Once packet 12 arrives at its first destination, egress port 20, packet 12 is replicated to form packet 16 in addition to packet 12. Packet 16 is output from switch fabric 28 by egress port 20, while packet 12 is re-injected into switch fabric 28. Packet 12 then continues on to its second destination, egress port 21. Another duplicate packet 16 is made by egress port 21 for output, while packet 12 continues to the third destination, egress port 22.
As packet 12 passes through each of egress ports 20-23, a duplicate is made and output as packet 16. Once packet 12 arrives at its final destination, egress port 24, it is removed from switch fabric 28 and output as packet 16 by egress port 24.
Such serial multicast delivers five copies of packet 12 from egress ports 20-24 with minimal increase in traffic. The local congestion caused by many duplicate copies of the multicast packet is avoided.
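A minimal sketch of the drop-and-continue sequence, assuming the chain of egress ports is already known as an ordered list (how the chain is encoded is not specified at this point). It records, for each port, how many fabric traversals the packet has made and whether the original packet is re-injected or removed.

```python
from typing import List, Tuple


def serial_multicast(chain: List[int]) -> List[Tuple[int, int, str]]:
    """Drop-and-continue: one packet visits each egress port in turn.

    Returns (egress_port, fabric_traversals_so_far, action) per port. Each
    port outputs a copy (packet 16); every port but the last re-injects the
    original (packet 12) toward the next port, and the last port removes it.
    """
    steps = []
    for traversals, port in enumerate(chain, start=1):
        action = "removed" if traversals == len(chain) else "re-injected"
        steps.append((port, traversals, action))
    return steps


for step in serial_multicast([20, 21, 22, 23, 24]):
    print(step)
```

Only one copy of the multicast packet is in the fabric at any time, but the k-th port is reached only after k traversals. That accumulation is what drives the latency problem of serial multicast.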
One problem with the serial multicast of FIG. 2 is latency. FIG. 3 shows serial multicast packets in SONET time frames. A delay occurs for each packet as it travels through the switch fabric. Also, a delay occurs while each egress port replicates the packet and re-injects it into the switch fabric. Since delays are cumulative, the last egress ports 23, 24 experience greater delays than do earlier egress ports 20-22.
Packet 12 arrives at ingress port 10 of FIG. 2 at arrival time TA shown on FIG. 3. Arrival time TA occurs near the beginning of a first SONET frame. After a first propagation delay TP, the packet arrives at the first egress port. The first packet is output at time TA+TP. The packet is duplicated and sent from the first egress port to the second egress port 21, which requires another propagation delay TP. Thus the second egress port outputs its packet at time TA+TP+TP. This is still within the first SONET frame.
The second egress port duplicates the packet and sends it to the third egress port, requiring another propagation delay of TP. This third egress port can output its packet at TA+TP+TP+TP. However, since a new SONET frame is marked by a synch pulse every 125 μsec, this third egress port outputs its packet in the next SONET frame.
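The cumulative timing can be checked with a short calculation. This sketch assumes, for simplicity, that every hop has the same delay TP and that the first frame starts at time 0; the values TA = 10 μsec and TP = 40 μsec are hypothetical, chosen only to reproduce the situation described above.

```python
from typing import List, Tuple

FRAME_USEC = 125.0  # SONET frame period in microseconds


def serial_output_times(t_a: float, t_p: float, n_ports: int) -> List[Tuple[int, float, int]]:
    """Return (k, output_time, frame_index) for the k-th egress port in the chain.

    Simplifying assumption: every hop has the same delay t_p and frame 0
    starts at time 0. Delays accumulate, so port k outputs at t_a + k * t_p.
    """
    result = []
    for k in range(1, n_ports + 1):
        t_out = t_a + k * t_p
        result.append((k, t_out, int(t_out // FRAME_USEC)))
    return result


# Hypothetical values: TA = 10 usec, TP = 40 usec.
for row in serial_output_times(10.0, 40.0, 3):
    print(row)
# (1, 50.0, 0)
# (2, 90.0, 0)
# (3, 130.0, 1)   <- the third port spills into the next frame
```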
The propagation delays are not fixed but vary with the actual path taken by the packet. Congestion at an egress port can also delay packet replication, adding further delay. Thus each TP delay is not fixed but can vary. The second egress port's packet may actually be transmitted during the next SONET frame rather than the current frame, depending on the actual delays. This creates synchronization problems, since the packets from later egress ports may not be available in the current SONET frame. The variability of delays further complicates the problem, and errors or loss of data can occur.
Fault tolerance is also a problem with serial multicast. If the packet passes through a faulty node or egress port, the packet can be lost. Downstream egress ports then do not receive the multicast packet.
What is desired is an improved multicast of packets in a packet-based switch. Accommodation of latency and variable propagation delays for serial multicast is desired. Reduced congestion and traffic for parallel multicast is also desired. Multicast for a mesh-based switch that emulates a SONET switch is desired. Avoidance of blocking and packet loss from congestion during multicast is desirable. Fault tolerance during serial multicast is desired.
A mesh-based packet switch with multicast capability has a plurality of ingress ports for receiving data and generating packets including multicast packets. A plurality of egress ports are for transmitting data from the packet switch. A switch fabric has a plurality of switching nodes each for storing and forwarding packets within the switch fabric. The switching nodes include input nodes coupled to ingress ports in the plurality of ingress ports and output nodes coupled to egress ports in the plurality of egress ports.
An ingress port injecting a multicast packet generates a multicast header to attach to the multicast packet. The multicast header includes:
a multicast flag that indicates that the multicast packet is a packet being sent to many egress ports;
a random field that stores an address of a random node within the switch fabric; and
a multicast destination identifier that indicates which egress ports to send the multicast packet to.
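The three header fields listed above can be modeled as a small record. The field names, the use of Python integers for addresses, and the node address range are hypothetical; the random node is drawn uniformly from all switching nodes, matching the random selection described for this aspect.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MulticastHeader:
    multicast: bool   # multicast flag: packet is bound for many egress ports
    random_node: int  # random field: address of a node inside the switch fabric
    mcast_id: int     # multicast destination identifier (names the egress-port group)


def make_multicast_header(fabric_nodes: Sequence[int], mcast_id: int,
                          rng: random.Random) -> MulticastHeader:
    """Build the header an ingress port attaches to a multicast packet."""
    return MulticastHeader(multicast=True,
                           random_node=rng.choice(fabric_nodes),
                           mcast_id=mcast_id)


nodes = list(range(30, 46))  # hypothetical switching-node addresses
hdr = make_multicast_header(nodes, mcast_id=7, rng=random.Random(1))
print(hdr.multicast, hdr.random_node in nodes, hdr.mcast_id)  # True True 7
```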
The multicast packet is initially routed to the random node before packet replication. The random node stores the multicast packet sent from the ingress port. The random node replicates the multicast packet to generate a plurality of unicast packets. The unicast packets each have a header that includes:
a destination field that stores a destination address of an output node coupled to an egress port identified by the multicast destination identifier of the multicast packet received by the random node.
The switching nodes route the unicast packets from the random node to the output nodes identified by the destination fields of the unicast packets generated by the random node. Thus congestion at the input node is reduced by replicating the multicast packet at the random node.
In further aspects the random node is selected at random from all the switching nodes in the switch fabric. Thus multicast packets are initially dispersed to randomly-selected nodes within the switch fabric before packet replication.
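The congestion argument can be illustrated with a simple counting model. This sketch counts only the packets each node originates (injections and replicated copies) and ignores intermediate store-and-forward hops; the node numbering and traffic figures are hypothetical.

```python
import random
from collections import Counter
from typing import List


def ingress_replication_load(ingress: int, n_dest: int, n_packets: int) -> Counter:
    """Packets originated per node when the ingress node makes every copy."""
    return Counter({ingress: n_dest * n_packets})


def random_node_replication_load(ingress: int, nodes: List[int], n_dest: int,
                                 n_packets: int, rng: random.Random) -> Counter:
    """Packets originated per node when copies are made at a random node."""
    load = Counter()
    for _ in range(n_packets):
        load[ingress] += 1                  # ingress injects one multicast packet
        load[rng.choice(nodes)] += n_dest   # random node emits one unicast per destination
    return load


nodes = list(range(16))
before = ingress_replication_load(ingress=0, n_dest=6, n_packets=100)
after = random_node_replication_load(0, nodes, 6, 100, random.Random(0))
print(before[0])            # 600 packets all originate at the ingress node
print(sum(after.values()))  # 700: 100 injections plus 600 copies, spread over many nodes
print(max(after.values()))  # busiest node under random replication
```

In this model the total number of originated packets rises by one per multicast packet, but the replication work is spread across randomly chosen nodes instead of concentrating at the input node.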
In still further aspects the random node has a lookup table that is indexed by the multicast destination identifier from the multicast packet. It stores the destination addresses written to the headers of the unicast packets generated by the random node. Thus destination addresses for the multicast packet from the ingress port are locally stored at the random node.
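A sketch of replication at the random node using its local lookup table. The class names, field names, and the dictionary representation of the table are hypothetical; the point is that the multicast packet carries only the multicast destination identifier, while the destination addresses written into the unicast headers come from the table stored at the random node.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MulticastPacket:
    random_node: int  # random field
    mcast_id: int     # multicast destination identifier
    payload: bytes


@dataclass(frozen=True)
class UnicastPacket:
    dest: int         # destination field: address of an output node
    payload: bytes


class SwitchingNode:
    def __init__(self, address: int, lookup: Dict[int, List[int]]):
        self.address = address
        # Local lookup table: multicast destination identifier -> output-node addresses.
        self.lookup = lookup

    def replicate(self, pkt: MulticastPacket) -> List[UnicastPacket]:
        """Turn a stored multicast packet into one unicast packet per destination."""
        if pkt.random_node != self.address:
            raise ValueError("packet was not addressed to this node")
        return [UnicastPacket(dest=d, payload=pkt.payload) for d in self.lookup[pkt.mcast_id]]


node = SwitchingNode(address=37, lookup={7: [50, 51, 52]})
out = node.replicate(MulticastPacket(random_node=37, mcast_id=7, payload=b"data"))
print([p.dest for p in out])  # [50, 51, 52]
```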
In other aspects the lookup table further stores second-level multicast identifiers. The second-level multicast identifiers are for indexing the lookup table to locate a second group of destination addresses. The random node further generates a second multicast packet with a second multicast header that includes a second random field storing an address of a second random node within the switch fabric. The second multicast header stores a second-level multicast identifier from the lookup table.
The second multicast packet is routed from the random node to the second random node by switching nodes in the switch fabric. The second random node replicates the second multicast packet to generate unicast packets. The unicast packets each have a header including a destination field read from a second lookup table at the second random node. Thus nested packet replication occurs at two random nodes.
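Nested replication can be sketched as a recursive walk over lookup-table entries. Assumptions, all hypothetical: each entry is tagged either as a destination address or as a second-level multicast identifier, every node holds the same table contents (one shared dictionary stands in for the per-node tables), and the tables contain no cycles.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical lookup-table entry: ("dest", output_node_address) or
# ("mcast", second_level_multicast_id).
Entry = Tuple[str, int]


def replicate(node: int, mcast_id: int, table: Dict[int, List[Entry]],
              nodes: List[int], rng: random.Random,
              trace: List[Tuple[int, int]]) -> List[int]:
    """Replicate a multicast packet at `node`; return the output nodes reached.

    A "dest" entry becomes a unicast packet. An "mcast" entry becomes a second
    multicast packet sent to a newly chosen random node, which replicates it
    from its own table. `trace` records (replicating_node, mcast_id).
    """
    trace.append((node, mcast_id))
    reached: List[int] = []
    for kind, value in table[mcast_id]:
        if kind == "dest":
            reached.append(value)
        else:
            second_node = rng.choice(nodes)
            reached.extend(replicate(second_node, value, table, nodes, rng, trace))
    return reached


table = {7: [("dest", 50), ("dest", 51), ("mcast", 8)],
         8: [("dest", 60), ("dest", 61)]}
nodes = list(range(30, 46))
rng = random.Random(2)
trace: List[Tuple[int, int]] = []
reached = replicate(rng.choice(nodes), 7, table, nodes, rng, trace)
print(reached)                 # [50, 51, 60, 61]
print([m for _, m in trace])   # [7, 8]
```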
In further aspects data received by the ingress port is arranged in a Synchronous Optical NETwork (SONET) frame. Packet replication by the random node and by the second random node is synchronized to the SONET frame.
In other aspects of the invention the ingress port injects a serial multicast packet into the switch fabric. The serial multicast packet has a serial header attached to the serial multicast packet by the ingress port. The serial header includes:
a multicast flag which indicates that the multicast packet is a packet being sent to many egress ports; and
a multicast destination identifier that indicates which egress ports to send the multicast packet to.
The serial multicast packet is routed by the switching nodes to a first egress port indicated by the multicast destination identifier. The first egress port replicates and outputs data from within the serial multicast packet. The first egress port sends the serial multicast packet to a second egress port indicated by the multicast destination identifier.
The second egress port replicates and outputs data from within the serial multicast packet. The second egress port sends the serial multicast packet to a third egress port indicated by the multicast destination identifier. The third egress port replicates and outputs data from within the serial multicast packet. Routing to each next port is timed synchronously to the SONET frame. Thus serial multicast packets are routed to a chain of egress ports for replication at the egress ports without risk of a timing or SONET framing failure due to the combined latency of routing to multiple ports within a single SONET frame.
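One possible reading of "timed synchronously to the SONET frame" is that each hop of the serial chain starts on a frame boundary, so each egress port outputs in its own frame regardless of how the per-hop delays vary. The sketch below assumes that rule, which is an interpretation rather than a stated mechanism, and further assumes every hop completes within one frame.

```python
from typing import List

FRAME_USEC = 125.0  # SONET frame period in microseconds


def frame_synchronized_chain(t_arrival: float, hop_delays: List[float]) -> List[int]:
    """Return the SONET frame index in which each egress port in the chain outputs.

    Hypothetical timing rule: every hop is launched on a frame boundary, and a
    port that receives the packet outputs its copy and forwards the packet at
    the next frame boundary. Requires each hop delay to be shorter than one frame.
    """
    launch_frame = int(t_arrival // FRAME_USEC) + 1  # ingress waits for the next boundary
    output_frames = []
    for delay in hop_delays:
        if not 0.0 <= delay < FRAME_USEC:
            raise ValueError("each hop must complete within one frame")
        arrival = launch_frame * FRAME_USEC + delay
        arrival_frame = int(arrival // FRAME_USEC)   # same frame the hop was launched in
        launch_frame = arrival_frame + 1             # output and forward at next boundary
        output_frames.append(launch_frame)
    return output_frames


print(frame_synchronized_chain(10.0, [40.0, 110.0, 5.0]))  # [2, 3, 4]
print(frame_synchronized_chain(10.0, [1.0, 1.0, 1.0]))     # [2, 3, 4]
```

Under this rule, widely different hop delays produce the same output frames, trading a fixed one-frame-per-port latency for freedom from delay variability.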