This invention relates generally to computer networks and, more specifically, to routing of multicast packets within a computer network using a switch.
Data communication in a computer network involves the exchange of data between two or more entities interconnected by communication links and subnetworks. These entities are typically software programs executing on hardware computer platforms, such as end stations and intermediate stations. Examples of an intermediate station may include a router, bridge or switch which interconnect the communication links and subnetworks to enable transmission of data between the end stations. A local area network (LAN) is an example of a subnetwork that provides relatively short distance communication among the interconnected stations, whereas a wide area network (WAN) enables long distance communication over links provided by public or private telecommunications facilities.
Communication software executing on the stations correlates and manages data communication with other stations. The stations typically communicate by exchanging discrete packets or frames of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the stations interact with each other. In addition, network routing software executing on the routers allows expansion of communication to other end stations. Collectively, these hardware and software components comprise a communications network and their interconnections are defined by an underlying architecture.
Modern communications network architectures are typically organized as a series of hardware and software levels or “layers” within each station. These layers interact to format data for transfer between, e.g., a source station and a destination station communicating over the network. Predetermined services are performed on the data as it passes through each layer and the layers communicate with each other by means of the predefined protocols. The lower layers of these architectures are generally standardized and are typically implemented in hardware and firmware, whereas the higher layers are generally implemented in the form of software running on the stations attached to the network. An example of such a communications architecture is the Internet communications architecture.
The Internet architecture is represented by four layers which are termed, in ascending interfacing order, the network interface, internetwork, transport and application layers. These layers are arranged to form a protocol stack in each communicating station of the network. FIG. 1 illustrates a schematic block diagram of prior art Internet protocol stacks 125 and 175 used to transmit data between a source station 110 and a destination station 150, respectively, of a network 100. As can be seen, the stacks 125 and 175 are physically connected through a communications medium 180 at the network interface layers 120 and 160. For ease of description, the protocol stack 125 will be described.
In general, the lower layers of the communications stack provide internetworking services and the upper layers, which are the users of these services, collectively provide common network application services. The application layer 112 provides services suitable for the different types of applications using the network, while the lower network interface layer 120 accepts industry standards defining a flexible network architecture oriented to the implementation of LANs.
Specifically, the network interface layer 120 comprises physical and data link sublayers. The physical layer 126 is concerned with the actual transmission of signals across the communication medium and defines the types of cabling, plugs and connectors used in connection with the medium. The data link layer (i.e., “layer 2”) is responsible for transmission of data from one station to another and may be further divided into two sublayers: Logical Link Control (LLC 122) and Media Access Control (MAC 124).
The MAC sublayer 124 is primarily concerned with controlling access to the transmission medium in an orderly manner and, to that end, defines procedures by which the stations must abide in order to share the medium. In order for multiple stations to share the same medium and still uniquely identify each other, the MAC sublayer defines a hardware or data link address called a MAC address. This MAC address is unique for each station interfacing to a LAN. The LLC sublayer 122 manages communications between devices over a single link of the network.
The primary network layer protocol of the Internet architecture is the Internet protocol (IP) contained within the internetwork layer 116. IP is a network protocol that provides internetwork routing and relies on transport protocols for end-to-end reliability. An example of such a transport protocol is the Transmission Control Protocol (TCP) contained within the transport layer 114. The term TCP/IP is commonly used to refer to the Internet architecture. Protocol stacks and the TCP/IP reference model are well known and are, for example, described in Computer Networks by Andrew S. Tanenbaum, published by Prentice Hall PTR, Upper Saddle River, N.J., 1996.
Data transmission over the network 100 therefore consists of generating data in, e.g., sending process 104 executing on the source station 110, passing that data to the application layer 112 and down through the layers of the protocol stack 125, where the data are sequentially formatted as a frame for delivery onto the medium 180 as bits. Those frame bits are then transmitted over an established connection of medium 180 to the protocol stack 175 of the destination station 150 where they are passed up that stack to a receiving process 174. Data flow is schematically illustrated by solid arrows.
Although actual data transmission occurs vertically through the stacks, each layer is programmed as though such transmission were horizontal. That is, each layer in the source station 110 is programmed to transmit data to its corresponding layer in the destination station 150, as schematically shown by dotted arrows. To achieve this effect, each layer of the protocol stack 125 in the source station 110 typically adds information (in the form of a header) to the data generated by the sending process as the data descends the stack.
For example, the internetwork layer encapsulates data presented to it by the transport layer within a packet having a network layer header. The network layer header contains, among other information, source and destination (logical) IP network addresses needed to complete the data transfer. The data link layer, in turn, encapsulates the packet in a frame, such as a conventional Ethernet frame, that includes a data link layer header containing information required to complete the data link functions, such as (physical) MAC addresses. At the destination station 150, these encapsulated headers are stripped off one-by-one as the frame propagates up the layers of the stack 175 until it arrives at the receiving process.
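As a rough illustration of the encapsulation just described, the following Python sketch models only the internetwork and data link layers (the transport header is omitted). The class and function names and the addresses are hypothetical, chosen for illustration; real IP and Ethernet headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Network layer header: logical (IP) source and destination addresses.
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass
class Frame:
    # Data link layer header: physical (MAC) addresses wrapping the packet.
    src_mac: str
    dst_mac: str
    packet: Packet


def send(data: bytes, src_ip: str, dst_ip: str, src_mac: str, dst_mac: str) -> Frame:
    packet = Packet(src_ip, dst_ip, data)   # internetwork layer adds its header
    return Frame(src_mac, dst_mac, packet)  # data link layer encapsulates the packet


def receive(frame: Frame) -> bytes:
    packet = frame.packet   # strip the data link layer header
    return packet.payload   # strip the network layer header
```

At the destination the headers come off in the reverse order in which they were added, mirroring the path up the stack 175.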
A router is an intelligent intermediate node that implements network services such as route processing, path determination and path switching functions. The router also provides interfaces for a wide range of communication links and subnetworks. The route processing function allows a router to determine the type of routing needed for a packet, whereas the path switching function allows a router to accept a packet on one interface and forward it on another interface. The path determination, or forwarding decision, function enables the router to select the most appropriate interface for forwarding the packet.
A switch provides the basic functions of a bridge, including filtering of data traffic by MAC address, “learning” of a MAC address based upon the source MAC address of a frame, and forwarding (“bridging”) of the frame among its ports based upon the destination MAC address. In addition, the switch provides the path switching capability of a router. Path switching is typically separated from the forwarding decision processing of a router to enable high-speed, interface-level “switching” at the ports of the switch.
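The three bridge functions (learning, filtering and forwarding by MAC address) can be sketched as follows. This is a minimal Python model under stated assumptions: the class name is hypothetical, unknown destinations are flooded to all other ports, and table aging and broadcast/multicast handling are omitted.

```python
class LearningSwitch:
    """Hypothetical model of a learning bridge's forwarding decision."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port on which it was last seen

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is forwarded to."""
        # Learning: remember which port the source MAC address sits behind.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return self.ports - {in_port}
        if out_port == in_port:
            # Filtering: destination is on the ingress segment, so do not forward.
            return set()
        # Bridging: forward only to the port where the destination was learned.
        return {out_port}
```

For example, on a three-port switch the first frame from A to B is flooded to ports 2 and 3; once B replies from port 2, frames to B go only to port 2.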
U.S. Pat. No. 5,394,402 issued on Feb. 28, 1995 to Floyd E. Ross (the “'402 patent”) discloses an arrangement that is capable of associating any port of the switch with any particular segregated network group. According to the '402 patent, any number of physical ports of the switch may be associated with any number of groups within the switch by using a virtual local area network (VLAN) arrangement that virtually associates the port with a particular VLAN designation. Specifically, Ross discloses a switch or hub for a segmented virtual local area network with shared media access that associates VLAN designations with at least one internal port and further associates those VLAN designations with messages transmitted from any of the ports to which the VLAN designation has been assigned.
The VLAN designation assigned (e.g., programmed) to each internal port is stored in a memory portion of the switch such that every time a message is received by the switch on an internal port, the VLAN designation of that port is associated with the message. Association is accomplished by a flow processing element which looks up the VLAN designation in a memory based on the internal port where the message originated. In addition to the '402 patent, an IEEE standards committee is proposing a standard for virtual bridged local area networks (see IEEE standard 802.1Q).
An objective of the VLAN arrangement described in Ross is to allow all ports and entities of the network having the same VLAN designation to exchange messages by associating a VLAN designation with each message. Those entities having the same VLAN designations function as if they are all part of the same LAN. Each VLAN may be further associated with a subnetwork (“subnet”) to provide an organizational overlay to the network that facilitates transmission of data between a group of end stations.
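The per-port VLAN association and its effect (only ports sharing a designation exchange messages) can be sketched as a simple table lookup. The port numbers and VLAN values below are hypothetical, and the table stands in for the memory portion of the switch described above.

```python
# Hypothetical VLAN designations programmed into switch memory, keyed by port.
PORT_VLAN = {1: 10, 2: 10, 3: 20, 4: 20, 5: 10}


def associate(in_port: int, message: bytes) -> tuple:
    """Tag a received message with the VLAN designation of its ingress port."""
    return (PORT_VLAN[in_port], message)


def same_vlan_ports(in_port: int) -> set:
    """Ports that may receive the message: same VLAN, excluding the ingress port."""
    vlan = PORT_VLAN[in_port]
    return {port for port, v in PORT_VLAN.items() if v == vlan and port != in_port}
```

A message entering port 1 is thus tagged with VLAN 10 and is eligible only for ports 2 and 5, so the ports on VLAN 10 behave as if they shared one LAN.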
In many cases, the destination of a data frame issued by a source station (“sender”) may be more than one, but less than all of the entities (“receivers”) on the network; this type of multicast data transfer may further be employed to segregate communication between groups of receivers on the network. IP multicasting, in particular, may be used to disseminate data to a multicast group of receivers on different subnets, but within a single multicast domain of the network. A router interconnects the subnets and executes multicast routing protocols to allow expansion of communication to the end stations of the multicast domain on other subnets.
An example of such a multicast routing protocol is the Protocol Independent Multicast (PIM) protocol used to propagate routing information among routers in a multicast domain. The routers within a multicast domain calculate optimal routes from a sender to receivers of the multicast group and then exchange that information with their neighboring routers to establish paths from the sender to those receivers for multicast traffic. The PIM multicast routing protocol, which is described in detail in Request For Comments (RFC) 2362, defines the interaction between participating routers to create and maintain a multicast distribution tree.
To effect IP multicasting, a sending process generally specifies a destination IP address that is a multicast address for the frame. The multicast destination IP address is typically a Class D multicast address, and the (group) destination MAC address of the frame is directly mapped from that multicast address by the sending process when generating the frame. Receiving processes typically notify their internetwork layers that they want to receive frames destined for the multicast address; this is called “joining” a multicast group. These receiving members then “listen” on the multicast address and, when a multicast data frame is received at a receiver, the receiver delivers a copy of the data to each process that belongs to the group. An example of a protocol used by an IP host (a sending or receiving process of an end station) to report its multicast group membership to an immediately neighboring multicast router is the Internet Group Management Protocol (IGMP) described in Request for Comments (RFC) 2236. Upon receiving the membership information from a station, the router executes the PIM routing protocol to communicate that membership information to its neighboring routers in the multicast domain.
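For IPv4 over Ethernet, the direct mapping from a Class D group address to a group MAC address is the one defined in RFC 1112: the low-order 23 bits of the group address are placed into the MAC prefix 01:00:5E. A minimal Python sketch using only the standard `ipaddress` module (the function name is hypothetical):

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map a Class D IPv4 group address to its Ethernet group MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:  # Class D range 224.0.0.0/4
        raise ValueError(f"{group} is not a Class D multicast address")
    low23 = int(addr) & 0x7FFFFF      # keep the low-order 23 bits of the group
    mac = (0x01005E << 24) | low23    # place them under the 01:00:5E prefix
    # Emit the 48-bit value as six colon-separated octets, most significant first.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

Because 5 of the 28 group bits are discarded, 32 group addresses share each MAC address; for instance, 224.0.0.1 and 224.128.0.1 both map to 01:00:5e:00:00:01, so receivers may still need to filter on the IP destination.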
In addition to performing route processing functions in connection with multicast protocols, the router also executes path determination and switching (“forwarding”) operations on the multicast data traffic received on its interfaces. Multicast forwarding operations involve, inter alia, replication of a data packet onto each outgoing router interface having a receiver of the packet. A problem associated with performing such replication operations at the router involves scalability; that is, the amount of IP multicast traffic that a router can replicate given its limited resources. In this context, the resources involve the capacity of a processing entity (e.g., a route processor executing layer 3 software processes) within the router to perform replication, in addition to route processing and path determination operations, on incoming packets at substantially high rates.
Specifically, if the incoming packet rate for a multicast traffic flow is higher than the rate at which the route processor can process (e.g., route, replicate and forward) the packets, then subsequent incoming multicast traffic will be dropped. For example, assume a route processor can process incoming packets at a rate of 20,000 packets per second (pps); assume also, however, that the router is receiving multicast packets from multiple sources at an aggregate incoming rate that is greater than 20,000 pps. This situation may arise with a “backbone” router configured to route traffic from many different subnets in an enterprise network. In this case, the route processor quickly becomes overloaded and drops those packets that exceed its 20,000 pps processing rate, thereby creating a “bottleneck” in the network. The present invention is generally directed to solving the problem associated with the amount of multicast traffic a router can forward; in particular, the invention is directed to a technique for offloading multicast forwarding operations from a router to a switch.
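The arithmetic of this example can be made concrete with a simple steady-state model, assuming the route processor handles at most 20,000 pps and every packet beyond that rate is dropped (buffering and bursts are ignored). The 25,000 pps aggregate rate below is a hypothetical figure, not one given in the text.

```python
def dropped_pps(offered_pps: int, capacity_pps: int = 20_000) -> int:
    """Packets per second arriving beyond what the route processor can handle."""
    return max(0, offered_pps - capacity_pps)


def drop_fraction(offered_pps: int, capacity_pps: int = 20_000) -> float:
    """Fraction of the offered multicast traffic dropped in steady state."""
    if offered_pps == 0:
        return 0.0
    return dropped_pps(offered_pps, capacity_pps) / offered_pps


# Hypothetical aggregate load of 25,000 pps against the 20,000 pps limit.
print(dropped_pps(25_000), drop_fraction(25_000))  # 5000 0.2
```

Under this model one packet in five is lost, and the loss grows with every additional source, which is why moving replication and forwarding into switch hardware matters.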
The invention comprises a technique for implementing forwarding operation “shortcuts” at a switch for multicast data traffic routed between subnetworks of a computer network. Broadly stated, a first multicast frame is forwarded from the switch to a router, which performs route processing and forwarding operations for a packet encapsulated within the frame. During execution of the route processing operation, the router provides multicast flow and additional information relating to the routed packet to the switch in accordance with a novel multicast shortcut control protocol (MSCP). The information is used by the switch to implement multicast shortcuts for subsequent frames received at the switch that belong to the same multicast packet flow.
In one aspect of the invention, various hardware components in the switch are configured to perform route lookup, packet rewrite and replication operations offloaded from the router. These hardware components include a layer 2 (L2) forwarding engine, a layer 3 (L3) forwarding engine and a rewrite/replication engine. Each engine is further associated with a respective data structure, such as an L2 forwarding table, an L3 shortcut table and a multicast expansion table (MET), used to implement the multicast shortcut technique. In another aspect of the invention, the MSCP enables communication between the router (software) and engines (hardware) to, for example, program the table structures. Offloading of the forwarding operations in accordance with the multicast shortcut technique advantageously allows the router to utilize its processing resources for other processing-intensive applications.
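The rewrite/replication step can be sketched as producing one rewritten copy of the packet per outgoing VLAN listed in a MET block. This Python model is an illustration under assumptions not stated in the text: the MET is a dictionary from pointer to a list of outgoing VLANs, and the rewrite is limited to the usual routed-packet changes (TTL decremented, source MAC replaced by the router's MAC). All names and the MAC value are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    ttl: int
    vlan: int
    src_mac: str


ROUTER_MAC = "00:00:5e:00:53:01"  # hypothetical router MAC address


def replicate(packet: Packet, met: dict, met_ptr: int) -> list:
    """Return one rewritten copy of the packet per outgoing VLAN in the MET block."""
    if packet.ttl <= 1:
        return []  # TTL would expire: no copies are forwarded
    return [
        # Assumed rewrite: new egress VLAN, decremented TTL, router source MAC.
        replace(packet, vlan=out_vlan, ttl=packet.ttl - 1, src_mac=ROUTER_MAC)
        for out_vlan in met[met_ptr]
    ]
```

Keeping the replication list in a separate table referenced by pointer lets many L3 shortcut entries share or swap replication sets without rewriting the entries themselves.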
During execution of the route processing operation, a multicast shortcut server (MSS) process on the router communicates with a multicast shortcut client (MSC) process on the switch using MSCP to encode and transmit shortcut information between the router and switch. Specifically, the MSS sends a shortcut control message (SCCM) to the MSC, which uses the contents of the message to program the table structures of the switch and establish a hardware shortcut for a multicast flow defined in the packet. Thereafter, three components of the packet are used to access the hardware shortcut: an Internet protocol (IP) source address, an IP destination address and an incoming virtual local area network (VLAN) identifier (ID). These three components are preferably hashed to implement a reverse path forwarding (RPF) check in hardware.
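The following Python sketch shows one way a hashed lookup on (IP source, IP destination, incoming VLAN ID) can double as an RPF check: because the incoming VLAN is part of the key, a packet of the same flow that arrives on an unexpected VLAN simply misses the table and can be left to the router. The CRC-32 hash, the bucket count and the class name are assumptions for illustration; the text does not specify the hardware hash function or table geometry.

```python
import ipaddress
import zlib

NUM_BUCKETS = 1024  # hypothetical L3 shortcut table size


def shortcut_index(src_ip: str, dst_ip: str, vlan_id: int) -> int:
    """Hash the three flow components into a bucket index (CRC-32 is assumed)."""
    key = (ipaddress.IPv4Address(src_ip).packed
           + ipaddress.IPv4Address(dst_ip).packed
           + vlan_id.to_bytes(2, "big"))
    return zlib.crc32(key) % NUM_BUCKETS


class L3ShortcutTable:
    """Hypothetical hashed L3 shortcut table with chained buckets."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def program(self, src_ip, dst_ip, vlan_id, met_pointer):
        key = (src_ip, dst_ip, vlan_id)
        self.buckets[shortcut_index(*key)].append((key, met_pointer))

    def lookup(self, src_ip, dst_ip, vlan_id):
        """Return the MET pointer, or None (the packet is left to the router)."""
        key = (src_ip, dst_ip, vlan_id)
        for stored_key, met_pointer in self.buckets[shortcut_index(*key)]:
            if stored_key == key:  # full compare guards against hash collisions
                return met_pointer
        return None
```

Folding the incoming VLAN into the key means no separate RPF comparison is needed on the fast path: a hit implies the packet arrived on the VLAN the router programmed for that flow.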
In summary, the MSC (i) allocates memory in the MET of the switch; (ii) creates a return MET pointer to that allocated memory; (iii) programs an entry of the L3 shortcut table with information contained within the SCCM message; (iv) stores the MET pointer in a predefined field of the L3 entry; and (v) accesses a corresponding entry of the L2 forwarding table using information contained in the SCCM message.
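The five MSC steps can be sketched as a single handler. This Python model is hypothetical throughout: the SCCM fields, the dictionary layouts of the tables, and the choice to create the L2 entry if it is absent are assumptions, since the text does not define the message format or table encodings.

```python
from dataclasses import dataclass, field


@dataclass
class SCCM:
    """Hypothetical SCCM contents; the actual message format is not specified."""
    src_ip: str
    dst_ip: str
    in_vlan: int
    out_vlans: tuple
    group_mac: str


@dataclass
class SwitchTables:
    met: list = field(default_factory=list)  # multicast expansion table
    l3: dict = field(default_factory=dict)   # L3 shortcut table
    l2: dict = field(default_factory=dict)   # L2 forwarding table


def program_shortcut(tables: SwitchTables, msg: SCCM) -> int:
    # (i) allocate MET memory: one replication entry per outgoing VLAN.
    tables.met.append(list(msg.out_vlans))
    # (ii) the returned MET pointer is the index of the newly allocated block.
    met_ptr = len(tables.met) - 1
    # (iii) program an L3 shortcut entry keyed by the flow described in the SCCM,
    # (iv) storing the MET pointer in a field of that entry.
    l2_key = (msg.group_mac, msg.in_vlan)
    tables.l3[(msg.src_ip, msg.dst_ip, msg.in_vlan)] = {"met_ptr": met_ptr, "l2_key": l2_key}
    # (v) access the corresponding L2 forwarding entry (assumed: created if absent).
    tables.l2.setdefault(l2_key, {"index": None})
    return met_ptr
```

Recording the L2 key in the L3 entry is a design assumption that lets later messages, such as the fast-drop request described next, move from a flow to its L2 entry directly.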
After programming the tables to establish the hardware shortcut for the IP multicast flow, the MSC responds to the MSS with an acknowledgment message. In response to receiving a positive acknowledgment, the MSS (router) terminates forwarding operations on packets associated with the multicast flow. In addition, the MSS sends a Multicast Fast Drop (MFD) message to the MSC instructing the switch to block all multicast flow (packet) traffic from reaching the router. Upon receiving the MFD message, the MSC locates the shortcut entry in the L3 table, accesses the associated L2 forwarding table entry and reprograms an index in that entry to eliminate a port-select signal for the router, thereby completing the shortcut setup process. Forwarding operations for subsequent frames having the multicast flow are then rendered by hardware logic circuits of the switch rather than by the router.
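The MFD handling can be sketched by modeling the L2 entry's index as a pointer into a table of port sets. The `port_select_table`, the `"l2_key"` and `"index"` fields and the router port number are hypothetical; the text says only that an index is reprogrammed so the router's port is no longer selected.

```python
ROUTER_PORT = 0  # hypothetical port number of the switch's link to the router


def handle_mfd(l3_table: dict, l2_table: dict, port_select_table: list, flow_key) -> int:
    """Reprogram the flow's L2 entry so it no longer selects the router port."""
    l3_entry = l3_table[flow_key]                # locate the shortcut entry
    l2_entry = l2_table[l3_entry["l2_key"]]      # access the associated L2 entry
    current = port_select_table[l2_entry["index"]]
    wanted = current - {ROUTER_PORT}             # drop the router's port-select
    if wanted in port_select_table:
        new_index = port_select_table.index(wanted)  # reuse an existing port set
    else:
        port_select_table.append(wanted)             # or allocate a new one
        new_index = len(port_select_table) - 1
    l2_entry["index"] = new_index                # reprogram the index
    return new_index
```

Pointing the entry at a different port set, rather than editing the shared set in place, leaves other flows that still need the router port unaffected.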