This invention relates generally to computer networks and, more specifically, to a network switch having a distributed forwarding architecture and a mechanism for updating and synchronizing forwarding tables within the switch.
Data communication in a computer network involves the exchange of data between two or more entities interconnected by communication links and subnetworks. These entities are typically software programs executing on hardware computer platforms, such as end stations and intermediate stations. Examples of an intermediate station include a router or switch that interconnects the communication links and subnetworks to enable transmission of data between the end stations. A local area network (LAN) is an example of a subnetwork that provides relatively short distance communication among the interconnected stations, whereas a wide area network enables long distance communication over links provided by public or private telecommunications facilities. Accordingly, the switch may be utilized to provide a "switching" function for transferring information between, e.g., LANs.
Communication software executing on the end stations correlates and manages data communication with other end stations. The stations typically communicate by exchanging discrete packets or frames of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the stations interact with each other. In addition, network routing software executing on the routers allows expansion of communication to other end stations. Collectively, these hardware and software components comprise a communications network and their interconnections are defined by an underlying architecture.
Modern communications network architectures are typically organized as a series of hardware and software levels or "layers" within each station. These layers interact to format data for transfer between, e.g., a source station and a destination station communicating over the network. Predetermined services are performed on the data as it passes through each layer and the layers communicate with each other by means of the predefined protocols. The lower layers of these architectures are generally standardized and are typically implemented in hardware and firmware, whereas the higher layers are generally implemented in the form of software running on the stations attached to the network. An example of such a communications architecture is the Internet communications architecture.
The Internet architecture is represented by four layers which are termed, in ascending interfacing order, the network interface, internetwork, transport and application layers. These layers are arranged to form a protocol stack in each communicating station of the network. FIG. 1 illustrates a schematic block diagram of prior art Internet protocol stacks 125 and 175 used to transmit data between a source station 110 and a destination station 150, respectively, of a network 100. As can be seen, the stacks 125 and 175 are physically connected through a communications channel 180 at the network interface layers 120 and 160. For ease of description, the protocol stack 125 will be described.
In general, the lower layers of the communications stack provide internetworking services and the upper layers, which are the users of these services, collectively provide common network application services. The application layer 112 provides services suitable for the different types of applications using the network, while the lower network interface layer 120 accepts industry standards defining a flexible network architecture oriented to the implementation of LANs.
Specifically, the network interface layer 120 comprises physical and data link sublayers. The physical layer 126 is concerned with the actual transmission of signals across the communication channel and defines the types of cabling, plugs and connectors used in connection with the channel. The data link layer (i.e., "layer 2") is responsible for transmission of data from one station to another and may be further divided into two sublayers: Logical Link Control (LLC 122) and Media Access Control (MAC 124).
The MAC sublayer 124 is primarily concerned with controlling access to the transmission medium in an orderly manner and, to that end, defines procedures by which the stations must abide in order to share the medium. In order for multiple stations to share the same medium and still uniquely identify each other, the MAC sublayer defines a hardware or data link address called a MAC address. This MAC address is unique for each station interfacing to a LAN. The LLC sublayer 122 manages communications between devices over a single link of the network.
The primary network layer protocol of the Internet architecture is the Internet protocol (IP) contained within the internetwork layer 116 (i.e., "layer 3"). IP is a network protocol that provides internetwork routing and that relies on transport protocols for end-to-end reliability. An example of such a transport protocol is the Transmission Control Protocol (TCP) contained within the transport layer 114 (i.e., "layer 4"). The term TCP/IP is commonly used to refer to the Internet architecture; the TCP/IP architecture is well-known and described in Computer Networks, 3rd Edition, by Andrew S. Tanenbaum, published by Prentice-Hall (1996).
A router is an intelligent intermediate node that implements network services such as route processing, path determination and path switching functions. The route processing function allows a router to determine the type of routing needed for a packet, whereas the path switching function allows a router to accept a packet on one interface and forward it on a second interface. The path determination function enables the router to select the most appropriate interface for forwarding a packet. A switch, on the other hand, provides the basic functions of a bridge including filtering of data traffic by MAC address, "learning" of a MAC address based upon a source MAC address of a frame and forwarding of the frame based upon a destination MAC address. In addition, the switch provides the path switching capability of a router.
FIG. 2 is a highly schematic block diagram of a conventional bus-based network switch 200 comprising a plurality of ports (P) coupled to forwarding engine circuitry (FE) via a bus 210. The ports may be implemented on various line cards (LC) of the switch, while the forwarding engine may be located on a separate supervisor card (SC). Broadly stated, when a frame is received at a port of the network switch, it is driven over the bus to all of the ports as a forwarding decision is rendered by the forwarding engine. The forwarding engine renders the forwarding decision by, inter alia, accessing a forwarding table (FwdT) to "look-up" a destination MAC address of the frame. If the destination MAC address is in the table, the forwarding decision is passed to all of the ports and only those ports selected by the decision receive the frame, while all of the other ports discard the frame. An example of such a bus-based network switch is disclosed in U.S. Pat. No. 5,796,732 to Mazzola for an Architecture for an Expandable Transaction-Based Switching Bus, which patent is hereby incorporated by reference as though fully set forth herein.
In addition to rendering the forwarding decision, the forwarding engine may then search the forwarding table for a source MAC address of the frame and if that address is not in the table, the forwarding engine "learns" that address. For example, if the source MAC address of the incoming frame is A and that address is not in the forwarding table, the forwarding engine learns the source address of that frame in a conventional manner. When a subsequent frame is received at the switch from another source B which has a destination address of A, the forwarding engine may then be able to properly forward that frame to the destination.
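The look-up and learning steps described above may be illustrated by the following minimal sketch. It assumes a single forwarding engine, ports identified by small integers and MAC addresses represented as strings; the class `ForwardingTable` and its `forward` method are hypothetical names, and entry aging, VLANs and hardware details are omitted.

```python
from typing import Dict, List


class ForwardingTable:
    """Hypothetical L2 forwarding table for one bus-based forwarding engine (a sketch)."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}  # MAC address -> port number

    def forward(self, src_mac: str, dst_mac: str, in_port: int, all_ports: List[int]) -> List[int]:
        """Return the ports that keep the frame; all other ports discard it."""
        out_port = self.entries.get(dst_mac)
        if out_port is None:
            # Unknown destination: every port other than the ingress port keeps the frame.
            selected = [p for p in all_ports if p != in_port]
        elif out_port == in_port:
            # Destination lies on the segment the frame came from: filter the frame.
            selected = []
        else:
            selected = [out_port]
        # Learn (or refresh) the source address after rendering the decision.
        self.entries[src_mac] = in_port
        return selected


ports = [0, 1, 2, 3]
fe = ForwardingTable()
first = fe.forward("A", "B", in_port=0, all_ports=ports)   # [1, 2, 3]: B is not yet known
second = fe.forward("B", "A", in_port=2, all_ports=ports)  # [0]: A was learned from the first frame
```

Because the source is learned on every frame, the second frame (from B to A) is delivered only to port 0 rather than to every port.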
The performance of such a bus-based network switch may be improved by incorporating the ports and forwarding engine circuitry within a line card. A plurality of line cards may then be interconnected by a conventional switch fabric to provide a line card-based network switch. FIG. 3 is a schematic block diagram of a network switch 300 comprising a plurality of line cards (LC1-3) interconnected by a switch fabric 350. It is desirable to have the network switch 300 behave similarly to the network switch 200, but operate substantially faster. Such behavior includes learning the source MAC address of the frame received at a port of a line card and ensuring that a frame received at any other port in the network switch may be properly forwarded throughout the switch based on the previously learned source MAC address. Performance improvement of the switch is accomplished, in part, by providing distributed forwarding tables to the line cards of the network switch; however, such an arrangement inherently results in inaccurate forwarding decision behavior.
Assume that an incoming frame is received at port 0 (P0) on line card 1 (LC1) from source station A and is destined for station B attached to port 1 (P1) on LC1. Here, the location of station B has been learned by the forwarding engine of line card 1 (FE1) and stored in its forwarding table (FwdT1); for example, station B is represented in an entry of FwdT1 as B:1,1. The incoming frame from station A is then forwarded to P1 on LC1 in accordance with a forwarding decision rendered by FE1 and is transmitted to station B. FE1 also learns the location of station A and stores that location in FwdT1 as A:1,0. As a result of the forwarding decision process, the frame received from station A is transmitted to station B solely within LC1; that is, the frame does not pass through the switching fabric to any other line card of the network switch, thereby increasing performance.
Assume now that an incoming frame is received at P2 of line card 2 (LC2) from a station C and is destined for station A attached to P0 of LC1. The MAC address of A was learned by FE1 during the previous forwarding decision operation; however, the forwarding engine of line card 2 (FE2) never processed ("saw") the frame from station A and thus has not learned the location of A. Accordingly, FE2 "floods" the frame from station C over the switch fabric to all line cards throughout the network switch. This situation manifests a problem within a distributed forwarding table architecture; namely, the fact that the distributed forwarding tables may not have the same information, and thus are not synchronized, because they do not see the same frame traffic throughout the switch.
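The two scenarios above can be reproduced with a small model in which each line card keeps a private table. This is only a sketch: the class `LineCardEngine` is hypothetical, an index is modeled as a (line card, port) pair so that the entry B:1,1 becomes `(1, 1)`, and a return value of `None` stands for flooding over the switch fabric.

```python
from typing import Dict, Optional, Tuple

Index = Tuple[int, int]  # (line card, port); the entry B:1,1 in the text is (1, 1)


class LineCardEngine:
    """Hypothetical forwarding engine with a private L2 table on one line card."""

    def __init__(self, card: int) -> None:
        self.card = card
        self.table: Dict[str, Index] = {}

    def receive(self, src: str, dst: str, port: int) -> Optional[Index]:
        """Decide for a frame arriving on a local port; None means flood over the fabric."""
        decision = self.table.get(dst)
        self.table[src] = (self.card, port)  # learned on this card only
        return decision


fe1, fe2 = LineCardEngine(1), LineCardEngine(2)
fe1.table["B"] = (1, 1)                 # B previously learned by FE1 on LC1, P1
local = fe1.receive("A", "B", port=0)   # (1, 1): switched within LC1; FE1 learns A:1,0
remote = fe2.receive("C", "A", port=2)  # None: FE2 never saw A, so it floods
```

Nothing in this model propagates the entry for A from FE1 to FE2, which is exactly the lack of synchronization the passage describes.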
One way to synchronize distributed forwarding tables in such a network switch is through the use of software executing on a microprocessor (μP) on each line card. In this approach, the microprocessor is notified each time the forwarding engine learns a new address; the microprocessor then notifies its peer microprocessor on each line card (e.g., over an independent control bus) such that each processor can populate its associated forwarding table with the learned information. A problem with this approach involves the latency associated with updating each of the distributed forwarding tables, along with the additional overhead consumed by the microprocessors when communicating among themselves to populate their forwarding tables with the updated information. The present invention is generally directed to a technique for efficiently and quickly synchronizing the distributed forwarding tables of forwarding engines contained within line cards of a network switch and, further, for maintaining such synchronization in a dynamic (e.g., changing of stations attached to the ports) or lossy (e.g., dropping of packets in the switch fabric) configuration.
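The latency and overhead of the software approach can be made concrete with the following sketch. It assumes, for illustration only, that each microprocessor keeps an inbox of peer messages that it processes at some later time; the class `LineCardCpu` and its methods are hypothetical. Each learned address costs one message per peer, and a peer's table remains stale until it drains its inbox.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Index = Tuple[int, int]  # (line card, port)


class LineCardCpu:
    """Hypothetical line-card microprocessor that mirrors learned addresses to its peers."""

    def __init__(self, card: int) -> None:
        self.card = card
        self.table: Dict[str, Index] = {}
        self.inbox: Deque[Tuple[str, Index]] = deque()  # peer messages not yet applied
        self.peers: List["LineCardCpu"] = []
        self.messages_sent = 0

    def on_learn(self, mac: str, index: Index) -> None:
        """Called when the local forwarding engine learns a new address."""
        self.table[mac] = index
        for peer in self.peers:  # one control-bus message per peer
            peer.inbox.append((mac, index))
            self.messages_sent += 1

    def drain(self) -> None:
        """Apply queued updates; until this runs, this card's table is stale."""
        while self.inbox:
            mac, index = self.inbox.popleft()
            self.table[mac] = index


cards = [LineCardCpu(n) for n in range(1, 5)]
for cpu in cards:
    cpu.peers = [p for p in cards if p is not cpu]

cards[0].on_learn("A", (1, 0))
stale_before_drain = "A" not in cards[1].table  # True: LC2 has not processed the message yet
for cpu in cards:
    cpu.drain()
```

With four line cards, a single learned address generates three peer messages, and the window between `on_learn` and `drain` is the synchronization latency referred to above.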
The discussion above primarily involves layer 2 (L2) forwarding decision operations; the present invention is also directed to "higher-layer" forwarding/routing operations and, in particular, layer 3 (L3) shortcut and layer 4 (L4) forwarding operations. FIG. 4 is a schematic block diagram of a network switch 400 coupled to a router 450 via port R and to end stations A and B via ports A and B, respectively. Assume end station A is on a different subnetwork (e.g., subnet A) than end station B (e.g., subnet B) and that the two end stations want to communicate; assume further that the ports are configured as virtual local area networks (VLANs), each of which corresponds to a different subnet. In VLAN compatible networks, various LANs, end stations or communication links may be virtually segregated into a series of network groups by associating switch or other device ports with various VLAN designations. Suitable VLAN arrangements are described in the IEEE standard 802.1Q for Virtual Bridged Local Area Networks and in U.S. Pat. No. 5,394,402 to Ross for a Hub for Segmented Virtual Local Area Network with Shared Media Access.
End station A sends a first frame to the network switch 400 where, in response to a forwarding decision, the frame is forwarded to the router 450. The router performs a L3 or L4 forwarding operation on the frame that includes rewriting the MAC (L2) header of the frame and thereafter "routing" the frame onto a different VLAN or subnet to destination station B. In accordance with the L3 shortcut technique, the switch observes the flow of the frame to and from the router and learns the L3 flow information associated with the frame (which does not change during the routing operation) as the frame flows to the router, while also learning the new MAC header associated with the frame (which changes after the routing operation) as the routed frame flows from the router.
Specifically, the switch observes the transformation of the frame/packet passed up a protocol stack (such as stack 125) from the data link (L2) layer to the internetwork (L3) layer of the router, where a routing decision is rendered using, e.g., an IP destination network address in accordance with the IP network protocol, and coming back down the stack so as to acquire sufficient information to route the frame. The switch records ("learns") the IP logical addresses and other information provided to the router within the L3 header of the frame for storage in a L3 entry of its forwarding table, and subsequently learns the route by essentially comparing the L2 information contained in the routed frame with the information stored in the original L2 header of the frame, and noting the differences.
Thereafter, frames of the same type are not passed to the router. That is, a subsequent frame issued by end station A is examined by the switch and if it includes the learned L3 information and is destined for the router, the switch rewrites the MAC header with the learned L2 information (stored in the L3 portion of its forwarding table) from the previous frame in accordance with a L3 shortcut operation that effectively bypasses the router. Thus, L3 processing still occurs when the switch routes similar type frames from subnet A to subnet B, but that processing is implemented in hardware on the switch. An example of a shortcut technique that may be advantageously used with the present invention is described in the commonly assigned U.S. patent application Ser. No. 08/951,820, filed on Oct. 14, 1997 and titled Method and Apparatus for Implementing Forwarding Decision Shortcuts at a Network Switch by Ray Kloth et al., issued on Nov. 14, 2000 as U.S. Pat. No. 6,147,993.
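The L3 shortcut learning described above may be sketched as follows. This is an illustration under stated assumptions, not the referenced shortcut technique itself: the flow key is assumed to be the (IP source, IP destination) pair, `ROUTER_MAC` and the addresses are hypothetical values, and the other parts of L3 processing (e.g., TTL decrement and checksum update) are omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Set, Tuple

ROUTER_MAC = "R"  # hypothetical MAC address of the router interface

FlowKey = Tuple[str, str]  # (IP source, IP destination): an assumed choice of L3 flow key


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


class ShortcutCache:
    """Sketch of L3 shortcut learning by observing frames to and from the router."""

    def __init__(self) -> None:
        self.pending: Set[FlowKey] = set()                    # flows seen heading to the router
        self.l3_entries: Dict[FlowKey, Tuple[str, str]] = {}  # flow -> rewritten (src_mac, dst_mac)

    def observe(self, frame: Frame) -> Optional[Frame]:
        """Return a rewritten frame when a shortcut applies, otherwise None."""
        key = (frame.src_ip, frame.dst_ip)  # L3 information is unchanged by routing
        if frame.dst_mac == ROUTER_MAC:
            if key in self.l3_entries:
                src_mac, dst_mac = self.l3_entries[key]
                return replace(frame, src_mac=src_mac, dst_mac=dst_mac)  # router bypassed
            self.pending.add(key)  # first frame of the flow: let the router handle it
        elif frame.src_mac == ROUTER_MAC and key in self.pending:
            # Routed frame returning from the router: record its new MAC header.
            self.l3_entries[key] = (frame.src_mac, frame.dst_mac)
            self.pending.discard(key)
        return None


sc = ShortcutCache()
sc.observe(Frame("A", ROUTER_MAC, "10.0.1.5", "10.0.2.7"))  # to the router; flow noted
sc.observe(Frame(ROUTER_MAC, "B", "10.0.1.5", "10.0.2.7"))  # routed copy; new header learned
shortcut = sc.observe(Frame("A", ROUTER_MAC, "10.0.1.5", "10.0.2.7"))  # rewritten by the switch
```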
The operation described above is generally the same for a L4 forwarding decision rendered by the router 450 with the exception that the resulting L4 decision is populated within a L4 entry of the forwarding table by the router software, rather than by the "learning" technique. Here, the router may perform a forwarding decision using information stored in a L4 header (e.g., TCP destination port number) of the first frame/packet. Yet instead of the switch 400 learning that L4 decision through the shortcut operation described above, the router 450 "explicitly" populates the L4 portion of the forwarding table with the L4 decision information. In other words, if a subsequent frame issued by an end station and destined for the router includes the relevant L4 information, the switch rewrites the MAC header with the L2 information from the previous frame (which is stored in the L4 portion of its forwarding table) in accordance with a L4 forwarding operation that effectively bypasses the router.
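The difference from the L3 case is that the switch never learns L4 entries; router software installs them. A minimal sketch follows, assuming (hypothetically) that an L4 entry is keyed by IP destination address and TCP destination port and holds the MAC header to be written into matching frames; the class `L4Table` and its methods are illustrative names only.

```python
from typing import Dict, Optional, Tuple

L4Key = Tuple[str, int]     # (IP destination, TCP destination port): an assumed key
L2Header = Tuple[str, str]  # (source MAC, destination MAC) to write into the frame


class L4Table:
    """Sketch of an L4 table that router software populates explicitly."""

    def __init__(self) -> None:
        self.entries: Dict[L4Key, L2Header] = {}

    def install(self, dst_ip: str, dst_port: int, header: L2Header) -> None:
        """Invoked by the router after its own L4 decision; the switch never learns these."""
        self.entries[(dst_ip, dst_port)] = header

    def lookup(self, dst_ip: str, dst_port: int) -> Optional[L2Header]:
        """Header to rewrite with, or None if the frame must still go to the router."""
        return self.entries.get((dst_ip, dst_port))


l4 = L4Table()
l4.install("10.0.2.7", 80, ("R", "B"))  # hypothetical values pushed by router software
hit = l4.lookup("10.0.2.7", 80)         # ("R", "B"): the switch rewrites and forwards the frame
miss = l4.lookup("10.0.2.7", 443)       # None: no router-installed entry, so send to the router
```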
In FIG. 4, the router is externally coupled to the switch; however, the L3 shortcut operation described above may also apply to an embodiment of a platform wherein the router is internally connected (i.e., within the same chassis) to the network switch. Therefore, the present invention is further directed to synchronization of forwarding tables pertaining to L3/L4 operations and, in particular, to shortcuts associated with L3 switching operations and forwarding associated with L4 switching operations in a distributed network switch having a router coupled (either internally or externally) to the switch.
The invention relates to a mechanism and technique for updating and synchronizing forwarding tables contained on line cards that are interconnected by a switch fabric of a distributed network switch. The network switch is preferably a L3 or L4 switch comprising a plurality of forwarding engines distributed among the line cards. Each forwarding engine has an associated forwarding table, which preferably includes a L2 portion, a L3 portion and/or a L4 portion. The L2 portion of the table is used to execute forwarding decision operations for frames forwarded among ports of the line cards, whereas the L3/L4 portions of the table are used to execute shortcut/forwarding operations for frames routed among the ports. Broadly stated, the mechanism comprises a media access control (MAC) notification (MN) frame for updating and synchronizing the location of a destination port, i.e., the destination index (DI), stored in the L2 portions of the forwarding tables.
In the illustrative embodiment, the switch fabric is embodied as a cross-bar switch configured to interconnect a plurality of serial channel port interfaces to establish point-to-point wire connections for switching frames among the line cards of the switch. The port interfaces are used to implement an extended switching operation between the line card (i.e., the ingress card) having an incoming port that received a frame from a source station on a computer network and the line card (i.e., the egress card) having an outgoing port to which the frame is switched for delivery to a destination station of the network. The frame is preferably a fabric frame having a fabric header that includes a port-of-exit (POE) mask field, a source index field and a destination index field. The POE mask field includes a plurality of bits, one for each port interface of the switch fabric.
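The POE mask can be pictured as a bit vector with one bit per port interface of the fabric. The sketch below assumes sixteen port interfaces and plain integer index values; neither the interface count nor the index encoding is specified in the text, and the names `FabricHeader`, `poe_bit` and `selected_interfaces` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

NUM_PORT_INTERFACES = 16  # assumption: the cross-bar's port-interface count is not given


@dataclass
class FabricHeader:
    poe_mask: int           # bit i set => switch a copy through port interface i
    source_index: int       # the encodings of both indices are assumptions
    destination_index: int


def poe_bit(interface: int) -> int:
    """POE mask bit for one port interface of the switch fabric."""
    if not 0 <= interface < NUM_PORT_INTERFACES:
        raise ValueError(f"no port interface {interface}")
    return 1 << interface


def selected_interfaces(header: FabricHeader) -> List[int]:
    """Port interfaces that receive a copy of the fabric frame."""
    return [i for i in range(NUM_PORT_INTERFACES) if header.poe_mask & (1 << i)]


unicast = FabricHeader(poe_mask=poe_bit(3), source_index=0x10, destination_index=0x31)
two_cards = FabricHeader(poe_mask=poe_bit(3) | poe_bit(5), source_index=0x10, destination_index=0x31)
```

A frame bound for a single egress card asserts one bit; asserting more bits causes the fabric to replicate the frame to each corresponding port interface.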
When a frame is received at an ingress card, the forwarding engine associated with that line card performs a forwarding decision operation using a destination MAC address of the frame. If the frame is received at the ingress card for the first time, this ingress forwarding engine also "learns" a source MAC address of the frame. Learning an address comprises, inter alia, creating/updating an entry of the L2 forwarding table with the source MAC address and its location (index) within the switch. The ingress forwarding engine then performs a flood-to-fabric (FF) operation on the frame by asserting all bits in the POE mask field of the fabric frame. The asserted POE bits instruct the switch fabric to switch ("flood") copies of the fabric frame through its port interfaces to all (egress) line cards of the network switch. The FF operation essentially forces each forwarding engine associated with each egress card to either (i) update its current L2 forwarding table entry with the newly-learned source MAC address and index of the frame or, if there is not a current entry, (ii) learn the source address/index of the frame.
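The FF operation and its effect on the egress tables may be sketched as follows, assuming four line cards with one fabric port interface each and hypothetical integer index values. In the model, a single dictionary assignment covers both cases named above: updating an existing entry and learning a new one.

```python
from dataclasses import dataclass, replace
from typing import Dict, List

NUM_LINE_CARDS = 4                        # assumption: one fabric port interface per card
ALL_POE_BITS = (1 << NUM_LINE_CARDS) - 1  # 0b1111


@dataclass(frozen=True)
class FabricFrame:
    poe_mask: int
    source_index: int       # hypothetical integer encoding of the ingress port
    destination_index: int  # unused in this sketch
    src_mac: str
    dst_mac: str


def flood_to_fabric(frame: FabricFrame) -> FabricFrame:
    """FF operation: assert every POE bit so every line card receives a copy."""
    return replace(frame, poe_mask=ALL_POE_BITS)


def deliver(frame: FabricFrame, tables: List[Dict[str, int]]) -> None:
    """Each engine that receives a copy learns or updates the frame's source address."""
    for card, table in enumerate(tables):
        if frame.poe_mask & (1 << card):
            # One assignment covers both cases: update an existing entry or learn a new one.
            table[frame.src_mac] = frame.source_index


tables: List[Dict[str, int]] = [{} for _ in range(NUM_LINE_CARDS)]
tables[2]["A"] = 0x05  # a stale entry for A on another card
frame = FabricFrame(poe_mask=0, source_index=0x10, destination_index=0, src_mac="A", dst_mac="C")
deliver(flood_to_fabric(frame), tables)  # every table now maps "A" to 0x10
```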
According to the present invention, the novel MN frame is provided to complement the FF operation. The MN frame comprises, inter alia, a destination MAC address field, a source MAC address field, a source index field and a destination index field. The MN frame may comprise either a positive MN frame or a negative MN frame, each of which involves use of a primary input (PI) indicator. The PI indicator, which may comprise either a single bit or a plurality of bits, denotes a primary input MAC address that is directly attached to a port of the line card associated with the forwarding table containing this entry. That is, the PI indicator is asserted for a forwarding table entry having a MAC address that is learned from a frame sourced through one of the ports of the line card, as opposed to being learned through the switch fabric. As described herein, the forwarding engine on the egress card issues the MN frame to the ingress card, thereby forcing the forwarding engine on that latter card to update its forwarding table with the contents of the MN frame.
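The MN frame fields and the meaning of the PI indicator may be modeled as shown below. The sketch assumes a single-bit PI indicator and represents "flood" by a reserved index value `FLOOD_INDEX`, which is a hypothetical encoding; the text states only that a negative MN frame carries "flood" in its source index field.

```python
from dataclasses import dataclass
from typing import Dict

FLOOD_INDEX = -1  # assumption: a reserved index value that means "flood"


@dataclass(frozen=True)
class MNFrame:
    """MAC notification frame with the four fields named in the text."""
    dst_mac: str
    src_mac: str
    source_index: int
    destination_index: int

    @property
    def is_negative(self) -> bool:
        # A negative MN frame carries "flood" in its source index field.
        return self.source_index == FLOOD_INDEX


@dataclass
class L2Entry:
    destination_index: int
    primary_input: bool  # PI, modeled as a single bit


def learn(table: Dict[str, L2Entry], mac: str, index: int, from_local_port: bool) -> None:
    """PI is asserted only for addresses learned through this card's own ports."""
    table[mac] = L2Entry(index, primary_input=from_local_port)


table: Dict[str, L2Entry] = {}
learn(table, "A", 0x10, from_local_port=True)   # A sourced on one of this card's ports
learn(table, "C", 0x22, from_local_port=False)  # C learned through the switch fabric
```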
The positive MN frame is employed when the PI indicator is asserted for a destination MAC address entry of the forwarding table on the egress card and the DI contained in the switched fabric frame (i.e., the ingress DI) is different from the DI stored in the egress forwarding table (i.e., the egress DI). In the illustrative embodiment, the positive MN frame may also be generated by the forwarding engine on the egress card in response to assertion of a shortcut (SC) bit in a routed frame received at that card. Assertion of the SC bit denotes that the frame was routed through the switch. In order to ensure the consistency of the forwarding tables in the switch, the egress forwarding engine notifies the ingress forwarding engine as to the location of the destination MAC address using the positive MN frame.
Upon receiving the MN frame from the egress card, the ingress forwarding engine establishes or updates an appropriate entry in the L2 portion of its forwarding table using the contents of the source MAC address and source index fields of the MN frame. Notably, the contents of the source index field reflect the port and line card originating the MN frame, i.e., the outgoing port on the egress card. Accordingly, the ingress forwarding engine uses the source index of the MN frame as the destination index for the entry created in its L2 forwarding table.
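The positive MN exchange described in the two preceding paragraphs may be sketched as follows. The generation rule (PI asserted and ingress DI different from egress DI, or SC bit asserted) and the ingress update (the MN source index becomes the destination index) follow the text; the choice of values for the MN destination MAC and destination index fields, the `FLOOD_INDEX` encoding and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FLOOD_INDEX = -1  # assumption: reserved "flood" destination index


@dataclass
class L2Entry:
    destination_index: int
    primary_input: bool


@dataclass(frozen=True)
class FabricFrame:
    src_mac: str
    dst_mac: str
    source_index: int        # ingress port of the original frame
    destination_index: int   # ingress DI for dst_mac
    shortcut: bool = False   # SC bit: frame was routed through the switch


@dataclass(frozen=True)
class MNFrame:
    dst_mac: str
    src_mac: str
    source_index: int
    destination_index: int


def positive_mn(frame: FabricFrame, egress: Dict[str, L2Entry]) -> Optional[MNFrame]:
    """Egress side: tell the ingress card where dst_mac is actually attached."""
    entry = egress.get(frame.dst_mac)
    if entry is None or not entry.primary_input:
        return None  # not a positive-MN case
    if frame.destination_index != entry.destination_index or frame.shortcut:
        return MNFrame(
            dst_mac=frame.src_mac,                 # assumed: addressed back toward the sender
            src_mac=frame.dst_mac,                 # the address being located
            source_index=entry.destination_index,  # outgoing port on the egress card
            destination_index=frame.source_index,  # assumed: steers the MN to the ingress card
        )
    return None


def apply_mn(mn: MNFrame, ingress: Dict[str, L2Entry]) -> None:
    """Ingress side: the MN source index becomes the entry's destination index."""
    ingress[mn.src_mac] = L2Entry(mn.source_index, primary_input=False)


egress_table = {"B": L2Entry(destination_index=0x21, primary_input=True)}
ingress_table: Dict[str, L2Entry] = {}
flooded = FabricFrame("A", "B", source_index=0x10, destination_index=FLOOD_INDEX)
mn = positive_mn(flooded, egress_table)
if mn is not None:
    apply_mn(mn, ingress_table)  # ingress_table["B"] now points at 0x21
```

The ingress entry is created with PI deasserted because it was learned through the switch fabric rather than from a local port.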
In contrast, the negative MN frame is used when the PI indicator is not asserted for a destination MAC address entry of the egress forwarding table and the ingress DI is not the same as the egress DI. That is, when the egress forwarding engine receives a frame, it performs a "look-up" into its forwarding table for an entry having the destination MAC address of the frame. If the PI indicator is not asserted for that entry, the egress forwarding engine "knows" that the destination MAC address is not attached to a port for which it is responsible; therefore, the ingress forwarding table (i.e., the forwarding table associated with the line card from which the frame was forwarded) must have incorrect information stored therein. As a result, the egress forwarding engine generates the negative MN frame with the contents of the source index field set to "flood" and sends the frame to the ingress card. Upon receipt of the negative MN frame, the ingress forwarding engine learns "flood" as the destination index for the destination MAC address entry and thereafter performs a flood-to-VLAN (FV) operation for a frame having the destination MAC address.
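The negative MN case may be sketched in the same style. As before, `FLOOD_INDEX`, the integer index values and the function names are hypothetical; the sketch also assumes that no MN frame is generated when the egress table has no entry for the destination, a case the text does not address.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

FLOOD_INDEX = -1  # assumption: reserved "flood" destination index


@dataclass
class L2Entry:
    destination_index: int
    primary_input: bool


@dataclass(frozen=True)
class FabricFrame:
    src_mac: str
    dst_mac: str
    source_index: int
    destination_index: int


@dataclass(frozen=True)
class MNFrame:
    dst_mac: str
    src_mac: str
    source_index: int
    destination_index: int


def negative_mn(frame: FabricFrame, egress: Dict[str, L2Entry]) -> Optional[MNFrame]:
    """Egress side: dst_mac is not on this card, so the ingress table must be wrong."""
    entry = egress.get(frame.dst_mac)
    if entry is None or entry.primary_input:
        return None
    if frame.destination_index == entry.destination_index:
        return None
    return MNFrame(dst_mac=frame.src_mac, src_mac=frame.dst_mac,
                   source_index=FLOOD_INDEX, destination_index=frame.source_index)


def apply_mn(mn: MNFrame, ingress: Dict[str, L2Entry]) -> None:
    """Ingress side: learn the MN source index ("flood") as the destination index."""
    ingress[mn.src_mac] = L2Entry(mn.source_index, primary_input=False)


def ingress_decision(ingress: Dict[str, L2Entry], dst_mac: str) -> Union[int, str]:
    entry = ingress.get(dst_mac)
    if entry is None or entry.destination_index == FLOOD_INDEX:
        return "flood-to-VLAN"  # FV operation
    return entry.destination_index


egress_table = {"A": L2Entry(destination_index=0x30, primary_input=False)}  # A not local here
ingress_table = {"A": L2Entry(destination_index=0x21, primary_input=False)}  # stale entry
frame = FabricFrame("C", "A", source_index=0x10, destination_index=0x21)
mn = negative_mn(frame, egress_table)
if mn is not None:
    apply_mn(mn, ingress_table)
decision = ingress_decision(ingress_table, "A")  # subsequent frames for A are flooded on the VLAN
```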
Advantageously, the novel MN frame mechanism enables efficient and prompt synchronization of L2 forwarding tables in the distributed network switch. L2 synchronization is required to support higher layer, e.g., L3, L4 or layer 7 (application), forwarding operations that may be distributed throughout the switch.