This invention relates generally to computer networks and, more specifically, to a network switch having a distributed forwarding mechanism architecture for learning and switching frames within a computer network.
Data communication in a computer network involves the exchange of data between two or more entities interconnected by communication links and subnetworks. These entities are typically software programs executing on hardware computer platforms, such as end stations and intermediate stations. Examples of an intermediate station may be a router or switch that interconnects the communication links and subnetworks to enable transmission of data between the end stations. A local area network (LAN) is an example of a subnetwork that provides relatively short distance communication among the interconnected stations, whereas a wide area network enables long distance communication over links provided by public or private telecommunications facilities. Accordingly, the switch may be utilized to provide a “switching” function for transferring information between, e.g., LANs.
Communication software executing on the end stations correlates and manages data communication with other end stations. The stations typically communicate by exchanging discrete packets or frames of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the stations interact with each other. In addition, network routing software executing on the routers allows expansion of communication to other end stations. Collectively, these hardware and software components comprise a communications network and their interconnections are defined by an underlying architecture.
Modern communications network architectures are typically organized as a series of hardware and software levels or “layers” within each station. These layers interact to format data for transfer between, e.g., a source station and a destination station communicating over the network. Predetermined services are performed on the data as it passes through each layer and the layers communicate with each other by means of the predefined protocols. The lower layers of these architectures are generally standardized and are typically implemented in hardware and firmware, whereas the higher layers are generally implemented in the form of software running on the stations attached to the network. An example of such a communications architecture is the Internet communications architecture.
The Internet architecture is represented by four layers which are termed, in ascending interfacing order, the network interface, internetwork, transport and application layers. These layers are arranged to form a protocol stack in each communicating station of the network. FIG. 1 illustrates a schematic block diagram of prior art Internet protocol stacks 125 and 175 used to transmit data between a source station 110 and a destination station 150, respectively, of a network 100. As can be seen, the stacks 125 and 175 are physically connected through a communications channel 180 at the network interface layers 120 and 160. For ease of description, the protocol stack 125 will be described.
In general, the lower layers of the communications stack provide internetworking services and the upper layers, which are the users of these services, collectively provide common network application services. The application layer 112 provides services suitable for the different types of applications using the network, while the lower network interface layer 120 accepts industry standards defining a flexible network architecture oriented to the implementation of LANs.
Specifically, the network interface layer 120 comprises physical and data link sublayers. The physical layer 126 is concerned with the actual transmission of signals across the communication channel and defines the types of cabling, plugs and connectors used in connection with the channel. The data link layer (i.e., “layer 2”) is responsible for transmission of data from one station to another and may be further divided into two sublayers: Logical Link Control (LLC 122) and Media Access Control (MAC 124).
The MAC sublayer 124 is primarily concerned with controlling access to the transmission medium in an orderly manner and, to that end, defines procedures by which the stations must abide in order to share the medium. In order for multiple stations to share the same medium and still uniquely identify each other, the MAC sublayer defines a hardware or data link address called a MAC address. This MAC address is unique for each station interfacing to a LAN. The LLC sublayer 122 manages communications between devices over a single link of the network.
The primary network layer protocol of the Internet architecture is the Internet protocol (IP) contained within the internetwork layer 116 (i.e., “layer 3”). IP is a network protocol that provides internetwork routing and that relies on transport protocols for end-to-end reliability. An example of such a transport protocol is the Transmission Control Protocol (TCP) contained within the transport layer 114 (i.e., “layer 4”). The term TCP/IP is commonly used to refer to the Internet architecture; the TCP/IP architecture is well-known and described in Computer Networks, 3rd Edition, by Andrew S. Tanenbaum, published by Prentice-Hall (1996).
A router is an intelligent intermediate node that implements network services such as route processing, path determination and path switching functions. The route processing function allows a router to determine the type of routing needed for a packet, whereas the path switching function allows a router to accept a packet on one interface and forward it on a second interface. The path determination function enables the router to select the most appropriate interface for forwarding a packet. A switch, on the other hand, provides the basic functions of a bridge including filtering of data traffic by MAC address, “learning” of a MAC address based upon a source MAC address of a frame and forwarding of the frame based upon a destination MAC address. In addition, the switch provides the path switching capability of a router.
FIG. 2 is a highly schematic block diagram of a conventional bus-based network switch 200 comprising a plurality of ports (P) coupled to forwarding engine circuitry (FE) via a bus 210. The ports may be implemented on various line cards (LC) of the switch, while the forwarding engine may be located on a separate supervisor card (SC). Broadly stated, when a frame is received at a port of the network switch, it is driven over the bus to all of the ports as a forwarding decision is rendered by the forwarding engine. The forwarding engine renders the forwarding decision by, inter alia, accessing a forwarding table (FwdT) to “look-up” a destination MAC address of the frame. If the destination MAC address is in the table, the forwarding decision is passed to all of the ports and only those ports selected by the decision receive the frame, while all of the other ports discard the frame. An example of such a bus-based network switch is disclosed in U.S. Pat. No. 5,796,732 to Mazzola for an Architecture for an Expandable Transaction-Based Switching Bus, which patent is hereby incorporated by reference as though fully set forth herein.
In addition to rendering the forwarding decision, the forwarding engine may then search the forwarding table for a source MAC address of the frame and if that address is not in the table, the forwarding engine “learns” that address. For example, if the source MAC address of the incoming frame is A and that address is not in the forwarding table, the forwarding engine learns the source address of that frame in a conventional manner. When a subsequent frame is received at the switch from another source B which has a destination address of A, the forwarding engine may then be able to properly forward that frame to the destination.
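The learning and forwarding behavior described above can be illustrated with a minimal sketch. The class name `LearningSwitch`, its method names and the flooding of a frame whose destination is unknown are assumptions made for illustration (flooding on a miss is conventional bridge behavior); they are not taken from the referenced switch.

```python
from typing import Dict, List


class LearningSwitch:
    """Hypothetical model of one forwarding engine with one forwarding table."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.fwd_table: Dict[str, int] = {}  # MAC address -> port

    def receive(self, src_mac: str, dst_mac: str, in_port: int) -> List[int]:
        """Return the ports on which the frame is transmitted."""
        # Forwarding decision: look up the destination MAC address.
        out_port = self.fwd_table.get(dst_mac)
        # Learning: record (or refresh) the location of the source MAC address.
        self.fwd_table[src_mac] = in_port
        if out_port is None:
            # Assumed behavior on a miss: flood to every port but the ingress.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the receiving port: filter the frame.
            return []
        return [out_port]


switch = LearningSwitch(ports=[0, 1, 2])
flooded = switch.receive("A", "B", in_port=0)    # A is learned, B is unknown
forwarded = switch.receive("B", "A", in_port=1)  # A is now known on port 0
```

In this sketch the frame from B to A is delivered only to the port on which A was learned, which is the outcome the example with sources A and B describes.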
The performance of such a bus-based network switch may be improved by incorporating the ports and forwarding engine circuitry within a line card. A plurality of line cards may then be interconnected by a conventional switch fabric to provide a line card-based network switch. FIG. 3 is a schematic block diagram of a network switch 300 comprising a plurality of line cards (LC1-3) interconnected by a switch fabric 350. It is desirable to have the network switch 300 behave similarly to the network switch 200, but operate substantially faster. Such behavior includes learning the source MAC address of the frame received at a port of a line card and ensuring that a frame received at any other port in the network switch may be properly forwarded throughout the switch based on the previously learned source MAC address. Performance improvement of the switch is accomplished, in part, by providing distributed forwarding tables to the line cards of the network switch; however, such an arrangement may result in inaccurate forwarding decisions because the tables are not inherently synchronized.
Assume that an incoming frame is received at port 0 (P0) on line card 1 (LC1) from source station A and is destined to station B attached to port 1 (P1) on LC1. Here, the location of station B has been learned by the forwarding engine of line card 1 (FE1) and stored in its forwarding table (FwdT1); for example, station B is represented in an entry of FwdT1 as B:1,1. The incoming frame from station A is then forwarded to P1 on LC1 in accordance with a forwarding decision rendered by FE1 and is transmitted to station B. The FE1 also learns the location of station A and stores that location in FwdT1 as A:1,0. As a result of the forwarding decision process, the frame received from station A is transmitted to station B solely within LC1; that is, the frame does not pass through the switching fabric to any other line card of the network switch.
Assume now that an incoming frame is received at P2 of line card 2 (LC2) from a station C and is destined for station A attached to P0 of LC1. The MAC address of A was learned by FE1 during the previous forwarding decision operation; however, the forwarding engine of line card 2 (FE2) never processed (“saw”) the frame from station A and thus has not learned the location of A. Accordingly, FE2 “floods” the frame from station C over the switch fabric to all line cards throughout the network switch. This situation manifests a problem within a distributed forwarding table architecture; namely, the fact that the distributed forwarding tables may not have the same information, and thus are not synchronized, because they do not see the same frame traffic throughout the switch.
One way to synchronize distributed forwarding tables in such a network switch is through the use of software executing on a microprocessor (μP) on each line card. In this approach, the microprocessor is notified each time the forwarding engine learns a new address; the microprocessor then notifies its peer microprocessor on each line card (e.g., over an independent control bus) such that each processor can populate its associated forwarding table with the learned information. A problem with this approach involves the latency associated with updating each of the distributed forwarding tables, along with the additional overhead consumed by the microprocessors when communicating among themselves to populate their forwarding tables with the updated information. The present invention is generally directed to a technique for efficiently and quickly synchronizing the distributed forwarding tables of forwarding engines contained within line cards of a network switch.
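The synchronization problem in the two scenarios above can be reproduced with a short sketch in which each line card keeps its own table. The names `ForwardingEngine` and `handle`, and the use of `None` to stand for a flood, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple

Location = Tuple[int, int]  # (line card, port), e.g. B:1,1 -> (1, 1)


class ForwardingEngine:
    """Hypothetical per-line-card engine with a private forwarding table."""

    def __init__(self, card: int) -> None:
        self.card = card
        self.table: Dict[str, Location] = {}

    def handle(self, src: str, dst: str, in_port: int) -> Optional[Location]:
        """Learn the source; return the destination location, or None to flood."""
        self.table[src] = (self.card, in_port)
        return self.table.get(dst)


fe1 = ForwardingEngine(card=1)
fe2 = ForwardingEngine(card=2)
fe1.table["B"] = (1, 1)                       # B:1,1 was learned earlier by FE1

first = fe1.handle("A", "B", in_port=0)       # switched within LC1; A:1,0 learned
second = fe2.handle("C", "A", in_port=2)      # FE2 never saw A, so it floods
```

Because the frame from station A never crossed the switch fabric, only `fe1.table` holds an entry for A, and the lookup on FE2 misses.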
The discussion above primarily involves layer 2 (L2) forwarding decision operations; the present invention is also directed to layer 3 (L3) routing operations and, in particular, L3 shortcuts associated with routing operations. FIG. 4 is a schematic block diagram of a network switch 400 coupled to a router 450 via port R and to end stations A and B via ports A and B, respectively. Assume end station A is on a different subnetwork (e.g., subnet A) than end station B (e.g., subnet B) and that the two end stations want to communicate; assume further that the ports are configured as virtual local area networks (VLANs), each of which corresponds to the different subnet. In VLAN compatible networks, various LANs, end stations or communication links may be virtually segregated into a series of network groups by associating switch or other device ports with various VLAN designations. Suitable VLAN arrangements are described in the IEEE standard 802.1Q for Virtual Bridged Local Area Networks and in U.S. Pat. No. 5,394,402 to Ross for a Hub for Segmented Virtual Local Area Network with Shared Media Access.
End station A sends a first frame to the network switch 400 where, in response to a forwarding decision, the frame is forwarded to the router 450. The router performs a routing operation on the frame that includes, among other things, rewriting the MAC (L2) header of the frame and thereafter “routing” the frame onto a different VLAN or subnet to destination station B. In accordance with the shortcut technique, the switch observes the flow of the frame to and from the router and learns the L3 flow information associated with the frame (which does not change during the routing operation) as the frame flows to the router, while also learning the new MAC header associated with the frame (which changes after the routing operation) as the routed frame flows from the router.
Specifically, the switch observes the transformation of the frame/packet passed up a protocol stack (such as stack 125) from the data link (L2) layer to the internetwork (L3) layer of the router, where a routing decision is rendered using, e.g., the IP network protocol, and coming back down the stack so as to acquire sufficient information to route the frame. The switch records (“learns”) the logical addresses and other information provided to the router within the L3 header of the frame, and subsequently learns the route by essentially comparing the L2 information contained in the routed frame with the information stored in the original L2 header of the frame, and noting the differences.
Thereafter, frames of the same type are not passed to the router. That is, a subsequent frame issued by end station A is examined by the switch and if it includes the learned L3 information and is destined for the router, the switch rewrites the MAC header with the learned L2 information from the previous frame (and changes the VLAN) in accordance with an L3 shortcut operation that effectively by-passes the router. Thus, L3 processing still occurs when the switch routes similar type frames from subnet A to subnet B, but that processing is implemented in hardware on the switch. An example of a shortcut technique that may be advantageously used with the present invention is described in copending and commonly assigned U.S. patent application Ser. No. 08/951,820, filed on Oct. 14, 1997 and titled Method and Apparatus for Implementing Forwarding Decision Shortcuts at a Network Switch by Ray Kloth et al., which application is hereby incorporated by reference.
In FIG. 4, the router is externally coupled to the switch; however, the L3 shortcut operation described above may also apply to an embodiment of a platform wherein the router is internally connected (i.e., within the same chassis) to the network switch. Therefore, the present invention is further directed to synchronization of forwarding tables pertaining to L3 operations and, in particular, to shortcuts associated with L3 switching operations in a distributed network switch having a router coupled (either internally or externally) to the switch.
The invention relates to a technique for learning and switching frames between line cards that are interconnected by a switch fabric of a distributed network switch. The network switch comprises a software routing component (“router”) and a plurality of hardware components (“forwarding engines”), the latter being distributed among the line cards; one of the line cards is a switch management card (SMC) that also contains the router. Each forwarding engine has an associated forwarding table, which preferably includes an L2 portion and an L3 portion. The L2 portion of the table is used to execute forwarding decision operations for frames forwarded among ports of the line cards, whereas the L3 portion of the table is used to execute shortcut operations for frames routed among the ports.
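The shortcut learning and rewrite steps described above may be sketched as follows. This is a simplified illustration, not the technique of the referenced application: the flow is assumed to be keyed on the source and destination IP addresses only, and the names `Frame`, `ShortcutSwitch`, `observe` and `forward` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    vlan: int
    src_ip: str
    dst_ip: str


Flow = Tuple[str, str]  # assumed L3 flow key: (source IP, destination IP)


class ShortcutSwitch:
    def __init__(self, router_mac: str) -> None:
        self.router_mac = router_mac
        self.candidates: Set[Flow] = set()                    # seen going to the router
        self.shortcuts: Dict[Flow, Tuple[str, str, int]] = {}  # learned L2 rewrite

    def observe(self, frame: Frame) -> None:
        """Learn from a frame flowing to or from the router."""
        flow = (frame.src_ip, frame.dst_ip)
        if frame.dst_mac == self.router_mac:
            # L3 information is unchanged by routing, so record it on the way in.
            self.candidates.add(flow)
        elif frame.src_mac == self.router_mac and flow in self.candidates:
            # The routed frame carries the new MAC header and VLAN.
            self.shortcuts[flow] = (frame.src_mac, frame.dst_mac, frame.vlan)
            self.candidates.discard(flow)

    def forward(self, frame: Frame) -> Frame:
        """Apply a learned shortcut to a frame destined for the router."""
        rewrite = self.shortcuts.get((frame.src_ip, frame.dst_ip))
        if rewrite is None or frame.dst_mac != self.router_mac:
            return frame  # no shortcut: the frame is handled normally
        src_mac, dst_mac, vlan = rewrite
        return replace(frame, src_mac=src_mac, dst_mac=dst_mac, vlan=vlan)
```

Once the switch has observed one frame in each direction, `forward` returns a subsequent frame of the same flow with the header and VLAN that the router would have produced, so the frame need not visit the router.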
According to a first aspect of the inventive technique, the router modifies a header of a frame during execution of a routing decision operation to ensure that copies of that frame are provided to the line card (i.e., the ingress card) having an incoming port that received the frame from a source station on a computer network, in addition to the line card (i.e., the egress card) having an outgoing port to which the frame is switched for delivery to a destination station of the network. The frame is preferably a fabric frame having a fabric header that includes a port-of-exit (POE) mask field, a source index field and a destination index field. The POE mask field includes a plurality of bits, one for each port interface of the switch fabric.
Specifically, the router asserts a bit in the POE mask field of the fabric header that specifies the port interface on the switch fabric corresponding to the ingress card (as specified by the contents of the source index field). The forwarding engine on the SMC then performs a forwarding decision operation using a destination media access control (MAC) address of the frame, which results in assertion of a bit in a POE vector that specifies the port interface corresponding to the egress card (as specified by the contents of the destination index field). The asserted bit in the POE mask field is logically combined with the asserted bit of the POE vector to instruct the switch fabric to “switch” copies of the routed frame through its port interfaces coupled to the ingress and egress cards. The copy of the routed frame provided to the ingress card ensures that the forwarding engine on that card “sees” the frame before and after the routing decision is rendered by the router so that it may learn and correctly update its L3 forwarding table.
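A minimal sketch of the bit manipulation follows. It assumes that the logical combination is a bitwise OR and that bit i of the POE field corresponds to fabric port interface i; neither detail is specified above, and the function names are hypothetical.

```python
from typing import List


def poe_bit(fabric_port: int) -> int:
    """Return a POE value with only the bit for one fabric port interface set."""
    return 1 << fabric_port


def combine_poe(poe_mask: int, poe_vector: int) -> int:
    """Combine the router's POE mask with the forwarding engine's POE vector.

    Assumption: the combination is a bitwise OR, so both cards get a copy.
    """
    return poe_mask | poe_vector


def fabric_ports(poe: int) -> List[int]:
    """List the fabric port interfaces selected by a POE value."""
    return [i for i in range(poe.bit_length()) if (poe >> i) & 1]


ingress_mask = poe_bit(1)    # set by the router for the ingress card
egress_vector = poe_bit(3)   # set by the forwarding decision for the egress card
selected = fabric_ports(combine_poe(ingress_mask, egress_vector))
```

With the ingress card on fabric port interface 1 and the egress card on interface 3, the combined value selects both interfaces, so the switch fabric delivers one copy of the routed frame to each card.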
According to another aspect of the present invention, when the router performs a routing decision operation on a fabric frame, it does not modify the contents of the source index field of that frame. That is, rather than altering the header of the routed frame to indicate that the frame originated from the SMC (as is typical during routing operations), the router maintains the contents of the source index field to specify the incoming port on the ingress card as the originator of the routed frame. This feature of the invention ensures that the egress card can generate a notification frame directed back to the source (ingress) line card, as described further herein.
If the ports of the ingress card are not in the broadcast domain of the routed frame, the location of the shortcut may not be known to (“stored in”) the L2 portion of the forwarding table on the ingress card. According to yet another aspect of the invention, the router also asserts a predefined bit in the fabric header of the routed frame that instructs the forwarding engine on the egress card to generate the notification frame that informs a recipient of that frame about the location of a particular L2 (shortcut) address. In the illustrative embodiment described herein, the predefined bit is a shortcut (SC) bit and the notification frame is a MAC notification (MN) frame.
The MN frame preferably comprises, inter alia, a destination index field, a source index field, a destination MAC address field and a source MAC address field. Since the MN frame is generated in response to assertion of the SC bit in the routed frame, the contents of the destination index field associated with the destination MAC address reflect the port and line card originating the routed frame, i.e., the incoming port on the ingress card; accordingly, the MN frame is issued from the egress card to the ingress card. Upon receiving the MN frame, the forwarding engine on the ingress card establishes an appropriate entry in the L2 portion of its forwarding table using the contents of the source MAC address and source index fields of the MN frame, the latter of which reflect the port and line card originating the MN frame, i.e., the outgoing port on the egress card.
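The exchange of the MN frame may be sketched as follows. The index mapping follows the description above; the choice of MAC addresses placed in the MN frame (the source MAC address taken from the destination MAC address of the routed frame, and the destination MAC address taken from its source MAC address) is an assumption made for illustration, as are all class and function names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Index = Tuple[int, int]  # (line card, port)


@dataclass(frozen=True)
class FabricFrame:
    src_index: Index      # incoming port on the ingress card (left unmodified by the router)
    dst_index: Index      # outgoing port on the egress card
    src_mac: str
    dst_mac: str
    sc_bit: bool = False  # shortcut bit asserted by the router


@dataclass(frozen=True)
class MNFrame:
    src_index: Index
    dst_index: Index
    src_mac: str
    dst_mac: str


def make_mn_frame(routed: FabricFrame) -> Optional[MNFrame]:
    """Egress card: generate an MN frame only if the SC bit is asserted."""
    if not routed.sc_bit:
        return None
    return MNFrame(
        src_index=routed.dst_index,  # originator of the MN frame: the egress port
        dst_index=routed.src_index,  # sent back to the ingress port
        src_mac=routed.dst_mac,      # assumed: the address reachable at the egress port
        dst_mac=routed.src_mac,      # assumed for illustration only
    )


def install_from_mn(l2_table: Dict[str, Index], mn: MNFrame) -> None:
    """Ingress card: create the L2 entry from the MN source MAC address and index."""
    l2_table[mn.src_mac] = mn.src_index
```

Because the router leaves the source index of the routed frame untouched, the egress card can address the MN frame to the ingress card without any additional lookup.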
According to still another aspect of the present invention, the forwarding engine on the ingress card also marks the established entry as ineligible for normal L2 aging policies. Broadly stated, a MAC address entry that has not been refreshed as a source within a specified period of time is removed from the L2 portion of the forwarding table in connection with a conventional aging policy. However, the MAC address associated with the source index learned by the forwarding engine of the ingress card may never be a source of a frame received at the ingress card. Accordingly, the conventional aging policy would eventually age out the entry associated with that MAC address which, in turn, would inhibit normal forwarding (i.e., non-flooding) of a frame at the ingress card. The inventive learning and switching technique provides a means of marking such an entry so that it is not aged according to the conventional aging policy.
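The effect of the marking can be shown with a small sketch of an aging pass. The entry layout, the `no_age` flag name and the timing values are hypothetical; only the exemption of marked entries from aging is taken from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class L2Entry:
    index: Tuple[int, int]  # (line card, port)
    last_seen: float        # time the address was last seen as a source
    no_age: bool = False    # set for entries established from an MN frame


def age_out(table: Dict[str, L2Entry], now: float, max_age: float) -> None:
    """Remove stale entries, skipping those marked ineligible for aging."""
    stale = [
        mac
        for mac, entry in table.items()
        if not entry.no_age and now - entry.last_seen > max_age
    ]
    for mac in stale:
        del table[mac]


l2_table = {
    "A": L2Entry(index=(1, 0), last_seen=0.0),               # learned as a source
    "B": L2Entry(index=(3, 2), last_seen=0.0, no_age=True),  # learned from an MN frame
}
age_out(l2_table, now=400.0, max_age=300.0)
```

After the pass, the conventionally learned entry for A has been removed while the marked entry for B remains, so frames shortcut toward B continue to be forwarded rather than flooded.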
In summary, the distributed learning and switching technique comprises the following features: (1) configuring the router of the network switch to issue an extra copy of a routed frame to the ingress card; (2) configuring the router to ensure that the contents of the source index field of the routed frame header indicate that the frame originated from the ingress card; (3) configuring the router to assert the SC bit in the header of the routed frame; (4) in response to the asserted SC bit, configuring the forwarding engine on the egress card to generate and issue the MN frame to the ingress card; and (5) configuring the forwarding engine on the ingress card to mark an established entry in the L2 portion of its forwarding table as ineligible for normal L2 aging policies.
Advantageously, the inventive learning and switching technique enables distribution of shortcut operations among the forwarding engines of the line cards, thereby achieving optimal use of the forwarding engines on the network switch. To that end, the present invention ensures the establishment of an L3 shortcut entry in the L3 portion of an ingress card's forwarding table to substantially reduce (i) the latency involved with switching a routed frame in the network switch and (ii) the load on the switching mechanism (forwarding engine) of the SMC. Moreover, the inventive technique also ensures that an L2 entry for the destination of the shortcut is established in the L2 portion of the ingress card's forwarding table, even though this line card may not directly see any traffic from the destination.