This invention relates to various methods and apparatus which provide distributed connection-oriented services for switched data communications networks, the service provided being scalable, allowing fully active mesh topologies, reducing broadcast traffic, and enabling connections to networks and servers outside the switch domain.
Most data communications networks today rely heavily on shared-media, packet-based LAN technologies for both access and backbone connections. These networks use bridges and routers to connect multiple LANs into global internets.
A router-based, shared-media network cannot provide the high bandwidth and quality of service required by the latest networking applications and new faster workstations. For example, multimedia and full-motion video applications consume large amounts of bandwidth and require real-time delivery. Another high bandwidth application involves transmission of X-ray and other diagnostic information to enable doctors in different locations to consult while accessing the same patient information. Yet another application is "collaborative" engineering, i.e., allowing multiple engineers to work on the same project simultaneously while at different geographic locations. Thus, networks once used primarily for sending text files and E-mail or sharing common databases are now being pushed to their limits as more users push more data across them.
One way to provide additional bandwidth on a given network segment is with larger shared-media pipes, such as FDDI or Fast Ethernet; however, this does not enable the application of policy or restricted access to the enhanced network resources. Alternatively, a network can be further segmented with additional router or bridge ports; however, this increases the cost of the network and the complexity of its management and configuration.
Switched networking is a proposed solution intended to provide additional bandwidth and quality of service. In such networks, the physical routers and hubs are replaced by switches and a management system is optionally provided for monitoring the configuration of the switches. The overall goal is to provide a scalable high-performance network where all links between switches can be used concurrently for connections.
One proposal is to establish a VLAN switch domain. A VLAN is a "logical" or "virtual" LAN in which users appear to be on the same physical (or extended) LAN segment, even though they may be geographically separated. However, many VLAN implementations restrict VLAN assignments to ports, rather than end systems, which limits the effectiveness of the VLAN groupings. Other limitations of existing VLAN implementations include excessive broadcast traffic (which consumes both network bandwidth and end system CPU bandwidth), disallowing transmissions out multiple ports, hop-by-hop switching determinations, and requiring multi-protocol routers to enable transmission between separate VLANs. Another problem with many VLAN switched networks is that although they allow a meshed topology, none of the redundant links can be active at the same time. Generally, the active links are determined by a spanning tree algorithm which finds one loop-free tree-based path through the network. Unfortunately, any links or nodes not in the active tree path are placed in standby.
Thus, there are numerous limitations with many prior switched communications networks.
In accordance with certain broad aspects of the present invention, methods and apparatus are provided which enable one or more of the following services:
directory (distributed discovery of MAC addresses and protocol alias addresses)
topology (distributed topology protocol exchanges among access and network switches)
broadcast resolution (resolution of broadcast frames to unicast frames at access switches)
policy (applying security restrictions prior to connection setup)
path determination (determine multiple paths from source to destination)
connection management (source-routed mapping of connections on a desired path)
call rerouting (distributed rerouting of a call when a link fails)
broadcast/unknown service (restricted flooding of nonresolvable packets)
connection-oriented switching (source-destination MAC addresses used as a connection key)
According to a first aspect of the invention, a fully distributed switching model is provided in which each switch is capable of processing all aspects of the call processing and switching functionality. Each switch maintains a local directory of locally-attached end systems on access ports. As each local end system generates MAC frames, the switch "learns" the source MAC address as well as any higher-level protocol address information; these higher-layer addresses are referred to as alias addresses since they alias (or rename) the MAC end system. Thus, all end system network and MAC mappings are discovered automatically at each access port on the switch.
The local directory may also store local VLAN mappings. VLAN mappings identify the logical LAN to which the switch port or user belongs. A logical or virtual LAN allows users to appear as being on the same physical (or extended) LAN segment even though they may be geographically separated. By default, all ports and users are on a "default" or base VLAN.
Note that the VLAN-IDs are used only for policy and to scope broadcast/unknown destinations.
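A minimal sketch of the per-switch local directory follows, assuming a simple in-memory structure; the names `LocalDirectory`, `LocalEntry`, `learn` and `DEFAULT_VLAN` are hypothetical and illustrate only the learning behavior described above (source MAC plus alias address learned per access port, with every entry starting on the default VLAN).

```python
from dataclasses import dataclass, field

DEFAULT_VLAN = "default"  # hypothetical name for the base VLAN


@dataclass
class LocalEntry:
    port: int                                                   # access port the end system was heard on
    aliases: set = field(default_factory=set)                   # higher-layer (alias) addresses, e.g. IP
    vlans: set = field(default_factory=lambda: {DEFAULT_VLAN})  # every entry starts on the base VLAN


class LocalDirectory:
    """Hypothetical per-switch directory of end systems seen on access ports."""

    def __init__(self):
        self.by_mac = {}    # MAC address -> LocalEntry
        self.by_alias = {}  # alias address -> MAC address

    def learn(self, port, src_mac, alias=None):
        # Called for each frame arriving on an access port.
        entry = self.by_mac.setdefault(src_mac, LocalEntry(port))
        entry.port = port  # an end system that moves is re-learned on its new port
        if alias is not None:
            entry.aliases.add(alias)
            self.by_alias[alias] = src_mac
        return entry

    def lookup_alias(self, alias):
        # Returns the MAC bound to a higher-layer address, or None if not local.
        return self.by_alias.get(alias)
```

Re-learning on every frame is what gives the mobility property: the entry simply follows the end system to whichever access port it is heard on next.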
With each access switch having its own locally learned mappings in the directory, there is a "virtual directory" which provides a scalable, demand-based mechanism for distributing directory mappings through the switch domain. The virtual directory is defined as the collective directory mappings that exist in each switch within the domain. Thus, the virtual directory always has the complete mappings of all known users within the domain. It is not necessary to distribute or synchronize directories among the switches. Rather, each switch may access its local directory cache for locally attached end systems, and if the end system is not found in the local directory cache, a query is triggered to the virtual directory, i.e., to the local directory of each remote switch. This means that at any given access switch, virtual directory queries are made only for destination addresses that are not in the local directory.
A call-originating switch that cannot resolve a mapping within its own directory issues a resolve to the virtual directory by "VLAN ARPing". This is similar to how IP hosts resolve destination IP addresses to MAC addresses, but instead of "ARPing" to all end systems, the VLAN resolve message is sent only to other switches within the VLAN domain. Only the switch having the local directory mapping of the requested resolve information, known as the "owner" switch, will respond; multiple owners may exist if an end system is redundantly connected. All resolutions are then stored as remote entries in the call-originating switch's remote directory, together with the owner switch. The combination of the local directory and inter-switch resolve messaging provides mobility (i.e., end systems can attach anywhere in the network).
The directory of resolved mappings becomes in essence another cache. These entries reflect active or attempted connectivity resolutions, so the cache is self-adjusting for the source-destination traffic. It will size automatically to the actual resolution requirements of the call-originating switch.
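The demand-driven resolution above can be sketched as follows, assuming the inter-switch "VLAN ARP" is modeled as a direct scan over the other switches in the domain (a real fabric would exchange messages). `Switch`, `resolve` and `queries_sent` are hypothetical names; the point is that a query is issued only on a local and remote-cache miss, and that every owner switch is recorded with the resolution.

```python
class Switch:
    """Hypothetical access switch: a local directory plus a remote-resolution cache."""

    def __init__(self, name, local_macs):
        self.name = name
        self.local = set(local_macs)  # MACs attached to this switch's access ports
        self.remote = {}              # resolved MAC -> list of owner switch names
        self.queries_sent = 0         # counts virtual directory queries issued

    def owns(self, mac):
        # Only an owner switch answers a resolve request.
        return mac in self.local

    def resolve(self, mac, domain):
        """Return owner switch names for `mac`, querying other switches only on a miss."""
        if mac in self.local:
            return [self.name]
        if mac in self.remote:        # cache hit: no inter-switch traffic
            return self.remote[mac]
        self.queries_sent += 1        # one resolve sent to the other switches
        owners = [s.name for s in domain if s is not self and s.owns(mac)]
        if owners:                    # redundantly attached systems yield several owners
            self.remote[mac] = owners
        return owners
```

Because only answered resolutions are cached, the remote directory grows with the destinations this switch actually talks to, which is the self-sizing behavior described above.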
Another important aspect of the invention is to provide topology and connection services which include the following:
distributed link state protocol
distributed path determination
distributed connection management
distributed threading the needle
distributed call rerouting.
The topology services are built into every switch, which allows each switch to be completely autonomous in its behavior, yet provides the necessary functionality across the entire switching fabric.
The switches run a distributed link state protocol. A link state protocol is used because it provides a fully-connected mesh topology (called a directed graph) that is distributed in each switch. Changes in the topology (link state changes) are event driven and are propagated through the switch fabric so that each switch rapidly converges on the current active topology graph of the network; in other words, each switch has an accurate representation of the topology at any given time. Link state changes include, but are not limited to, changes in operational status, administrative status, metrics, "cost" or bandwidth.
One of the key aspects of the link state protocol is that it runs completely "plug-and-play" out of the box, with no configuration whatsoever. The protocol is optimized to work in a "flat" or non-hierarchical fashion. Each switch is identified by a unique switch MAC address, and links are identified by a unique link name formed by the switch MAC address concatenated with a port-instance of the link on the switch; the result is a unique "switch/port pair." This allows the protocol to know all switch nodes in the domain, as well as all of the links.
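A minimal sketch of a link-state database using switch/port pair link names follows. The sequence-number rule for accepting a newer advertisement is a common link-state convention assumed here, not something the text specifies; `LinkAdvert`, `LinkStateDB`, `receive` and `graph` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkAdvert:
    port: int      # local port instance of the link
    neighbor: str  # MAC address of the switch at the far end
    cost: int      # metric; a change in cost is a link state change


class LinkStateDB:
    """Hypothetical link-state database; every switch converges on the same copy."""

    def __init__(self):
        self.adverts = {}  # switch MAC -> (sequence number, list of LinkAdvert)

    def receive(self, switch_mac, seq, links):
        """Install an advertisement if newer; return True if it should be propagated."""
        current = self.adverts.get(switch_mac)
        if current is not None and current[0] >= seq:
            return False  # stale or duplicate: drop and do not re-flood
        self.adverts[switch_mac] = (seq, list(links))
        return True

    def graph(self):
        """Directed graph: switch MAC -> {"switch/port" link name: (neighbor, cost)}."""
        return {
            sw: {f"{sw}/{link.port}": (link.neighbor, link.cost) for link in links}
            for sw, (_, links) in self.adverts.items()
        }
```

No configuration is needed: the switch MAC and port instance already give every node and link a domain-unique name.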
Since each call-originating switch has a topology graph, each switch can determine the "best" path for the calls it originates. Although all of the switches have a topology graph, only the call-originating switch uses it to determine a complete path for a connection flow. Other switches on the connection path never do path determination. The path is defined as a sequence of "switch/port pairs" which are traversed in order to get from the call-originating switch to the destination-owner switch (the destination-owner switch is the switch to which the destination MAC address is locally attached). Note that the path is not determined from end system to end system; rather, it is the path connecting the ingress switch (the switch owner of the source) to the egress switch (the switch owner of the destination). Again, the topology graph only contains switch nodes and switch links; no users, end systems, networks or other forms of aggregation are known. In one embodiment, the path may comprise equal cost paths between the source and destination.
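One way the call-originating switch could compute a "best" path is sketched below, assuming "best" means lowest total link cost and using Dijkstra's algorithm; the text does not name an algorithm, so this is a swapped-in illustration. The graph format and `best_path` are hypothetical; the result is the ordered list of switch/port pairs from ingress to egress switch.

```python
import heapq


def best_path(graph, src, dst):
    """Dijkstra over a switch graph {switch: [(port, neighbor, cost), ...]}.

    Returns the path as an ordered list of "switch/port" pairs (the egress
    port taken at each hop), or None if dst is unreachable.
    """
    dist = {src: 0}
    prev = {}            # switch -> (previous switch, port used to reach it)
    heap = [(0, src)]
    while heap:
        d, sw = heapq.heappop(heap)
        if sw == dst:
            break
        if d > dist.get(sw, float("inf")):
            continue     # stale heap entry
        for port, nbr, cost in graph.get(sw, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = (sw, port)
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None
    # Walk predecessors back from the egress switch to build the source route.
    hops, node = [], dst
    while node != src:
        sw, port = prev[node]
        hops.append(f"{sw}/{port}")
        node = sw
    return list(reversed(hops))
```

For example, with `A` linked to `B` (cost 1) and `C` (cost 5), and `B` linked to `C` (cost 1), the path from `A` to `C` is `["A/1", "B/3"]` when those links use ports 1 and 3. Returning only switch/port pairs reflects that the graph holds switches and links, never end systems.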
Each call-originating switch also performs connection management (making, breaking and rerouting calls) for traffic originating at its access ports. Calls are processed when there is no active connection for the source-destination MAC address on an arriving packet frame. Note that no connection management (nor call processing) is performed on network trunk ports. By having each access switch perform the connection management for the calls it is originating, the connection management is distributed around the "edges" (at the access switches) of the switch fabric. This means the connection processing load is directly related to the number of access switches and traffic on access ports. Since each switch processes its own local connection requirements, it scales very well, even as the size of the switch fabric or VLAN domain grows. The total call processing rate of the fabric becomes the additive rate of all of the access switches.
One of the significant benefits of determining connections based on the source-destination MAC addresses is that it allows the switches to treat each end-to-end flow as a unique and identifiable connection (flow). In addition, it allows the switches to support a fully active mesh topology. Unlike switches that forward or filter based only on the destination MAC address, the switches of the present invention use the source and destination MAC address in each frame to forward or filter. This allows multiple paths to a particular destination from different sources through the switch fabric. This is particularly useful in client/server models, because the server is effectively a common point, to which all the clients require access. Thus, the call processing of the present invention allows multiple paths to the server from different sources through the switch fabric.
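The effect of keying on the source-destination pair can be sketched with a hypothetical per-switch table (`ConnectionTable`, `map_connection` and `forward` are illustrative names). Two clients reaching the same server can be mapped to different outports, which a destination-only table could not express.

```python
class ConnectionTable:
    """Hypothetical forwarding table keyed on (source MAC, destination MAC)."""

    def __init__(self):
        self.table = {}  # (src_mac, dst_mac) -> outport

    def map_connection(self, src_mac, dst_mac, outport):
        # Each end-to-end flow is a distinct, identifiable connection.
        self.table[(src_mac, dst_mac)] = outport

    def forward(self, src_mac, dst_mac):
        # None means no connection exists yet, so the frame goes to call processing.
        return self.table.get((src_mac, dst_mac))
```

A table keyed by destination alone holds one outport per destination, forcing all clients of a server onto one path; the pair key lets the fabric spread those flows across the active mesh.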
Once the packet is call processed and resolved to a unicast MAC destination, the call-originating switch determines the path of switches and links to traverse (described previously) and explicitly maps a connection on that path for the source-destination MAC address of the packet being call processed. The connection is explicitly mapped on the determined path by a "threading the needle" algorithm. Threading the needle describes how the connection is "threaded" through the switches on the path one switch hop at a time. The connection mapping is done by having the call-originating switch generate a source-routed connect request message which contains the source-destination MAC addresses (connection key) for the call. The path information is actually the in-order list of switch/port pairs to traverse. This message is sent explicitly hop-by-hop on the source-routed path.
As each switch processes the message, it maps a connection for the source-destination pair. The inport and outport(s) for the connection mapping can be either implicitly or explicitly described by the message (implicitly by the port the message is received on; explicitly by being a named node and link in the path). However, the connections remain disabled (e.g., outport is null) until a response is received from the last switch on the path; this response (acknowledgment) enables each switch connection as it travels back on the return path to the call-originating switch.
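The two-pass behavior of threading the needle can be simulated as below, assuming the message exchange is modeled as direct method calls and that the path's last switch/port pair names the egress switch's access port. `PathSwitch`, `send_connect` and `send_ack` are hypothetical; only the outport is tracked, since the inport is implied by the receiving port.

```python
class PathSwitch:
    """Hypothetical switch on a source-routed connection path."""

    def __init__(self, name):
        self.name = name
        self.connections = {}  # (src MAC, dst MAC) -> {"outport": ..., "pending": ...}

    def on_connect(self, key, outport):
        # Map the connection but leave it disabled (outport None) until acknowledged.
        self.connections[key] = {"outport": None, "pending": outport}

    def on_ack(self, key):
        # Enable the connection as the acknowledgment passes back through.
        conn = self.connections[key]
        conn["outport"] = conn["pending"]


def _hops(path):
    # Split each "switch/port" pair into (switch, port number).
    return [(sw, int(port)) for sw, port in (p.rsplit("/", 1) for p in path)]


def send_connect(switches, path, key):
    """Forward pass: the connect request travels hop by hop on the source route."""
    for sw, port in _hops(path):
        switches[sw].on_connect(key, port)


def send_ack(switches, path, key):
    """Return pass: the last switch's acknowledgment travels back to the originator."""
    for sw, _ in reversed(_hops(path)):
        switches[sw].on_ack(key)
```

Between the two passes every hop holds a mapped but disabled connection, so no traffic is switched onto a partially built path.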
Another important feature is that the connection threading is self-scaling since the connect request messages are sent on the actual paths the call will be mapped on. This means that as calls are load balanced on different end-to-end paths, so are the connection management messages and processing.
Another important feature is that each switch independently handles call rerouting. This is accomplished by each switch maintaining a "links-in-use" database for all connections going through the switch. As connections are mapped at each switch, the path of all the switches and links (i.e., the path information in the connect message) is maintained in a separate database. This essentially correlates links and nodes with a particular connection.
If any switch node or link changes state (e.g., fails), other switches in the fabric propagate the change as part of the link state protocol. Since each switch runs the link state protocol, it must process the node or link state change. Once it recomputes the topology graph, it searches the path database to determine whether any of its active connections were using the node or link that changed state. Only connections whose path includes the changed node or link (e.g., a remote link on the path may have failed) are unmapped. Thus, each switch having connections on a path that failed tears down the connections traversing that path automatically and autonomously. Connections not using that part of the path remain intact. In addition, when the call-originating switch has to tear down a call because some part of the path has changed state in a way that warrants a reroute (e.g., a link failure or drastic change in cost), it automatically recomputes the path for the original call and re-establishes a connection using the same path determination and connection management techniques previously described. It is important to note that this too is completely distributed: each switch tears down the connections it has mapped if the path is no longer valid, and each automatically reroutes the calls it has originated. Since all access switches can do this, the call rerouting capability scales with the number of call-originating (access) switches in the fabric.
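A minimal sketch of the links-in-use database and the teardown/reroute decision follows, restricted to link failures for brevity. `PathDatabase` and `handle_link_down` are hypothetical names, and `recompute` stands in for path determination on the updated topology graph (for instance a shortest-path routine).

```python
class PathDatabase:
    """Hypothetical links-in-use database: connection key -> path of "switch/port" pairs."""

    def __init__(self):
        self.paths = {}

    def record(self, key, path):
        # Stored when the connect request is processed at this switch.
        self.paths[key] = list(path)

    def affected_by(self, failed_link):
        # Connections whose path traverses the link that changed state.
        return [k for k, p in self.paths.items() if failed_link in p]

    def handle_link_down(self, failed_link, originated, recompute):
        """Tear down affected connections; reroute only those this switch originated."""
        rerouted = {}
        for key in self.affected_by(failed_link):  # list is built before any deletion
            del self.paths[key]                    # unmap: tear down locally
            if key in originated:
                new_path = recompute(key)          # fresh path on the new topology graph
                if new_path is not None:
                    self.paths[key] = new_path
                    rerouted[key] = new_path
        return rerouted
```

Transit switches only tear down; the reroute happens solely at the call-originating switch, which is why rerouting capacity grows with the number of access switches.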
In another important aspect, the present invention is directed to resolving broadcast packets in order to significantly reduce the amount of broadcast traffic. This is accomplished by each switch being able to resolve broadcast packets at the switch access ports, rather than just tagging and flooding the broadcast packets. Each switch has a call processor for the major protocol families (e.g., IP and IPX) and well-known packet types.
Resolution to a non-broadcast address involves looking inside the packet, decoding the protocol layers and high-level addressing, and determining where the true destination of the packet is. For example, the switch looks inside an ARP broadcast packet for the target network address; then, the switch looks up in its local directory and/or the virtual directory the MAC address bound to that network address (alias address). Thus, rather than flooding the ARP broadcast, the access switch resolves it to the true MAC destination and then establishes a connection between the ingress switch and the egress switch for the source MAC address to the destination MAC address. The broadcast ARP packet is never forwarded past the access switch, and this leads to a significant reduction in broadcast traffic in the switch fabric.
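The ARP case can be sketched by decoding the standard Ethernet/IPv4 ARP body (the 28-byte layout of RFC 826) and looking up the target protocol address. The `directory` dictionary is a hypothetical stand-in for the combined local and virtual directory lookup; `parse_arp` and `resolve_broadcast_arp` are illustrative names.

```python
import ipaddress
import struct

ARP_REQUEST = 1


def parse_arp(payload: bytes):
    """Decode an Ethernet/IPv4 ARP body (RFC 826 layout, 28 bytes)."""
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", payload[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    sha, spa, tha, tpa = struct.unpack("!6s4s6s4s", payload[8:28])
    return oper, sha, ipaddress.IPv4Address(spa), ipaddress.IPv4Address(tpa)


def resolve_broadcast_arp(payload, directory):
    """Return the destination MAC for an ARP request, or None if it must be flooded.

    `directory` maps IPv4 alias addresses to MAC bytes (hypothetical stand-in
    for the local plus virtual directory).
    """
    parsed = parse_arp(payload)
    if parsed is None or parsed[0] != ARP_REQUEST:
        return None
    target_ip = parsed[3]
    return directory.get(target_ip)
```

A non-None result means the switch can set up a unicast connection to the returned MAC instead of flooding; None leaves the frame for the broadcast/unknown service.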
In general, the only time a broadcast packet is forwarded beyond the access switch is when it cannot be resolved to a single unicast MAC address. This usually happens only with router and server advertisements.
The switches use an inter-switch control channel on which to send unresolvable packets (packets that cannot be switched point-to-point). This control channel is formed with a single spanning tree between the switches. Rather than maintaining a separate spanning tree for each virtual LAN, only a single tree is maintained. The basis for the single tree is that only broadcast/multicast packets and unknown destinations (not heard at any access switch) need to be flooded. All other packets can be resolved to their single destination and switched/forwarded. Because each switch maintains a complete topology graph, the spanning tree is really a virtual spanning tree based on the topology graph, not on any separate protocol exchange.
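Because every switch holds the same topology graph, each can derive the same flooding tree locally. The sketch below assumes a breadth-first tree rooted at the lowest switch ID with neighbors visited in sorted order, so the result is deterministic; the text does not specify how the virtual tree is chosen, and `virtual_spanning_tree` is a hypothetical name. The adjacency map is assumed to be symmetric.

```python
from collections import deque


def virtual_spanning_tree(adjacency):
    """Derive a flooding tree from the shared topology graph, with no extra protocol.

    adjacency: switch -> set of neighbor switches (undirected, symmetric).
    The root is the lowest switch ID and neighbors are visited in sorted
    order, so every switch holding the same graph computes the same tree.
    """
    root = min(adjacency)
    parent = {root: None}
    queue = deque([root])
    while queue:
        sw = queue.popleft()
        for nbr in sorted(adjacency[sw]):
            if nbr not in parent:
                parent[nbr] = sw
                queue.append(nbr)
    # Tree edges as unordered pairs; other links stay active for unicast
    # connections and are pruned only for flooding.
    return {frozenset((child, p)) for child, p in parent.items() if p is not None}
```

In a triangle of switches `A`, `B` and `C`, this yields the edges `A-B` and `A-C`; the `B-C` link still carries switched connections, it is just not used for flooding.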
Tag-based flooding is used to ensure that unresolvable broadcast packets are not flooded out all egress ports in the fabric. Because the entire VLAN domain of users (MAC addresses and VLAN mappings) is not distributed to all switches, these flooded packets must be tagged with a VLAN identifier. This tagging identifies the VLAN to which the packet belongs (usually based on the source of the frame). Essentially, the original packet is wrapped with a VLAN header which contains the source VLAN-ID. The tagging can be supported on a hardware-based switching engine or in a CPU-based switching engine and the tagged frames sent on the inter-switch control channel, using a multicast MAC address. At all egress switches, the frame is redirected from the connection engine and processed by the host agent. Here, the original packet, including its original framing, is unwrapped and transmitted out any access ports that match the VLAN-ID of the tagged frame.
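The wrap/unwrap step can be sketched as follows. The header layout (multicast destination followed by a 16-bit VLAN-ID) and the multicast address value are hypothetical, since the text does not define them; `wrap` and `unwrap_and_select` are illustrative names.

```python
import struct

# Hypothetical values: the header layout and multicast address are not specified.
CONTROL_CHANNEL_MCAST = bytes.fromhex("030000000001")
TAG_FORMAT = "!6sH"                    # multicast destination, 16-bit VLAN-ID
TAG_LEN = struct.calcsize(TAG_FORMAT)  # 8 bytes


def wrap(frame: bytes, vlan_id: int) -> bytes:
    """Ingress: wrap the original frame in a header carrying the source VLAN-ID."""
    return struct.pack(TAG_FORMAT, CONTROL_CHANNEL_MCAST, vlan_id) + frame


def unwrap_and_select(tagged: bytes, port_vlans: dict):
    """Egress: recover the original frame and the access ports in its VLAN.

    port_vlans maps each access port to the set of VLAN-IDs it belongs to.
    """
    _, vlan_id = struct.unpack(TAG_FORMAT, tagged[:TAG_LEN])
    original = tagged[TAG_LEN:]        # original framing is left untouched
    ports = sorted(p for p, vlans in port_vlans.items() if vlan_id in vlans)
    return original, ports
```

Only the egress switch's own port-to-VLAN mappings are consulted, so the VLAN membership of the whole domain never has to be distributed.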
Yet another aspect of the present invention allows the switched domain to co-exist and inter-network with legacy networks. Each switch incorporates "virtual router agents," which process the route and service advertisements they receive from multi-protocol routers and servers attached to the switch. The access switch summarizes and collapses the external networks, routes and services to only the "best" routes. The switch can then combine the best route information for the external networks and servers with the best path information for other switches in the switched domain, to provide a combined best path to a network or server outside the switched domain. Note that the virtual router agents do not generate or propagate any advertising packets of their own. Rather, they automatically discover remote networks and servers which generate such advertisements. The virtual router has a state machine and metrics processing to calculate the best routes. The virtual router directory in an access switch is only active when a router is attached to an access port of that switch.
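The summarize-then-combine step can be sketched as below, assuming external metrics and internal path costs are additive in the same units (the text does not say how they are combined). `summarize` and `combined_best` are hypothetical names; advertisements are modeled as `(network, exit switch, metric)` tuples.

```python
def summarize(adverts):
    """Collapse advertisements to the best (lowest) metric per (network, exit switch).

    adverts: iterable of (network, access_switch, metric) learned by virtual
    router agents from routers attached to access ports.
    """
    best = {}
    for network, switch, metric in adverts:
        key = (network, switch)
        if key not in best or metric < best[key]:
            best[key] = metric
    return best


def combined_best(network, summary, internal_cost):
    """Pick the exit switch minimizing internal path cost plus external metric.

    internal_cost: switch -> cost of the best in-domain path to that switch.
    Returns (total cost, exit switch), or None if the network is unknown.
    """
    candidates = [(internal_cost[sw] + metric, sw)
                  for (net, sw), metric in summary.items()
                  if net == network and sw in internal_cost]
    return min(candidates) if candidates else None
```

For instance, if `S1` is one hop away (cost 1) with an external metric of 3, and `S2` costs 4 internally with an external metric of 1, the combined choice is `S1` at total cost 4, even though `S2` advertised the better external route.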
These and other aspects of the present invention will be more fully described in the following detailed description and drawings.