1. Field of the Invention
This invention relates broadly to telecommunications. More particularly, this invention relates to link failure recovery in an ETHERNET WAN (wide area network) or MAN (metropolitan area network).
2. State of the Art
ETHERNET was developed in the 1970s as a protocol for a local area network (LAN). Since it was first developed, ETHERNET has been improved, most notably in terms of bandwidth. Typical ETHERNET transmission bandwidths are 10 Mbps, 100 Mbps, and 1,000 Mbps (1 Gbps). A typical ETHERNET LAN can be found in almost any modern office. Links from individual computers and printers are run to a central location where each is attached to an individual port of a switch or router. Every device coupled to the router or switch has a MAC (media access control) address. Data is transmitted in the payload portion of a frame, which contains the source MAC address and the destination MAC address as well as routing information. A switch maintains a forwarding information database (FIB). When the switch is first activated, it must build the FIB to associate ports with MAC addresses. As used herein, the terms “switch” and “ETHERNET switch” include “ETHERNET switch routers” which perform layer 2 switching.
At about the same time ETHERNET was being developed, a protocol known as SONET (synchronous optical network) was being developed. SONET was designed to provide high capacity trunk connections between telephone company central offices. Individual telephone connections carried in a SONET signal frame are identified by their temporal location in the frame rather than by an address in the frame header. The SONET network is often arranged as a ring from central office to central office, always returning to the office of origin. Thus, telephone connections from one central office to another can be made in either the clockwise direction or the counterclockwise direction. In this way, redundancy is built into the SONET network, and if a link between two central offices fails, connections can still be made by transmitting in the opposite direction. Links can fail in several ways, either by failure of equipment in a central office or by failure of the physical link between offices. The latter type of failure may occur when a worker accidentally breaks an underground cable. It is important that the public telephone network be kept up and running at all times and that any link failure be corrected quickly. The SONET network is designed to achieve that goal.
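The ring redundancy described above can be illustrated with a minimal sketch. It is not a model of actual SONET protection switching; the function name ring_path, the numbering of offices from 0 upward, and the rule of trying the clockwise direction first are all assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple


def ring_path(n_offices: int, src: int, dst: int,
              failed_links: Set[Tuple[int, int]]) -> Optional[List[int]]:
    """Return the offices visited from src to dst on a ring, avoiding failed links.

    Hypothetical model: offices are numbered 0..n_offices-1 and each office i
    is linked to office (i + 1) % n_offices. Returns None if neither
    direction around the ring is usable.
    """
    # Links are undirected, so store each one as an unordered pair.
    failed = {frozenset(link) for link in failed_links}

    def walk(step: int) -> Optional[List[int]]:
        # step = +1 walks clockwise, step = -1 walks counterclockwise.
        path = [src]
        node = src
        while node != dst:
            nxt = (node + step) % n_offices
            if frozenset((node, nxt)) in failed:
                return None  # this direction crosses a failed link
            path.append(nxt)
            node = nxt
        return path

    clockwise = walk(+1)
    if clockwise is not None:
        return clockwise
    return walk(-1)  # fall back to the opposite direction


# Usage: a ring of five offices with the link between offices 1 and 2 broken.
print(ring_path(5, 0, 2, set()))
print(ring_path(5, 0, 2, {(1, 2)}))
```

With no failure the connection runs clockwise through office 1; with the link between offices 1 and 2 broken it runs the other way around the ring through offices 4 and 3.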
ETHERNET was not designed to automatically switch over to a redundant link in the event of a link failure. The most likely link failure in an ETHERNET LAN is that a cable is accidentally pulled out of a socket, which is easy to repair. Other possible failures include equipment failure, which is relatively easy to diagnose and repair. Unlike the public telephone network, temporary failures of a link in an ETHERNET LAN are considered acceptable.
Recently it has become desirable to connect ETHERNET LANs through a SONET WAN or an ETHERNET MAN. By connecting LANs to a WAN, a nationwide business can provide high speed data communication among all of its offices. By connecting LANs to a MAN, LAN users can obtain very high speed access to the internet. MANs and WANs are typically not owned by the users as LANs are. MANs and WANs are usually owned by a service provider, e.g. a telephone company or internet service provider, and the users pay a monthly fee for use of the network. As such, users expect that the MAN or WAN will be available continuously and that any link failure will be corrected quickly. It is also worth noting that the type of data serviced by a MAN or a WAN may be different from that serviced by a LAN. LANs typically service email, web browsing, file sharing and printing. Brief interruption of these services is tolerable. MANs and WANs are likely to service video on demand, video conferencing, voice over IP, etc. These data services suffer noticeably from even brief interruptions.
International Telecommunication Union draft recommendation ITU-T G.803/Y.1342 includes provisions for providing redundant paths between end stations so that service can continue in the event of a component failure. Redundant paths exist in two places: in the user-network interface and in a switch fabric. The effort to reduce service outages is part of a broader concept referred to as quality of service or QoS. QoS is generally a guaranteed level of service in exchange for subscriber charges. If the guaranteed level is not met, the customer will get a refund. Because different customers have different QoS requirements, the standards for ETHERNET transmission over WANs and MANs include provisions for up to eight classes of service or CoS, though typically only four classes are implemented. The higher the CoS, the more a customer pays for service. Switches used in ETHERNET transmission over WANs and MANs maintain two databases: the FIB discussed above and a class DB which associates a CoS with each active port. When the switch is first activated, it must build the FIB and class DB to associate ports with MAC addresses in the FIB and to associate certain routing and QoS rules with ports in the class DB. The class DB is set up by the switch operator or by layer 2 control protocols. When the switch receives an ETHERNET frame from a particular port, it associates the source MAC address, routing and QoS rules with the port it was received from and makes corresponding entries in the FIB. However, at this point in the startup of the switch, there is not yet a FIB entry for the destination address of the frame. Therefore, the switch performs “flooding” and sends copies of the received frame out on all of the ports other than the port from which the frame was received. Eventually, over time, a frame is received from every port to which devices are coupled and the databases are complete.
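The learning and flooding behavior described above can be sketched as follows. This is a simplified illustration only, not the implementation of any particular switch; the class name LearningSwitch, its method names, the numbering of the eight classes of service as 0 through 7, and the use of short strings in place of MAC addresses are assumptions made for the example.

```python
from typing import Dict, List


class LearningSwitch:
    """Hypothetical layer 2 switch holding a FIB and a class DB."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = list(ports)
        self.fib: Dict[str, int] = {}       # MAC address -> port
        self.class_db: Dict[int, int] = {}  # port -> class of service

    def set_class(self, port: int, cos: int) -> None:
        # The class DB is filled in by the operator (or a control protocol).
        # Numbering the eight classes 0..7 is an assumption of this sketch.
        if not 0 <= cos <= 7:
            raise ValueError("class of service must be in the range 0..7")
        self.class_db[port] = cos

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Learn the source address and return the ports to forward on."""
        # Learning: associate the source MAC address with the ingress port.
        self.fib[src_mac] = in_port
        out_port = self.fib.get(dst_mac)
        if out_port is None:
            # No FIB entry yet: flood on every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress port; nothing to send
        return [out_port]


# Usage: the first frame is flooded, later frames follow learned entries.
sw = LearningSwitch([1, 2, 3])
sw.set_class(1, 3)
print(sw.receive(1, "aa", "bb"))  # "bb" unknown, so the frame is flooded
print(sw.receive(2, "bb", "aa"))  # "aa" was learned on port 1
print(sw.receive(1, "aa", "bb"))  # "bb" was learned on port 2
```

The sketch keeps the FIB as a plain mapping from address to port, so a failed port leaves stale entries behind until they are rewritten.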
Several protocols have been proposed to add redundancy and fault protection to ETHERNET switches used in WANs and MANs. For example, the Spanning Tree Protocol (STP) provides a loop-free network topology by putting redundant paths in a disabled standby mode. The proposed protocols also provide for two separate physical links between the customer equipment and the service provider equipment. If one of the links (or one of the ports servicing that link) fails, the equipment switches to the backup link. In order to determine when a link or port fails, periodic “keep-alive frames” are transmitted, e.g. one per second. If a keep-alive frame fails to be received on time, it is assumed that the port (or link) associated with the missing frame is down and steps are taken to switch over to the redundant link. When this happens, the FIB and class DB must be updated. This can take several seconds, during which time frames are lost because they continue to be sent out on a dead port (link). In order to enable rapid link failure detection, IEEE 802.3 provides for Far-End Fault Detect and Far-End Fault Generate functions for switches that do not support autonegotiation. These functions enable the detection of a far end fault within 336 microseconds, which is substantially faster than waiting for a keep-alive frame. However, even with the Far-End Fault Detect and Far-End Fault Generate functions enabled, it can still take several seconds for the FIB and class DB to be updated.
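The keep-alive mechanism described above can be sketched as follows. This is an illustrative model under stated assumptions, not any standardized protocol: the class name LinkMonitor, its method names, and the rule that the backup link is used only if its own keep-alive frame is still fresh are hypothetical. Times are passed in explicitly, in seconds, so that the behavior is deterministic.

```python
from typing import Dict


class LinkMonitor:
    """Hypothetical keep-alive monitor for a primary link and a backup link."""

    def __init__(self, primary: str, backup: str,
                 interval_s: float = 1.0) -> None:
        self.active = primary
        self.backup = backup
        self.interval_s = interval_s  # e.g. one keep-alive frame per second
        self.last_seen: Dict[str, float] = {primary: 0.0, backup: 0.0}

    def keep_alive(self, link: str, now: float) -> None:
        # Record the arrival time of a keep-alive frame on the given link.
        self.last_seen[link] = now

    def check(self, now: float) -> str:
        """Switch over if the active link's keep-alive is overdue."""
        active_overdue = now - self.last_seen[self.active] > self.interval_s
        backup_fresh = now - self.last_seen[self.backup] <= self.interval_s
        if active_overdue and backup_fresh:
            # The active link is assumed down; swap roles with the backup.
            self.active, self.backup = self.backup, self.active
        return self.active


# Usage: link "A" stops sending keep-alive frames after t = 1.0 s.
mon = LinkMonitor("A", "B")
mon.keep_alive("A", 1.0)
mon.keep_alive("B", 1.0)
print(mon.check(1.5))  # "A" is still within its interval
mon.keep_alive("B", 2.0)
print(mon.check(2.5))  # "A" is overdue, so the monitor switches to "B"
```

The sketch covers only detection and the switchover decision. It does not model the subsequent update of the FIB and class DB, which the text identifies as the step that can take several seconds.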