This invention relates in general to computer networks, and in particular, to a method and system for implementing network redundancy in a paging network.
Communication networks are well-known in the computer communications field. By definition, a network is a group of computers and associated devices that are connected by communications facilities or links. Networks may vary in size, from a local area network (LAN) consisting of a few computers or workstations and related devices, to a wide area network (WAN) which interconnects computers and LANs that are geographically dispersed. A LAN is sometimes defined as a subnet, where a subnet is a portion of a network that may be a physically independent network segment, shares a network address with other portions of the network, and is distinguished by a subnet address.
An internetwork, in turn, is the joining of multiple LANs or subnets, both similar and dissimilar, by means of gateways or routers that facilitate data transfer and conversion from various networks. A representative section of a network 10 is shown in FIG. 1 (Prior Art) in which a plurality of LANs 11 and a WAN 12 are interconnected by routers 13. The routers 13 are generally known as special purpose computers used to interface one LAN or WAN to another.
Typical communication networks also have a common network architecture based upon the Open Systems Interconnection (OSI) Reference Model in order to provide communication between a multiplicity of interconnected computing devices or “nodes.” The OSI Reference Model segments networking protocols into seven layers, which are listed, in ascending order of abstraction, as: (1) the physical layer, (2) the data link layer, (3) the network layer, (4) the transport layer, (5) the session layer, (6) the presentation layer, and (7) the application layer.
Internetworking devices such as repeaters, bridges, and routers each operate at a different layer of the OSI Reference Model. Repeaters, also known as “concentrators” and “hubs,” operate at the physical layer, which provides the electrical and mechanical interface to the physical medium of the network. All network computing devices, such as personal computers and workstations, also include network interface cards (NICs) to connect the computing device to the network at the physical layer. Finally, routers operate at the network layer, which initiates and terminates network connections and manages routing, data sequencing, and error detection and recovery. At the application layer, common computer programs, such as FTP and Telnet, control the session layer and sometimes control the transport layer.
As shown in FIG. 1, routers 13 are used to connect LANs 11 and WANs 12. The main purpose of a router 13 is to allow transparent data communications between computers that reside on separate LANs. At the network layer of the OSI model, routers 13 use the IP address in data packets to determine the path of the packet from node to node until it reaches the destination node. Along with making this complex decision regarding the packet transmission, they also actively exchange information regarding the overall network topology and adjust those decisions in response to network traffic and even outages within the LAN. Routers also make limited decisions regarding the physical location of the packet's destination node.
Routers have three main functions: learning routes, selecting routes, and maintaining routes. A router learns routes by building a routing table from the network address of each network device it finds on a network. The router then selects routes for the data packets sent through it by searching for the shortest path between a source node and a destination node. The router also maintains a record of the best routes by listening for IP address change requests from network devices and updating its routing table as necessary. The time it takes for all routers to update their routing tables is called convergence. In most large networks, convergence takes more than several minutes and the updates sometimes arrive in a random order. Routers use Interior Gateway Protocols (IGPs) to update the routing tables. As known in the art, these include protocols such as Routing Information Protocol (RIP or a newer version, RIPv2), Open Shortest Path First (OSPF), and Enhanced Interior Gateway Routing Protocol (EIGRP).
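By way of illustration only, the route selection step described above can be sketched as a shortest-path search. The sketch below uses Dijkstra's algorithm, the search on which link-state protocols such as OSPF are based. The router names, link costs, and the `shortest_path` function are hypothetical and are not taken from any particular router implementation.

```python
import heapq

def shortest_path(links, source, destination):
    """Return (cost, path) for the cheapest route, or (None, []) if unreachable.

    links maps each node to a dict of {neighbor: link cost}.
    """
    queue = [(0, source, [source])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None, []

# Hypothetical topology of four routers; the numbers are illustrative link costs.
TOPOLOGY = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 1, "R4": 5},
    "R3": {"R1": 4, "R2": 1, "R4": 1},
    "R4": {"R2": 5, "R3": 1},
}

cost, path = shortest_path(TOPOLOGY, "R1", "R4")
print(cost, path)
```

In this assumed topology the selected route from R1 to R4 passes through R2 and R3, because the sum of those link costs is lower than that of either more direct route.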
Each network node has a MAC address and an IP address. MAC addresses are unique hardware addresses that are commonly stored in the ROM of every network device. These addresses are assigned by the manufacturer in the manner of a serial number. Media Access Control (MAC) addresses govern the physical layer of a network. IP addresses, on the other hand, are designed to be changed dynamically. An IP address is often assigned when a network computing device is booted up on the LAN, and is often allocated to the computing device by a dynamic host configuration protocol (DHCP) server or a bootstrap protocol (BOOTP) server. The network layer routes data packets between a source and destination device by adding an IP address header to each packet transferred over the network. To preserve a reliable data channel between all nodes of a network, each network computing device must maintain unique MAC and IP addresses for each node. This list of MAC and IP addresses is sometimes referred to as an ARP cache or routing table of a computing device. The Address Resolution Protocol (ARP) is used to identify the MAC address associated with an IP address that resides on the same subnet, and is commonly used to update the ARP cache or routing table of a computing device. With the use of this network architecture, each computer and/or electronic device that is connected to the network 10 is capable of communicating with any other electronic device or computer connected to the network 10.
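As a minimal sketch of the list of MAC and IP addresses described above, an ARP cache may be modeled as a table keyed by IP address that only answers for addresses on the local subnet. The `ArpCache` class, its method names, and the example addresses are assumptions made for illustration; they do not describe the data structure of any particular operating system.

```python
import ipaddress

class ArpCache:
    """Hypothetical IP-to-MAC table for the nodes on one subnet."""

    def __init__(self, subnet):
        self.subnet = ipaddress.ip_network(subnet)
        self.entries = {}

    def learn(self, ip, mac):
        """Record or refresh a mapping, for example after an ARP reply."""
        self.entries[ipaddress.ip_address(ip)] = mac.lower()

    def resolve(self, ip):
        """Return the cached MAC address, or None if unknown or off-subnet."""
        addr = ipaddress.ip_address(ip)
        if addr not in self.subnet:
            return None  # off-subnet traffic would be handed to a router instead
        return self.entries.get(addr)

# Illustrative addresses only.
cache = ArpCache("192.168.1.0/24")
cache.learn("192.168.1.20", "00:A0:C9:14:C8:29")
print(cache.resolve("192.168.1.20"))
print(cache.resolve("10.0.0.5"))
```

A lookup that returns `None` for an on-subnet address corresponds to the case in which the device would issue an ARP request before sending the packet.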
From time to time, a physical interface in a computer network, such as a transceiver, cable, hub port, hub, router port, or network interface card, fails. Typically, the failure of a network interface is only detected when other network computing devices attempt to communicate through the interface and the interface fails to respond. The network computing devices may attempt to retry communications through the interface before disconnecting. However, in most cases, recovery from the network interface failure is only achieved by subsequently establishing a network connection through an alternative interface after the failure. When this happens, most transport connections, session connections, and application associations are lost. In this type of failure, the programs at the application layer are interrupted and in some cases external alarms may be generated. The amount of time and effort it takes for the network resources to accomplish the recovery can significantly interfere with the ongoing operation of the network devices.
Also, typical mechanisms currently used to test for network failures include some form of link test, which normally operates at the media access, or physical, layer. There are two main difficulties with this technique. First, failures of higher layers in active components such as Ethernet switches or routers are not detected. It may be possible to contact the hub at the Ethernet level, but not to contact any network computing devices through their IP addresses because of an invalid routing table. Second, network failure detection is hardware, protocol, and media specific. This requires a new procedure, not only for each type of media, e.g. 10 Mbps or 100 Mbps Ethernet, but also for each interface chip used, e.g. Intel 82596 or 68360EN.
It is therefore desirable to have a network architecture and method that allows for transparent recovery from failures in the network. A transparent recovery is one in which network resources generally do not realize that a network interface has failed or is not available, thus maintaining most transport connections, session connections, and application associations.
The invention provides a method and a system for implementing interface redundancy in a computer network so that communication between computing devices connected to the network is always available, despite periods in which a particular network interface has failed or is otherwise removed from network operation. That is, when a network interface is unavailable, the network automatically compensates and routes communications through an alternative interface already established. The invention is accomplished by providing a redundant network architecture with mechanisms for automatically detecting and recovering from failure of a network interface. The present invention also allows the network to continue operation without the need for recovery actions, such as replacing the failed network interface card.
According to the invention, each network computing device periodically searches out and performs link tests with other devices using one or more of the described techniques. In order to optimize the use of limited bandwidth in network communication, the following algorithm may be implemented. First, a network device periodically sends out a broadcast “ping” message until at least one peer device responds. Upon receiving a response, the interface is assumed to be operational. The network computing device then retains the address of the device that responded. The computing device then periodically pings that particular address until it does not respond. If circumstances arise in which there is a failure to respond to a ping, the network computing device goes back to broadcasting a ping message as described above. If no device responds, the network interface is assumed to have failed. This algorithm may be used to test all network interfaces associated with a specific network computing device.
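The algorithm above may be sketched, by way of example only, as a small state machine that is polled once per test period. The `InterfaceMonitor` class and the two callables it receives, `broadcast_ping` and `unicast_ping`, are hypothetical stand-ins for the actual ping transport, which is not specified here. The sketch also assumes that the fallback broadcast is sent in the same test period in which the retained peer stops responding.

```python
class InterfaceMonitor:
    """Tracks one network interface using the broadcast/unicast ping scheme."""

    def __init__(self, broadcast_ping, unicast_ping):
        # broadcast_ping() -> address of a responding peer, or None (assumed helper)
        # unicast_ping(address) -> True if that peer responded (assumed helper)
        self.broadcast_ping = broadcast_ping
        self.unicast_ping = unicast_ping
        self.peer = None          # address retained from the last response
        self.operational = False  # no response has been seen yet

    def poll(self):
        """Run one periodic test; return True if the interface is assumed operational."""
        # While a peer is retained, ping only that peer to save bandwidth.
        if self.peer is not None and self.unicast_ping(self.peer):
            self.operational = True
            return True
        # No retained peer, or it stopped responding: go back to broadcasting.
        self.peer = self.broadcast_ping()
        # If no device answers the broadcast, the interface is assumed failed.
        self.operational = self.peer is not None
        return self.operational

# Simulated peers standing in for a real network.
alive = {"10.0.0.2"}
monitor = InterfaceMonitor(
    broadcast_ping=lambda: min(alive) if alive else None,
    unicast_ping=lambda address: address in alive,
)
print(monitor.poll(), monitor.peer)
```

One monitor instance per interface would allow all network interfaces of a given computing device to be tested in the same way.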