1. Field of the Invention
The present invention relates generally to communications networks and more particularly to a method and system for fast link failover.
2. Description of the Related Art
Companies today depend increasingly on the ability to quickly and reliably access data via communications networks. As the accessibility, reliability, and availability of such communications networks have become more important, a number of techniques to increase these factors have been developed. Redundancy is one such technique frequently used to minimize network downtime and increase the speed at which data may be accessed via a communications network. For example, redundant network links or connections are frequently used to couple a single network element or node (e.g., a client, server, or other host data processing system, a communications network appliance, or a switch, router, hub, gateway, or other redistribution point) to one or more communications networks or portions thereof (e.g., network segments). Such use of redundant network links with respect to network elements residing at the edge or terminating point of a communications network is known as “multi-homing” and such redundantly-linked network elements are said to be “multi-homed”.
FIG. 1 illustrates a data processing system including multi-homed network elements. Data processing system 100 of the illustrated embodiment includes an upstream communications network portion 102, a primary switch 104a, a secondary switch 104b, and a number of multi-homed network elements (e.g., multi-homed endstations 106a, 106b . . . 106n). Upstream communications network portion 102, including any of a number of network elements, is coupled to primary switch 104a using a primary link 108a and to secondary switch 104b using a secondary link 108b. Multi-homed endstations 106a, 106b . . . 106n are each similarly coupled (e.g., via a primary network interface card or host bus adapter) to primary switch 104a via one of a plurality of primary links 110a-110n and (e.g., via a secondary network interface card or host bus adapter) to secondary switch 104b via one of a plurality of secondary links 112a-112n as shown.
In operation, data is transmitted between multi-homed endstations 106 and upstream communications network portion 102 using primary links 110a-110n, primary switch 104a, and primary link 108a. Following a failure of any of primary links 110a-110n (e.g., due to failure of the physical link hardware, a primary network interface card, or primary switch 104a), one or more associated multi-homed endstations may fail over to a corresponding secondary link 112a-112n by activating a network interface associated with the secondary link and deactivating a network interface associated with the failed primary link. Data is then transmitted between the multi-homed endstation which has failed over and upstream communications network portion 102 via an associated secondary link 112, secondary switch 104b, and secondary link 108b.
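The failover behavior described above may be sketched, by way of illustration only, as follows. The class and attribute names (Interface, MultiHomedEndstation, link_up, active) are hypothetical and do not appear in the description; the sketch assumes that an endstation can directly observe the state of its own attached links.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical stand-in for a network interface card or host bus adapter."""
    name: str
    link_up: bool = True   # state of the directly attached link
    active: bool = False   # whether this interface currently carries traffic


class MultiHomedEndstation:
    """Illustrative model of an endstation with a primary and a secondary link."""

    def __init__(self, primary: Interface, secondary: Interface):
        self.primary = primary
        self.secondary = secondary
        # Traffic initially flows over the primary link only.
        self.primary.active = True
        self.secondary.active = False

    def active_interface(self) -> Interface:
        return self.primary if self.primary.active else self.secondary

    def on_local_link_failure(self) -> bool:
        """Fail over if the active primary link is down and the secondary is usable.

        Returns True if a failover was performed, False otherwise.
        """
        if self.primary.active and not self.primary.link_up and self.secondary.link_up:
            # Activate the secondary interface, then deactivate the failed primary.
            self.secondary.active = True
            self.primary.active = False
            return True
        return False
```

Note that this sketch reacts only to the state of a directly attached link (e.g., one of primary links 110a-110n); a failure further upstream leaves link_up unchanged at the endstation and therefore does not trigger a failover.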
In a conventional communications network, however, a failure of a link not directly attached to a network element (e.g., a failure of primary link 108a considered from the perspective of one or more of endstations 106a-106n) cannot be quickly detected. Traditionally, high-level system components (e.g., protocols, applications) have been utilized to detect the occurrence of such failures. For example, a high-level system component resident on an endstation 106 may use a timer to track a time period between data transfers associated with an upstream communications network portion, or may use periodic link or connection status messages or “data units”, to determine whether or not an upstream link failure has occurred.
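A timer-based detector of the kind just described may be sketched, by way of illustration only, as follows. The class name IdleTimeoutDetector, its methods, and the threshold_seconds parameter are hypothetical; timestamps are passed in explicitly, in seconds, so that the behavior is easy to follow.

```python
class IdleTimeoutDetector:
    """Illustrative high-level detector that infers an upstream link failure
    from the absence of data transfers or status "data units"."""

    def __init__(self, threshold_seconds: float, now: float = 0.0):
        # Threshold chosen by the high-level component (an assumption here).
        self.threshold_seconds = threshold_seconds
        self.last_seen = now

    def on_data_unit(self, now: float) -> None:
        """Record receipt of data or of a periodic status message."""
        self.last_seen = now

    def upstream_failed(self, now: float) -> bool:
        """Declare a failure only once the idle period exceeds the threshold."""
        return (now - self.last_seen) > self.threshold_seconds
```

With such a detector, a failure cannot be declared sooner than threshold_seconds after the last received data unit, so the detection latency is bounded below by the chosen threshold.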
The described techniques suffer from a number of significant shortcomings, however. To account for ordinary communications network congestion and to avoid falsely declaring a link failure, the threshold time periods (and resultant latency) associated with the described techniques are typically multiple seconds or more. Additionally, such high-level system components frequently operate at an individual application or network element level. Consequently, failure of a link coupling an aggregating network element to an upstream communications network portion may not be simultaneously detected by all downstream network elements to which the aggregating network element is coupled.