A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
This invention is related to the field of computer networks and, more particularly, to maintaining high availability in a computer network that utilizes trunking technology.
2. Description of the Related Art
With the ever expanding use of computer networks throughout society has come an increasing dependence of users on the availability of that network. If a network goes down, or is otherwise unavailable, costs to an enterprise may be significant. Consequently, a number of techniques have arisen which are designed to ensure that a computer network is sufficiently robust that it may detect and respond to problems without significantly impacting users. Frequently, efforts to ensure a computer network is consistently online for its users may be referred to as maintaining "high availability". A computer network which has in place mechanisms which prevent hardware or software problems from impacting its users may be referred to as a High Availability Network (HAnet). Some of the characteristics which may be considered when defining a HAnet include protection of data (Reliability), continuous access to data (Availability), and techniques for correcting problems which minimally impact users (Serviceability). Collectively these characteristics are frequently referred to as RAS.
Because of the ever increasing demands placed on networks today, ways of increasing network bandwidth are also of frequent concern. While Fast Ethernet and Gigabit Ethernet may serve to improve performance, the use of such technologies necessitates an even greater increase in backbone capacity. One well known technique used to increase bandwidth is called "trunking". Trunking is a technology which may provide dramatic increases in network performance. Using trunking technology, multiple ports may be combined into a single logical port, creating a single, high speed, logical link. While trunking provides for increased bandwidth and redundancy, there still exist single points of failure. Because all four ports of a trunked connection must be connected to the same switch, single points of failure are introduced. If either the switch or the multi-port system board fails, the network connection will be lost.
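The single point of failure described above may be illustrated with a minimal sketch. The `Trunk` class, its field names, and the 100 Mbps port speed are hypothetical and chosen only for illustration; the sketch models a trunk whose member ports all share one switch and one multi-port board.

```python
from dataclasses import dataclass


@dataclass
class Trunk:
    """Hypothetical model of one trunked connection (illustrative only)."""
    port_speeds_mbps: list   # speed of each member port
    ports_up: list           # link state of each member port
    switch_up: bool = True   # every member port attaches to this one switch
    board_up: bool = True    # every member port lives on this one board

    def capacity_mbps(self) -> int:
        # A failure of a shared component takes down every port at once.
        if not (self.switch_up and self.board_up):
            return 0
        # Otherwise the logical link offers the sum of the surviving ports.
        return sum(s for s, up in zip(self.port_speeds_mbps, self.ports_up) if up)


trunk = Trunk(port_speeds_mbps=[100] * 4, ports_up=[True] * 4)
full = trunk.capacity_mbps()       # all four ports carry traffic

trunk.ports_up[0] = False          # losing one port only reduces bandwidth
degraded = trunk.capacity_mbps()

trunk.switch_up = False            # losing the shared switch loses the link
lost = trunk.capacity_mbps()
```

In this model a single port failure is tolerated by the remaining ports, whereas a switch or board failure reduces the capacity of the whole trunk to zero.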
In some cases, mechanisms may be put in place which detect an error in a network connection and notify the system administrator that a problem exists. The system administrator may then take corrective action, such as switching to a redundant resource. However, such mechanisms typically take some period of time and necessarily involve interruptions in network operation. In other cases, operating system specific mechanisms may be implemented which may facilitate a failover to a redundant connection. Typically these mechanisms operate at layers below the application layer of the protocol stack. Two widely recognized protocols include TCP/IP and ISO/OSI, each of which includes a highest layer referred to as the application layer. Other communication protocols with a layer corresponding to the application layer may utilize a different name. Generally, those layers below the application layer involve software and mechanisms which are not portable across different operating systems. Consequently, these solutions are not portable and generally require a newly created mechanism for each platform on which a failover is desired.
The problems outlined above are in large part solved by a method and mechanism of failover as described herein. The method and mechanism include the addition of a redundant, secondary, trunked connection which may be utilized in the event of a failure of a primary trunked connection. By utilizing an Application layer mechanism which monitors the primary trunked network connection, automatically detects a degradation in performance in the primary connection, and switches to the secondary trunked connection in a short period of time, network availability may be maintained. Advantageously, network interruptions may be minimized and servicing of network problems may be automated by a mechanism which is portable across multiple platforms. Further, because the mechanism operates within the application layer of the communication protocol, no modification of existing operating software is necessary.
Broadly speaking, a method for maintaining high availability in a two node computer network is contemplated. The method includes adding an Application layer High Availability Networking (HAnet) mechanism to a node of the network, adding a second network connection, monitoring a first network connection, detecting a failure of the first network connection, and performing a failover from the first network connection to the second network connection. The monitoring, failure detection, and failover are all performed by the HAnet mechanism.
Also contemplated is a network node which includes a first network interface, a second network interface, and a High Availability Networking (HAnet) mechanism. The included HAnet mechanism operates at the Application layer and is configured to monitor the first network interface. If a failure of the first network interface is detected, the HAnet mechanism is configured to perform a failover from the first network interface to the second network interface.
Further contemplated is a two node computer network configured to maintain high availability. The network includes a first node coupled to a second node by two connections. The first node includes a High Availability Networking (HAnet) mechanism which operates at the Application layer. The HAnet mechanism is configured to monitor the first connection and perform a failover from the first connection to the second connection in response to detecting a failure of the first connection.
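The monitoring, failure detection, and failover steps described above may be sketched as follows. This is an illustration under stated assumptions and not the disclosed implementation: the `HAnetMonitor` class, the injected `probe` callable, the connection names, and the threshold of three consecutive failed probes are all hypothetical, since the summary does not specify how the connection is monitored or when a failure is declared.

```python
from typing import Callable


class HAnetMonitor:
    """Hypothetical application-layer monitor for two connections."""

    def __init__(self, probe: Callable[[str], bool], primary: str,
                 secondary: str, max_failures: int = 3):
        self.probe = probe                # returns True if the connection responds
        self.primary = primary
        self.secondary = secondary
        self.max_failures = max_failures  # assumed threshold, not from the source
        self.active = primary
        self.failures = 0

    def poll(self) -> str:
        """Probe the active connection once and return the active connection."""
        if self.probe(self.active):
            self.failures = 0             # a healthy probe resets the count
        else:
            self.failures += 1
            # Fail over only from the primary, after repeated failed probes.
            if self.failures >= self.max_failures and self.active == self.primary:
                self.active = self.secondary
                self.failures = 0
        return self.active


# Simulated link state stands in for a real health check.
down = set()
monitor = HAnetMonitor(lambda name: name not in down, "trunk0", "trunk1")

before = monitor.poll()                        # primary is healthy
down.add("trunk0")                             # primary connection fails
history = [monitor.poll() for _ in range(3)]   # third failed probe triggers failover
```

Because the probe is passed in as an ordinary callable, the sketch relies on no operating system specific facility, which mirrors the portability argument made for an Application layer mechanism.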