1. Technical Field
The present invention relates to fault tolerant operations in a computer network, in which network stations boot from remote backup servers.
2. Description of Related Art
Communication between computers in a network often involves the loss of information packets due to hardware failure. The recovery and retransmission of these lost packets are a central concern in fault tolerant operations, in which the network must continue to function despite the failure of some of its components.
When a failure occurs in a component of a fault tolerant network, such as a server, certain functions must be shifted to alternate servers within the network. The time this process takes is referred to as the failover time interval. This interval depends on several factors, including the number of alternate servers within the network, the number of transport retries used to access a specific server, and the time intervals, known as time-outs, between transport retries.
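The dependence of the failover interval on these factors can be illustrated with simple arithmetic. The following is a minimal sketch, not a description of any particular network: it assumes that every transport attempt against an unresponsive server waits one full time-out, and that the same retry count and time-out apply to every server. The function name and parameters are hypothetical.

```python
def failover_interval(failed_servers: int, retries: int, timeout_s: float) -> float:
    """Estimate the time spent on unresponsive servers before a working one is reached.

    Assumptions (illustrative only): each retry waits one full time-out,
    and all servers share the same retry count and time-out.
    """
    if failed_servers < 0 or retries < 0 or timeout_s < 0:
        raise ValueError("arguments must be non-negative")
    # Each failed server costs `retries` attempts, each lasting one time-out.
    return failed_servers * retries * timeout_s


# One failed server, three retries, two-second time-out.
single = failover_interval(failed_servers=1, retries=3, timeout_s=2.0)
# Two failed servers in a row double the interval.
double = failover_interval(failed_servers=2, retries=3, timeout_s=2.0)
```

Under these assumptions, shortening the time-out or reducing the retry count shortens the failover interval proportionally, while each additional unresponsive server lengthens it.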
In current fault tolerant networks, the lengths of time-outs and failover intervals are fixed. However, fixed time-outs and failover intervals can be counterproductive, depending on the circumstances and the demands placed on the network. Different situations require different failover intervals in order to optimize the performance of the network.
An example of a situation requiring a short failover interval is a retail environment. In this case, if a server failure caused one or more network stations to be rebooted, the checkout clerk and customers would want the stations to fail over quickly to the next available server. However, there are situations in which a short failover interval is undesirable.
An example of a situation requiring a longer failover interval is a peer-booted environment. In peer booting, a network station boots from either a remote server or its own internal flash card. (A flash card is a memory module that retains its contents without external power.) Once this first network station is booted, the other network stations then boot from its flash card. In essence, the first network station becomes the server for the other network stations. A short failover interval would create problems in this situation, because the peer-booted stations must wait until the network station with the flash card is fully booted and responding to transport protocol requests before they can boot from it. A delay in the failover therefore allows the first network station to get up and running before it has to handle transport requests from the other network stations.
The same computer network might require different failover intervals depending on the circumstances. In the peer booting example, a short failover interval might be called for if only one or a few network stations needed to be rebooted. However, if the entire network lost power, a longer failover interval would be needed to allow the first network station to boot fully before the others peer boot from its flash card.
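The distinction between these two circumstances can be sketched as a simple selection between two parameter sets. This is an illustrative sketch only and does not describe an existing network: the names `FailoverSettings`, `QUICK`, `SLOW`, and `choose_settings`, as well as the numeric values, are hypothetical, and treating "every station is rebooting" as the sign of a network-wide power loss is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverSettings:
    retries: int      # transport retries per server (hypothetical value)
    timeout_s: float  # time-out between retries, in seconds (hypothetical value)


# Hypothetical parameter sets: short interval versus long interval.
QUICK = FailoverSettings(retries=2, timeout_s=1.0)
SLOW = FailoverSettings(retries=5, timeout_s=10.0)


def choose_settings(stations_rebooting: int, total_stations: int) -> FailoverSettings:
    """Pick a longer interval only when the whole network is rebooting."""
    if total_stations <= 0 or not 0 <= stations_rebooting <= total_stations:
        raise ValueError("station counts are inconsistent")
    # Assumption: all stations rebooting at once indicates a network-wide
    # power loss, so the first station needs time to finish booting.
    whole_network_down = stations_rebooting == total_stations
    return SLOW if whole_network_down else QUICK


few = choose_settings(stations_rebooting=1, total_stations=20)
everything = choose_settings(stations_rebooting=20, total_stations=20)
```

Here a single rebooting station receives the short settings, while a reboot of all twenty stations receives the long settings.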
Present fault tolerant networks do not have the ability to adjust their time-outs and failover intervals according to the circumstances. Therefore, a method for adjusting time-outs and failover intervals according to the requirements of different systems, as well as different circumstances for the same system, is desirable.