1. Field of the Invention
This invention generally relates to fault-tolerant computer systems and, more specifically, to a system and method for enhancing fault tolerance and hot swapping in computer systems.
2. Related Art
Computer systems such as file servers and storage servers in computer networks are relied upon by large numbers of users. When a file server or storage server is out of operation, many users are inconvenienced. Thus, technology has been developed that supports maintenance and service of computer systems while they remain operational. One part of maintenance and service is the replacement of components in the computer systems. “Hot swap” technology allows the replacement of components without turning off the power or resetting the computer system as a whole.
Hot swap enables the insertion and/or removal of components in a computer system while it is still active or operational. In systems that do not support hot swapping of components, each insertion and/or removal of a component requires a complete shutdown of the entire system to prevent damage to other components or to the system. In time-critical systems such as communications systems, system downtime is both a financial problem and a service quality problem. That is, any downtime means a financial loss and disconnection of service to active lines.
A drawback of hot swapping, however, is that it requires trained personnel to insert and/or remove components from a computer system to minimize damage caused by pitting connectors of the components against connectors of the computer system. Another drawback is electrical noise, which can adversely affect the performance of the computer system. The noise is caused by the change in current at the instant when connection is made between power pins of a component and corresponding elements of the computer system. The result is voltage transients in the computer system backplane that may cause loss of data, incorrect program execution, and damage to delicate hardware components.
Thus, there is a need for a system and method for enhancing fault tolerance and hot swapping in computer systems so as to reduce both the downtime of computer systems and the use of trained personnel to repair and/or maintain computer systems.