1. Field of the Invention
The invention relates to reducing down-time in computer systems. More particularly, the invention relates to a method and system for adding and/or replacing a device in a file server computer, without having to power down the server computer, and automatically configuring the server computer so as to integrate the device into the server system.
2. Description of the Related Technology
In the computer industry, the reduction of computer failures and computer "downtime" is a major focus for companies trying to achieve a competitive edge. The reduction of downtime due to system failures and maintenance is critical to providing quality performance and product reliability to the users and buyers of computer systems. Particularly with respect to server computers, which are accessed and utilized by many end users, the reduction of server downtime is an extremely desirable performance characteristic. This is especially true for users who depend on the server to obtain data and information in their daily business operations.
As servers become more powerful, they are also becoming more sophisticated and complex. A server is typically a central computer in a computer network which manages common data that may be accessed by other computers, otherwise known as "workstations," in the network. Server downtime, resulting from hardware or software faults or from repair and maintenance, remains a significant problem. By one estimate, the cost of downtime in mission critical environments has risen to an annual total of $4.0 billion for U.S. businesses, with the average downtime event resulting in a $140,000 loss in the retail industry and a $450,000 loss in the securities industry. It has been reported that companies lose as much as $250,000 in employee productivity for every 1% of computer downtime. With emerging Internet, intranet and collaborative applications taking on more essential business roles every day, the cost of network server downtime will continue to spiral upward.
While early detection and correction of system faults is an important element of an overall high reliability architecture, it is only one piece of the puzzle. Studies show that a significant percentage of network server downtime is due to repair and/or maintenance of the server system, in which defective hardware components must be replaced or new components must be added to upgrade the system. Typical prior art methods for replacing and/or adding new devices, such as a PCI adapter card, involve powering down the entire server, locating the port where the device needs to be replaced or added, replacing or adding the device, powering up the server system, configuring the device, and thereafter configuring the system so as to permanently integrate the device into the system. The process of configuring the system involves loading appropriate drivers, modifying start-up files, and establishing communication protocols, as necessary, such that the new device is recognized and initialized upon subsequent reboots of the system. This process is tedious and time-consuming and can result in hours of server downtime, depending on the nature of the changes made to the server system. Additionally, this manual, human-centered approach often results in configuration errors, which further add to server downtime.
One method of replacing and/or adding devices in a computer system, without first powering down the system, is known as "Hot Plug", otherwise known as "Hot Swap and Add". In this method and system, in order to replace an old adapter card with a new one, or to add a new adapter card to the system, only the particular slot in which the new or replacement adapter is to be inserted is powered down, while the remaining slots, components, devices, and subsystems of the server remain operational. Therefore, server operation and service to existing customers are not disrupted. One method and system of hot adding devices in a server system is described in further detail in a co-pending and commonly owned U.S. patent application Ser. No. 08/942,309, entitled "Hot Add of Devices Software Architecture," which is listed in Appendix A attached hereto.
In another embodiment of the hot plug method and system, a "canister", connected to a PCI bus of the server system, is powered down, rather than powering down the entire server. Canisters are detachable housing modules which interface with a main system board and a "backplane" board in the server system. These canisters can typically house up to four adapter cards, which may be inserted into bus slots within each canister. Each adapter card is capable of communicating with a system device, such as a CD ROM drive. The structure and functionality of a canister allow a server system to be easily upgraded in capacity or repaired by allowing easy adding and swapping of adapter cards in the bus slots within each canister. The structure and functionality of the canister are described in further detail in a co-pending and commonly owned U.S. patent application, entitled "Fault Tolerant Computer System," which is listed in Appendix A attached hereto.
As used herein, the terms "device", "adapter card" and "adapter" are used synonymously and interchangeably and may refer to any electronic element, component, subsystem, or circuit that may be coupled to a bus system of the server and which may be programmed or configured to communicate data to one or more other entities in the server system. Additionally, as used herein, the term "module" refers to any software program, subprogram, subroutine of a program, or software code capable of performing a specified function.
In the hot plug methods and systems discussed above, after an adapter has been hot added to the server system, the adapter must be configured in order to communicate with other devices in the server in accordance with a specified communication protocol. The adapter configuration process includes updating a configuration space within the adapter with a configuration manager program and programming the adapter's registers with appropriate memory, I/O and interrupt parameters, after power to the adapter has been turned on. A method for performing this adapter configuration process is described in further detail in a co-pending and commonly owned U.S. patent application Ser. No. 08/941,268, entitled "Configuration Management Method For Hot Adding and Hot Replacing Devices," which is listed in Appendix A attached hereto.
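By way of illustration only, the assignment of memory, I/O and interrupt parameters to a newly powered adapter may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of the application referenced above: the names Adapter, ResourcePool and configure are hypothetical, the address ranges are arbitrary, and each requested size is assumed to be a power of two so that a range can be aligned to its own size.

```python
from dataclasses import dataclass, field


@dataclass
class Adapter:
    """Hypothetical model of a hot-added adapter and its resource requests."""
    name: str
    mem_size: int  # bytes of memory-mapped space requested (power of two)
    io_size: int   # bytes of I/O port space requested (power of two)
    registers: dict = field(default_factory=dict)


class ResourcePool:
    """Hands out non-overlapping, size-aligned address ranges and free IRQs."""

    def __init__(self, mem_base, io_base, irqs):
        self.next_mem = mem_base
        self.next_io = io_base
        self.free_irqs = list(irqs)

    @staticmethod
    def _align(addr, size):
        # Round addr up to the next multiple of size (size is a power of two).
        return (addr + size - 1) & ~(size - 1)

    def configure(self, adapter):
        """Assign memory, I/O and interrupt parameters to one adapter."""
        if not self.free_irqs:
            raise RuntimeError("no free interrupt line")
        mem = self._align(self.next_mem, adapter.mem_size)
        io = self._align(self.next_io, adapter.io_size)
        # Advance the pool so the next adapter cannot overlap this one.
        self.next_mem = mem + adapter.mem_size
        self.next_io = io + adapter.io_size
        adapter.registers = {
            "mem_base": mem,
            "io_base": io,
            "irq": self.free_irqs.pop(0),
        }
        return adapter.registers
```

In this sketch, a pool created as ResourcePool(0xE0000000, 0x1000, [10, 11]) would hand the first adapter the start of each range and the first free interrupt line, and would place each later adapter after it without overlap.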
However, the process does not end once the configuration of the adapter is completed. A system operator must still reconfigure the server system in order to permanently incorporate the new adapter into the system such that it is initialized upon subsequent reboots of the system. Prior art methods of reconfiguring the server, after a new device has been added to the system, typically require a system operator to manually select and load an appropriate driver for the newly added device, and thereafter to modify start-up configuration files by retrieving the files and inputting appropriate configuration parameters corresponding to the newly added device. If the new device is a LAN card, for example, an additional step of "binding the network protocol" of the device must be performed. The term "binding the network protocol", and its meaning, is well known in the art. This prior art configuration process can be time-consuming and tedious, especially for a system operator who is not familiar with the configuration requirements of the server or of the newly added device. In such circumstances, the system operator often makes mistakes in performing one or more of the configuration steps. These mistakes may lead to serious errors in server operation and result in additional downtime of the server.
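The reconfiguration steps described above, namely loading a driver, recording it in a start-up file, and binding a network protocol where the device is a LAN card, may be sketched as follows. This sketch is illustrative only: the function name integrate_device, the driver names, and the "LOAD ... SLOT=" and "BIND ... TO ..." line formats are assumptions chosen for the example and are not taken from any particular operating system.

```python
def integrate_device(startup_lines, driver, slot, protocol=None):
    """Return a copy of the start-up file lines with entries for a new device.

    startup_lines: existing lines of the start-up configuration file
    driver:        name of the driver to load for the device (hypothetical)
    slot:          bus slot number in which the device was added
    protocol:      network protocol to bind, for LAN cards only
    """
    updated = list(startup_lines)  # leave the caller's list unmodified

    # Step 1: make sure the driver is loaded on every subsequent reboot.
    load_cmd = f"LOAD {driver} SLOT={slot}"
    if load_cmd not in updated:
        updated.append(load_cmd)

    # Step 2: for a LAN card, bind the network protocol to the driver.
    if protocol is not None:
        bind_cmd = f"BIND {protocol} TO {driver}"
        if bind_cmd not in updated:
            updated.append(bind_cmd)

    return updated
```

Each entry is checked before it is appended, so running the routine twice for the same device leaves the file unchanged. Duplicated or omitted entries are one kind of configuration mistake that a manual procedure can introduce.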
A related technology, not to be confused with hot plug systems, is Plug and Play, as defined by Microsoft and personal computer (PC) product vendors. Plug and Play is an architecture that facilitates the integration of PC hardware devices into systems that are not powered. Plug and Play devices are able to identify themselves to the computer system after the user installs the device on the bus. Plug and Play devices are also able to identify the hardware resources that they need for operation. Once this information is supplied to the operating system, the operating system can load the device drivers for the device that the user added while the system was in a non-powered state.
However, Plug and Play is not designed for hot add applications in server systems. Rather, Plug and Play devices are typically inserted into a PC when the PC is turned off. Plug and Play does not provide a method of automatically detecting newly added devices while the system remains powered and thereafter automatically configuring the system so as to permanently integrate the newly hot added device into the system. Plug and Play performs configuration functions only after a device has been inserted into a computer that is powered off and subsequently powered on.
Therefore, a method and system are needed which can not only provide hot pluggability in a server system, but which can also automatically reconfigure the server system so as to integrate hot added devices into the server system, without disrupting system operations and service to existing users. Additionally, there is a need to further reduce server downtime by reducing the configuration errors commonly made in prior art manual, human-centered approaches to configuring the server after new devices have been added to the system.