Typical maintenance operations on server-class computers include replacing faulty hardware and adding new hardware (e.g., I/O adapter boards, memory cards, CPUs), upgrading and patching operating systems, upgrading and patching software applications, and performing server reconfiguration. The maintenance may be performed offline. Offline maintenance on a server can involve stopping or rebooting the server's operating system, and shutting off power to the server. These steps lead to an accumulation of “downtime” (that is, time during which the applications are not providing service). While the server is down, hardware or software or both can be serviced. After servicing, the server may be rebooted, and the applications restarted.
It is desirable for businesses to minimize server downtime. Downtime can cause interruptions in critical services, result in a loss of customers (who, inconvenienced by poor service, seek better service elsewhere), and reduce productivity of employees. In addition, the downtime can increase the cost of server operation.
Servers having online maintenance capability can avoid downtime in certain instances. The online maintenance capability can be engineered into computer hardware and operating systems. A server having online maintenance capability can allow maintenance on particular subcomponents without stopping the system as a whole. For example, a server equipped with “PCI Hot Plug” allows a PCI card to be added or replaced while applications that do not depend on that card continue to run. Similarly, operating systems with “dynamic kernel modules” allow software components in dynamic modules to be added, replaced, or removed at runtime. Only those applications that depend on the affected module need to be restarted.
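By way of illustration only, the selective-restart idea described above can be sketched as a lookup over a dependency map: given the component being serviced, only the applications that use it are selected for restart, and all others continue to run. The function name `apps_needing_restart`, the map `DEPENDENCIES`, and the application and component names are hypothetical and are not drawn from any particular server or operating system.

```python
from typing import Dict, List, Set


def apps_needing_restart(dependencies: Dict[str, Set[str]], component: str) -> List[str]:
    """Return, in sorted order, the applications that depend on `component`.

    Applications absent from the result do not use the component and can
    keep running while it is added, replaced, or removed.
    """
    return sorted(app for app, deps in dependencies.items() if component in deps)


# Hypothetical dependency map: application name -> components it uses.
DEPENDENCIES: Dict[str, Set[str]] = {
    "web_server": {"nic_driver", "fs_module"},
    "database": {"fs_module", "raid_card"},
    "batch_job": {"fs_module"},
}

if __name__ == "__main__":
    # Servicing the (hypothetical) RAID card affects only the database.
    print(apps_needing_restart(DEPENDENCIES, "raid_card"))
```

In this sketch, servicing a component that no application uses yields an empty list, which corresponds to maintenance that causes no application downtime at all.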
However, not all servers have online maintenance capability. Retrofitting online maintenance capabilities into existing servers and their operating systems and applications can be prohibitively challenging. Moreover, a redesigned legacy system having online maintenance features does not help those who want to continue using an existing version of that system.