In an “always on” computer system, such as a banking system, one goal is to improve the availability metric. For example, an availability of 99.99% corresponds to roughly 52 minutes, 34 seconds of downtime per year (assuming a 365-day year), and 99.999% availability corresponds to roughly 5 minutes, 15 seconds of downtime per year. Higher availability translates into better system reliability. There are many different methods available to reduce a system's downtime, and reducing the number of system boot cycles is one such method. Intentional reboots are a leading concern related to downtime. In a mission-critical setting, any extended downtime can lead to large losses in terms of data integrity, customer satisfaction, or a company's revenue.
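The availability figures above follow from simple arithmetic, which the following minimal sketch reproduces. The function names and the choice of a 365-day year are assumptions made for illustration only and are not taken from any particular system.

```python
# Assumption: a non-leap, 365-day year is used as the measurement period.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def annual_downtime_seconds(availability_percent: float) -> float:
    """Return the downtime per year, in seconds, allowed by an availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


def format_minutes_seconds(seconds: float) -> str:
    """Format a duration as whole minutes and seconds, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


if __name__ == "__main__":
    for level in (99.99, 99.999):
        downtime = annual_downtime_seconds(level)
        print(f"{level}% availability: {format_minutes_seconds(downtime)} per year")
```

Under the 365-day assumption, this yields about 52 minutes, 34 seconds for 99.99% availability and about 5 minutes, 15 seconds for 99.999% availability.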
Any type of upgrade to a computer's operating system (OS) requires some downtime to install. An OS upgrade is also referred to herein as “installing a new software level”. During the course of a single year, several types of upgrades may be issued, including, for example, bug fixes multiple times per year, minor or support releases once or twice a year, and major upgrades once a year. Even in an ideal setting, installing several upgrades during the course of a year will likely lead to more than five minutes of downtime, which by itself exceeds the yearly downtime allowed by 99.999% availability.
A user will generally install a new software level in two situations: when the new software level contains a new feature the user wants, or when the new software level contains a bug fix that the user needs. The user must choose an opportune time to incur a system interruption to install the new software level. In the meantime, a problem may cause an unscheduled system reboot. Presently, this kind of unscheduled reboot returns the OS to the existing software level, and the user must still perform a scheduled reboot to install a new software level (i.e., a bug fix) to address the problem.
The bug fix problem is particularly troublesome when the bug causes an unscheduled reboot. Installing the bug fix guarantees a reboot, whereas not installing the bug fix only risks one. Therefore, many users choose not to install a fix and gamble that they will not encounter the problem. If the fatal bug is encountered, the user has incurred a reboot but is still running the system on the OS level containing the bug. The user must reboot again in order to install the bug fix.
Users who regularly install new bug fix levels encounter fewer unscheduled system interruptions, but increase their overall downtime. The unscheduled interruptions that do occur frequently relate to problems that already have an available bug fix. Taken together, these cases show that system downtime could be reduced if a system interruption brought the user up to a new software level containing the features or bug fixes that they desire.
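The reasoning above can be illustrated with a small counting model. This is only a sketch under stated assumptions: the event names, the `count_reboots` function, and the `install_on_any_reboot` flag are hypothetical, and the model assumes that a released fix is eventually installed and that installing it always costs one reboot unless it is picked up during an interruption that was already happening.

```python
def count_reboots(events, install_on_any_reboot):
    """Count total reboots for a time-ordered list of events.

    events: sequence of "fix_released" or "unscheduled_reboot" (hypothetical labels).
    install_on_any_reboot: if True, any reboot brings the system up on the
        newest available software level; if False, an unscheduled reboot
        returns the system to the existing level.
    """
    reboots = 0
    fix_pending = False
    for event in events:
        if event == "fix_released":
            fix_pending = True
        elif event == "unscheduled_reboot":
            reboots += 1
            if fix_pending and install_on_any_reboot:
                # The interruption is reused to move to the new software level.
                fix_pending = False
        else:
            raise ValueError(f"unknown event: {event!r}")
    if fix_pending:
        # Assumption: the user eventually schedules a reboot to install the fix.
        reboots += 1
    return reboots


if __name__ == "__main__":
    timeline = ["fix_released", "unscheduled_reboot"]
    print("existing behavior:", count_reboots(timeline, install_on_any_reboot=False))
    print("upgrade on reboot:", count_reboots(timeline, install_on_any_reboot=True))
```

For a timeline in which a fix is available before the unscheduled reboot occurs, the model counts two reboots under the existing behavior and one when the interruption also installs the new software level. When no unscheduled reboot occurs, or when the fix is released only after it, both policies count the same number of reboots.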