The term “boot” or “boot up” is commonly used to describe loading operating system software into a computer system. More specifically, booting may comprise a process including several operations culminating in the loading of an operating system into system RAM. In many well-known systems, the boot process may begin with the loading of a BIOS (Basic Input/Output System) program from a ROM (Read-Only Memory) device. After performing some self-testing operations, the BIOS typically loads and then branches to a program called a “boot loader” that will actually load the operating system software. The boot loader typically resides in a reserved location on the system hard disk, for example, in the starting sectors of the hard disk.
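By way of illustration only, the sequence of operations described above may be sketched as follows. This is a minimal simulation, not an actual firmware implementation; the function name `run_boot_sequence`, the stage labels, and the callables `self_test` and `load_os` are hypothetical and are introduced solely for this example.

```python
from typing import Callable, List


def run_boot_sequence(self_test: Callable[[], bool],
                      load_os: Callable[[], str]) -> List[str]:
    """Return the ordered list of stages reached during a simulated boot.

    All stage names are illustrative assumptions, not real firmware states.
    """
    # The BIOS program is loaded from a ROM device.
    stages = ["bios_loaded_from_rom"]

    # The BIOS performs its self-testing operations.
    if not self_test():
        stages.append("self_test_failed")
        return stages
    stages.append("self_test_passed")

    # The BIOS loads the boot loader (e.g., from the starting sectors
    # of the hard disk) and then branches to it.
    stages.append("boot_loader_loaded_from_disk")
    stages.append("branched_to_boot_loader")

    # The boot loader loads the operating system into system RAM.
    stages.append("os_loaded_into_ram:" + load_os())
    return stages


stages = run_boot_sequence(lambda: True, lambda: "example_os")
```

In this sketch, a failed self-test ends the sequence before the boot loader is reached, whereas a successful run ends with the operating system loaded into RAM.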
After the BIOS program branches to the boot loader, the boot loader typically loads system initialization files that then proceed to load the operating system. Such initialization files may be known as “kernels” or operating system (OS) “images.” A kernel or OS image may specify a basic configuration of the OS, such as which OS files need to be loaded. For example, an OS image could specify which device drivers need to be loaded.
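As a purely illustrative example, the notion that an OS image specifies which files and device drivers are to be loaded may be modeled as a simple data structure. The class `OSImage`, its fields, and the file names shown are assumptions made for this sketch and do not correspond to any particular operating system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OSImage:
    """Hypothetical description of a basic OS configuration."""
    name: str
    os_files: List[str] = field(default_factory=list)
    device_drivers: List[str] = field(default_factory=list)


def files_to_load(image: OSImage) -> List[str]:
    """Return every file the image specifies, core OS files first."""
    return image.os_files + image.device_drivers


# Example image; all names are made up for illustration.
image = OSImage(
    name="example_image",
    os_files=["core.bin"],
    device_drivers=["ethernet.drv", "serial_io.drv"],
)
load_list = files_to_load(image)
```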
As is well known, during a typical OS boot process, the process can “hang”; i.e., the process may stop short of a complete, successful boot due to some hardware or software problem, such as a corrupted or missing OS image. Typically, the first approach to addressing the problem of a boot that fails to complete successfully is to re-boot the system, usually from the same OS image.
In many settings, re-booting the system is performed by a human user; i.e., a user manipulates some control means of the computer, such as a keyboard or reset button, to cause the re-boot to be initiated. However, in other settings such human intervention is not readily available. For example, a computer in a remote base station of a telecommunications network may not be easily accessible by a user in the event that a boot of the computer hangs.
It is known to attempt to re-boot the system automatically (i.e., without human intervention). However, in known systems, the re-boot may be continually attempted from the same OS image. If the OS image is corrupted, for example, this can lead to an infinite loop of system resets, making the system unusable.
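The failure mode described above may be illustrated with a short simulation. The sketch below assumes a hypothetical function `attempt_reboots` and a parameter `reset_cap`; the cap exists only so that the simulation terminates, and a system of the kind described would have no such limit and would continue resetting indefinitely.

```python
from typing import Tuple


def attempt_reboots(image_is_corrupted: bool, reset_cap: int) -> Tuple[bool, int]:
    """Simulate automatic re-boots that always use the same OS image.

    Returns (booted, resets_performed). The reset_cap is an assumption
    added so the simulation ends; it is not part of the known systems.
    """
    resets = 0
    while True:
        # Each attempt uses the same image, so the outcome never changes.
        if not image_is_corrupted:
            return (True, resets)
        if resets >= reset_cap:
            return (False, resets)
        # The boot hung; the system is reset and tries the same image again.
        resets += 1


healthy = attempt_reboots(image_is_corrupted=False, reset_cap=5)
corrupted = attempt_reboots(image_is_corrupted=True, reset_cap=5)
```

Because nothing about the image changes between attempts, the corrupted case exhausts every permitted reset without ever booting, regardless of how large the cap is made.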
Additionally, known boot routines do not perform a rigorous test of the condition of the devices of the hardware platform in which an OS is to be loaded, prior to loading the OS. Examples of such devices include memory controllers, Ethernet cards, serial I/O cards and custom ASICs (Application-Specific Integrated Circuits). Thus, an OS may appear to boot successfully, while in fact one or more devices of the hardware platform may not be operating or may be operating in a sub-standard condition. This can lead to problems later, as the OS begins to run application programs that require the inoperative or sub-standard devices. The problems may be worsened, for example, in settings as discussed above, where there is no human operator available to monitor the computer and take corrective action if needed.
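The gap described above may be illustrated as follows. In this sketch, which rests entirely on assumed names (`Device`, `boot_reports_success`, `inoperative_devices`), the simulated boot routine never examines the devices, so it reports success even though one device is inoperative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Hypothetical hardware platform device with a simple health flag."""
    name: str
    operational: bool


def boot_reports_success(devices: List[Device]) -> bool:
    """Simulate a boot routine that does not test the devices at all."""
    # The device list is deliberately ignored, mirroring the lack of
    # a rigorous device test prior to loading the OS.
    return True


def inoperative_devices(devices: List[Device]) -> List[str]:
    """List the devices that an application would later find unusable."""
    return [d.name for d in devices if not d.operational]


platform = [
    Device("memory_controller", True),
    Device("ethernet_card", False),
    Device("serial_io_card", True),
]
boot_ok = boot_reports_success(platform)
broken = inoperative_devices(platform)
```

Here the boot is reported as successful while the Ethernet card remains inoperative, so the fault would surface only when an application program later required that device.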
A method and system are needed to address the problems outlined in the foregoing discussion.