System errors causing operating system failure remain a pervasive problem in the computer industry. Such errors may result from hardware failure, user error, or other causes. These failures, particularly in the case of networked desktop computers or network system servers, may result in extended unavailability of computer resources and significant financial loss due to user down-time.
Currently, there exists no effective process for completely eliminating such system errors causing operating systems to crash. In many cases, the only method of avoiding recurring crashes is to perform a post-mortem diagnosis, before rebooting the operating system, of the actions which brought about the crash. However, because of the physical difficulty in accessing and analyzing the failed computer and because time is often of the essence in making the system available, many users simply reboot the operating system without analyzing the problems which led to the crash.
One example of where such access is necessary occurs with network servers. Such network servers are often critical to an organization's efficiency, and, yet, may be configured without certain hardware, such as a keyboard and computer display, necessary for performing a postmortem analysis. Network operators are therefore often hurried into simply rebooting the network server without performing a proper diagnosis of the problem.
The problem of remotely analyzing and administering a computer also occurs in the case of wide area, or local area, networks where system administrators may be required to remotely maintain several computers. In typical operation, the operating system executing on the remote computers allows the system administrators to access and modify various parameters on the remote computer. However, in the event of an operating system crash, current systems provide no means for the administrator to access or diagnose the remote computer. Moreover, current systems typically do not allow the administrator to access the remote computer prior to loading an operating system on the remote computer. For example, U.S. Pat. No. 5,390,324 to Burckhartt et al. (the "Burckhartt patent") claims a failure recovery system allowing dial-up access to the failed computer once the failed computer has loaded a reduced operating system stored on a secondary partition on the computer's hard disk. The system of the Burckhartt patent boots off the secondary partition containing the secondary operating system when a detection means detects a system time-out indicating a primary operating system failure.
The following background describes the typical structure and startup procedure of an IBM-compatible personal computer ("PC"); the concepts, however, are generally applicable to a variety of computer systems. Upon system reset, CPU control is passed to a portion of the computer's Basic Input/Output System (BIOS) known as the Power On System Test, or Power On Self Test (POST). The terms system reset and system start-up, as used herein, shall be synonymous and shall include any system start-up, reboot, system reset or other operation causing the commencement of the initialization or reinitialization of the initial program load operation of the computer.
The POST is typically stored in read-only memory (ROM) and is used to initialize the standard system components, such as system timers, system DMA (Direct Memory Access) controllers, system memory controllers, system I/O devices and video hardware. As part of its initialization routine, the POST sets the default values for a table of interrupt vectors. These default values point to standard interrupt handlers in the ROM BIOS but may be modified to access customized interrupt handlers. The POST also performs a reliability test to check that the system hardware, such as the memory and system timers, is functioning correctly. After system initialization and diagnostics, the POST surveys the system for firmware located in non-volatile memory on optional hardware cards (adapters) in the system. This is performed by scanning a specific address space for memory having a given signature. If the signature is found, control is passed to the firmware, which then initializes the device on which it is located.
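By way of illustration only, the signature scan described above may be modeled in software as follows. This is a simulation over a byte buffer, not BIOS code. The signature bytes (0x55, 0xAA), the 2 KB scan step, the length byte, and the base address 0xC0000 are commonly cited PC conventions that are assumed here and are not taken from the description above. The function name find_option_roms is hypothetical.

```python
# Illustrative simulation of an adapter-firmware (option ROM) scan.
# Assumptions not stated in the text: signature 0x55 0xAA, a 2 KB scan
# step, and a third byte giving the firmware length in 512-byte units.

SIGNATURE = b"\x55\xaa"
SCAN_STEP = 2048


def find_option_roms(region: bytes, base_address: int = 0xC0000):
    """Return (address, length_in_bytes) for each signature on a scan boundary."""
    roms = []
    offset = 0
    while offset + 3 <= len(region):
        if region[offset:offset + 2] == SIGNATURE:
            length = region[offset + 2] * 512
            roms.append((base_address + offset, length))
            # Skip past this firmware image, rounded up to a scan boundary.
            skip = max(length, SCAN_STEP)
            offset += -(-skip // SCAN_STEP) * SCAN_STEP
        else:
            offset += SCAN_STEP
    return roms


# Build a fake 16 KB region holding one 4 KB firmware image at offset 0x2000.
region = bytearray(16 * 1024)
region[0x2000:0x2003] = b"\x55\xaa\x08"  # length byte 8 means 8 * 512 bytes
found = find_option_roms(bytes(region))
```

In an actual system, control would be passed to an entry point within each image found; the sketch only reports where the images are located.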
After the hardware initialization is performed, the POST reads a block of data at a predetermined location on the boot device, usually the hard disk or a diskette drive, into memory, and passes control to the program in that data block. This program, known as a bootstrap loader, then loads a larger program into memory. If the larger program is properly loaded into memory, the bootstrap loader passes control to it. The operating system is then initialized and gains control of the CPU. As described below, on certain disk-less, or media-less, workstations the adapter firmware located on a network interface card re-routes the pointers used to bootstrap the operating system to download the operating system from an attached network.
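The first stage of the bootstrap sequence described above may be sketched as follows, using in-memory streams in place of physical boot devices. The 512-byte block size, the 0x55 0xAA marker in the last two bytes of the block, and the order in which devices are tried are assumptions made for the example. The names load_boot_block and first_bootable are hypothetical.

```python
import io

SECTOR_SIZE = 512             # assumed size of the boot data block
BOOT_SIGNATURE = b"\x55\xaa"  # assumed marker in the last two bytes


def load_boot_block(device) -> bytes:
    """Read the first block of a boot device and check that it is bootable."""
    device.seek(0)
    block = device.read(SECTOR_SIZE)
    if len(block) != SECTOR_SIZE:
        raise ValueError("boot device returned a short read")
    if block[-2:] != BOOT_SIGNATURE:
        raise ValueError("no boot signature; device is not bootable")
    return block


def first_bootable(devices):
    """Try each (name, device) pair in order and return the first valid block."""
    for name, device in devices:
        try:
            return name, load_boot_block(device)
        except ValueError:
            continue
    raise RuntimeError("no bootable device found")


# A blank diskette image and a hard disk image carrying the signature.
blank = io.BytesIO(bytes(SECTOR_SIZE))
bootable = io.BytesIO(bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE)
name, block = first_bootable([("diskette", blank), ("hard disk", bootable)])
```

In a real machine the returned block would be executed as the bootstrap loader; here it is simply returned to the caller.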
The BIOS further comprises a set of routines, or interrupt handlers, for interfacing with the computer and its peripheral components. The BIOS interrupt handlers are accessed through the use of hardware or software interrupts. The addresses of these interrupt handlers are stored in an interrupt vector table. As noted above, this vector table may be modified to point to customized interrupt handlers. The BIOS is generally described by P. Norton in The Peter Norton PC Programmer's Bible, Microsoft Press (1993).
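The modification of the vector table to point to a customized interrupt handler may be modeled, purely for illustration, as a table of callables. The class VectorTable, the vector number 0x60, and the handler names below are hypothetical; an actual table holds handler addresses rather than function objects.

```python
class VectorTable:
    """Toy model of an interrupt vector table holding one handler per vector."""

    def __init__(self, size: int = 256):
        self._handlers = [None] * size

    def get_vector(self, number: int):
        return self._handlers[number]

    def set_vector(self, number: int, handler) -> None:
        self._handlers[number] = handler

    def raise_interrupt(self, number: int, request: str) -> str:
        handler = self._handlers[number]
        if handler is None:
            raise LookupError(f"no handler installed for vector {number:#x}")
        return handler(request)


def default_handler(request: str) -> str:
    return f"default handler: {request}"


def make_custom_handler(previous):
    """Build a handler that services one request and chains the rest."""
    def custom_handler(request: str) -> str:
        if request == "remote":
            return "custom handler: remote"
        return previous(request)  # fall through to the original handler
    return custom_handler


EXAMPLE_VECTOR = 0x60  # arbitrary vector number chosen for the example

table = VectorTable()
table.set_vector(EXAMPLE_VECTOR, default_handler)
before = table.raise_interrupt(EXAMPLE_VECTOR, "read")

# Install the customized handler, keeping the original so it can be chained.
original = table.get_vector(EXAMPLE_VECTOR)
table.set_vector(EXAMPLE_VECTOR, make_custom_handler(original))
after_remote = table.raise_interrupt(EXAMPLE_VECTOR, "remote")
after_read = table.raise_interrupt(EXAMPLE_VECTOR, "read")
```

Saving the original entry before overwriting it lets the customized handler pass along any request it does not service itself.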
While the BIOS interfacing routines were used by the MS-DOS operating system, modern operating systems, such as Windows-95™, available from Microsoft Corporation ("Microsoft"), do not extensively utilize the BIOS interfacing routines. Generally, Windows-95, and other modern operating systems, make use of device drivers specific to a particular type and model of peripheral hardware component when communicating with such peripheral hardware components. Device drivers provide a uniform interface through which more general purpose software may interact with the peripheral components. These device drivers may replace an existing BIOS interrupt handler, or provide additional functionality which is otherwise not provided. The application software is thus freed from having to interact with the specifics of each hardware device.
Many operating systems, including MS-DOS releases since MS-DOS 2.0, and releases of Windows up to Windows 3.11, include the ability to load installable device drivers from disk when the operating system is booted up. A user may load installable device drivers in the MS-DOS operating system by including the command DEVICE=device_file in the CONFIG.SYS file. MS-DOS then reads each device driver file and loads the device driver into memory. Windows-95 has the ability to detect the peripheral hardware components using the PCI (Peripheral Component Interconnect) and Plug and Play functions of the BIOS, and to load the appropriate drivers for the installed peripheral hardware components automatically.
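As an illustration of the DEVICE= mechanism, the following sketch extracts the device driver entries from the text of a configuration file. It does not reproduce the actual MS-DOS loader. The sample file contents, the driver paths, the treatment of REM lines as comments, and the case-insensitive matching are assumptions made for the example, and the function name parse_device_lines is hypothetical.

```python
def parse_device_lines(config_text: str):
    """Return (driver_path, arguments) for each DEVICE= line, in file order."""
    drivers = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines assumed to be comments.
        if not line or line.split()[0].upper() == "REM":
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().upper() == "DEVICE":
            parts = value.split()
            if parts:
                drivers.append((parts[0], parts[1:]))
    return drivers


# Hypothetical file contents; the driver names are placeholders.
sample = """\
REM example configuration
FILES=30
DEVICE=C:\\DRIVERS\\EXAMPLE.SYS /X
device=C:\\DRIVERS\\NICDRV.SYS
"""
drivers = parse_device_lines(sample)
```

Each returned entry identifies one driver file that the operating system would then read and load into memory, in the order listed.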
While the use of installable device drivers provides a high degree of flexibility in handling peripheral hardware components, such as network interface cards (NIC), heretofore, an operating system executing on the failed computer has been relied on to load the software driver and provide any supporting functions necessary. If an operating system will not boot, or if it is necessary to perform a postmortem diagnosis prior to reloading an operating system, no software driver for the NIC will be loaded, and thus an administrator will not be able to use software acting through the NIC to access the system remotely. There exists a need, therefore, for a method and system of utilizing an installable NIC device driver which is available before an operating system is bootstrapped and does not rely on operating system support. Operating system, as the term is used herein, shall mean system-level software that controls the execution of user-level programs and that provides services to such user-level programs such as resource allocation, scheduling, I/O control and data management. Exemplary of such operating systems are MS-DOS™, Windows-95™, Windows-NT™, all available from Microsoft, MacOS™, available from Apple Computer, and various versions of Unix®, available from a number of vendors including Sun Microsystems. Modern operating systems, such as Windows-NT, often include a protected mode kernel or base system at the core of the operating system.
A key problem in the remote administration of computer systems is the fact that there are hundreds of different network interface card types available from a number of vendors, each of which may be programmed differently and may utilize a unique device driver. Developing new device drivers for each of these card types would be expensive and lead to unreliability. It is therefore an object of the present invention to utilize the network enhanced BIOS to use standard NIC device drivers developed for existing operating systems, and thus not require customized device driver software for each of the available network interface card types.
This objective may be achieved by utilizing standard interfaces defined by certain operating system vendors. To support a virtually unlimited variety of network card types, operating system vendors have defined standard interfaces that are to be used by network interface card device drivers. This allows the operating system to support any NIC which supplies driver software that adheres to the standard interface. Novell, Inc. ("Novell") has defined one such standard, the Open Datalink Interface (ODI). Drivers written according to the ODI standard can be used by NetWare™, available from Novell. Microsoft has defined a second standard, the Network Driver Interface Specification (NDIS). Drivers written according to the NDIS standard can be used by Microsoft operating systems (e.g., Windows-NT). Additionally, other standards are available for versions of the UNIX™ operating system.