1. Technical Field
The present invention relates to an improved distributed data processing system and in particular to an improved method and apparatus for managing data processing systems within a distributed data processing system. Still more particularly, the present invention provides an improved method and apparatus for booting data processing systems within a distributed data processing system.
2. Description of Related Art
A computer includes both a physical machine, namely the hardware, and the instructions which cause the physical machine to operate, namely the software. Software includes both application and operating system programs. If a program simply performs tasks for a user, such as solving specific problems, it is referred to as application software. If a program controls the hardware of the computer and the execution of the application programs, it is called operating system software. System software further includes the operating system, which is the program that controls the actual computer or central processing unit (CPU), and device drivers, which control input and output (I/O) devices such as printers and terminals.
A general purpose computer is fairly complicated. Usually, a queue of application programs is waiting to use the CPU. The operating system must determine which program will run next, how much CPU time it will be allowed to use, and what other computer resources it will be allowed to use. Further, each application program will typically require a particular input or output device and must transfer its data to the operating system, which controls the device drivers.
When a computer is booted, a boot program stored in read-only memory (ROM) is used to initiate loading of the operating system into the computer's memory. The term "boot" refers to the process of starting or resetting a computer. When first turned on (cold boot) or reset (warm boot), the computer executes software that loads and starts the computer's more complicated operating system and prepares it for use. Thus, the computer can be said to pull itself up by its own bootstraps. The ROM boot program tells the computer where to find a larger boot program, also called a "boot block" program, which is used to load the operating system onto the computer. The term "boot block" refers to a portion of a disk that contains the operating-system loader and other basic information that enables a computer to start up. In stand-alone computers, the boot block program and the operating system are found on a local hard drive.
A network may be formed by having a number of computers, also referred to as "nodes" or "network computers", communicate with each other over one or more communications links; the aggregation of these computers and links is a computer network. Today, many computer workstations are connected to other workstations, file servers, or other resources over a local area network (LAN). Each computer on a network is connected to the network via an adapter card or similar means, which provides the ability to establish a communications link to the network.
In managing network computers (NCs), it is desirable to maintain uniformity of programs, operating systems, and configurations among the different NCs. To maintain this uniformity, remote boot operations may be employed to support NCs in a network environment. In such a case, each network computer (NC) is booted from a remote boot disk or other device located elsewhere on the network, such as on a server or a disk array system connected to the network. Such a boot system also minimizes the amount of time needed to update individual NCs, because system administrators do not have to physically reconfigure or change applications at each NC. Additionally, remote boot processes support completely diskless NCs. Furthermore, the remote boot process enhances software and network security, because the remote boot files may be kept in a secure location and copies do not need to be distributed among the NCs in the network.
One problem with remote booting is that booting an individual NC may take longer than desired, because the boot image is transferred over the network and only a limited number of NCs can boot at one time. For example, booting an NC may take fifteen or more minutes depending on factors such as network traffic, image size, and initialization time. This problem is exacerbated when a network is shut down unexpectedly and then restarted. When the network is brought up or restarted, boot storms may occur in which all of the NCs are booting and loading applications from the network at the same time. This situation may result in severe bandwidth problems and greatly increases the time needed to boot the NCs, because all of the NCs are attempting to download operating systems and applications from the network.
In many office environments, such a delay in booting or rebooting NCs is unacceptable. For example, in a banking office environment, having a customer wait fifteen or more minutes before a transaction can occur is unacceptable as a business practice. This situation, however, occurs when an NC used for customer transactions has been shut down unexpectedly and must be rebooted.
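The effect of a boot storm on boot time can be illustrated with a simple back-of-the-envelope model. The following is a minimal sketch under stated assumptions, not a description of any particular network: it assumes that every NC downloads the same boot image at the same moment over one shared link whose bandwidth is split evenly among the NCs, and it ignores protocol overhead and server disk contention. The function name `boot_time_seconds`, its parameters, and the example figures are hypothetical and chosen only for illustration.

```python
def boot_time_seconds(num_ncs: int, image_mb: float, link_mbps: float,
                      init_s: float = 0.0) -> float:
    """Estimate the time for every NC to finish booting when all boot at once.

    Hypothetical model: a single shared link of link_mbps megabits per second
    is divided evenly among num_ncs concurrent downloads of an image_mb
    megabyte boot image, followed by init_s seconds of local initialization.
    """
    if num_ncs < 1 or link_mbps <= 0:
        raise ValueError("need at least one NC and a positive link rate")
    per_nc_mbps = link_mbps / num_ncs           # fair share of the link per NC
    transfer_s = image_mb * 8 / per_nc_mbps     # megabytes -> megabits, then divide by rate
    return transfer_s + init_s


if __name__ == "__main__":
    # Illustrative figures only: a 100 MB image over a 100 Mbit/s shared link.
    print(boot_time_seconds(1, 100, 100))      # one NC booting alone
    print(boot_time_seconds(100, 100, 100))    # boot storm: 100 NCs at once
```

Under these assumptions, a single NC transfers the image in 8 seconds, while 100 NCs booting simultaneously each need 800 seconds (over thirteen minutes) for the transfer alone, because transfer time grows linearly with the number of NCs sharing the link. Real networks add overhead, so the model is best read as a lower bound.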
Therefore, it would be advantageous to have an improved method and apparatus for booting and rebooting a network computer connected to a network.