This invention pertains to computers and other data processing systems and, more particularly, to computers and other data processing systems for use in a network in which an operating system, application, device driver, data or other software is stored on one computer system in the network and downloaded or “served” to other computer systems in the network.
A typical desktop computer system includes a non-volatile semiconductor memory, such as a well known “flash” memory, for storing program code commonly called “POST” or Power On Self Test. This typical desktop computer also includes a non-volatile magnetic disk storage device, typically a hard disk drive, on which the operating system, as well as other programs and data, is stored. At power-up, the computer's central processing unit or “CPU” executes the POST code, which performs diagnostic checks, initializes the computer's internal devices, and then loads an operating system program called the “boot loader” from the local hard disk to the computer's main memory. Once the boot loader code in main memory has been validated, control is passed to the boot loader, which loads and executes additional operating system programs and data stored on the hard disk.
In this way, the system loads the operating system kernel and any device drivers, management agents, communications stacks, application programs, etc., that are required for the computer system to become fully functional. The collection of operating system programs, device drivers, management agents, applications, etc., is often referred to collectively as an “operating system image” and is typically customized for a specific computer system or class of systems.
In the desktop computer example above, the operating system image was stored locally on the hard disk drive of the computer, an arrangement that can be described as a “local boot” system. In a network of a plurality of computers, computer servers, or other data processing equipment, it is possible to employ the local boot technique described above by storing a copy of the operating system image on each of the computers in the network, such that each computer boots the operating system from its local hard disk or other non-volatile mass storage device within or attached to each computer.
In addition, a well known “network boot” system can also be used in which the operating system image is not stored locally on each computer in the network, but is stored on a remote computer and downloaded to various computers, servers, and other data processing equipment in the network. The computer, server, or other data processing equipment storing an image to be downloaded is referred to as an “image server,” and a computer, server, or other data processing equipment in the network that is capable of receiving an image from the image server is referred to as a “client computer” or, simply, a “client.”
Network booting is beneficial and desirable under circumstances where tight control is required over the operating system image, where the operating system image used by the client computers may change frequently, and where the availability of a local hard disk or other non-volatile mass storage device on the client computers is limited or non-existent. The use of client computers lacking a hard disk drive or other non-volatile mass storage device is particularly beneficial in reducing the total cost of ownership of a large network of computers, such as may be found in large corporations.
Network booting is not limited to desktop computers and workstations, but is increasingly being used in networks of servers. In addition, network booting is useful in dense-server packaging schemes, such as “server blades.” A server blade is a complete computer server on a single printed circuit board. Typically, a dozen or more individual server blades can be plugged into a server blade chassis, which provides power, control and inter-blade communication capability.
It is common to find several hundred client computers in a network in which network booting is employed. If a large number of clients are started or restarted simultaneously, then the network and image server resources may become overburdened. For example, this may happen at the restoration of power after a power failure, at initial power-up of multiple rack mounted servers or server blades, or upon reception by a plurality of clients of a command from a management console to restart the operating system or obtain a new operating system image.
Because the network and server resources are not able to process all of the requests placed by a large number of clients in a relatively short time interval, some requests for images either fail or time out and therefore must be retried at some time in the future. This results in a situation where a large number of requests flooding the network interferes with the successful handling of other requests and, therefore, causes a significant increase in the amount of network traffic, which may aggravate the situation even more. The term “boot storm” is used, in particular, to describe the situation in which the image server and network resources are overburdened by too many requests from clients for the operating system image. The term is also used expansively to describe the situation wherein, within a narrow window of time, too many clients make requests for any type of image (application, device drivers, data or other computer code), which results in these resources becoming overburdened.
FIG. 1(a) is a graph representing network system boot performance during a boot storm in a prior art system, wherein the vertical axis 101 is indicative of the number of clients attempting to simultaneously access the image server, and the horizontal axis 102 indicates the total time required for the image server to download the boot image to all of the clients. Horizontal line 103 represents the maximum capacity of the network resources to simultaneously download images to requesting clients, expressed as the number of clients simultaneously requesting an image from the image server. Plot 104 represents a boot storm scenario in a prior art system.
Note that, between times t0 and t1, the number of clients requesting an image from the image server is below maximum capacity line 103. From time t1 to t2, however, the number of clients requesting an image exceeds this maximum capacity, and it does not drop below the line until after time t2. During the time t1 to t2 when the number of clients requesting service exceeds maximum capacity 103, clients will interfere with each other, messages will be retried, and responses will be lost. Occasionally, a client may determine that conditions are so poor that it gives up, and it may later retry the process from the beginning (effectively throwing away whatever programs and data it was able to collect in the previous attempts). Because of the conflicts and interference, the entire process of booting all clients takes longer than expected and is not complete until time t4.
By comparison, FIG. 1(b) is identical to the graph of FIG. 1(a), except that plot 105 represents the network system boot performance of a system of the present invention in which the total number of clients requesting an image from the image server is identical to the total number of clients requesting service in the prior art system of FIG. 1(a). Note that at all times, plot 105 is below maximum capacity line 103. More importantly, note that all clients are serviced by time t3 while, in the prior art system of FIG. 1(a), the total time to service all clients is time t4. Thus, as will be described in more detail below, one of the many advantages of the present invention is that it can be used to prevent boot storms, thereby reducing the total boot time of a plurality of clients in a network boot environment.
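The contrast between plots 104 and 105 can be illustrated with a small, purely hypothetical simulation. The sketch below is not the mechanism of the present invention and is not taken from the figures; it is a toy model under stated assumptions. The function name `total_boot_time`, its parameters, the rule that useful throughput is halved whenever the number of active clients exceeds capacity, and the simple admission cap `max_in_flight` are all assumptions introduced only for illustration.

```python
def total_boot_time(num_clients, work_per_client, capacity, max_in_flight):
    """Return the number of ticks until every client holds its image.

    Toy model (all rules are illustrative assumptions):
      * each client needs `work_per_client` ticks of service;
      * at most `max_in_flight` clients may be requesting at once;
      * if more than `capacity` clients are requesting, retries and lost
        responses are assumed to waste half of the capacity.
    """
    remaining = [work_per_client] * num_clients  # work left per client
    waiting = list(range(num_clients))           # clients not yet requesting
    active = []                                  # clients currently requesting
    ticks = 0

    while waiting or active:
        # Admit waiting clients up to the in-flight limit.
        while waiting and len(active) < max_in_flight:
            active.append(waiting.pop(0))

        # Above the capacity line, only half the capacity does useful work.
        if len(active) > capacity:
            useful = capacity // 2
        else:
            useful = len(active)

        # Serve the clients at the front of the active list for one tick.
        for client in active[:useful]:
            remaining[client] -= 1

        # Drop clients that have finished downloading.
        active = [c for c in active if remaining[c] > 0]
        ticks += 1

    return ticks


# 100 clients, 10 ticks of work each, capacity of 20 simultaneous clients.
storm = total_boot_time(100, 10, 20, max_in_flight=100)   # all start at once
capped = total_boot_time(100, 10, 20, max_in_flight=20)   # stay under the line
print(storm, capped)  # prints "90 50"
```

Under these assumed rules, the uncapped run plays the role of time t4 and the capped run the role of time t3: keeping the number of simultaneous requesters at or below the capacity line lets the same total work finish sooner. The specific numbers depend entirely on the assumed penalty and carry no weight beyond showing the direction of the effect.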