Information Technology (IT) managers and Chief Information Officers (CIOs) are under tremendous pressure to reduce capital and operating expenses without decreasing capacity. The pressure is driving IT management to provide computing resources that more efficiently utilize all infrastructure resources. To meet this objective, aspects of the following questions are often addressed: how to better manage server utilization; how to cope with smaller IT staff levels; how to better utilize floor space; and how to handle power issues.
Typically, a company's IT infrastructure is centered around computer servers that are linked together via various types of networks, such as private local area networks (LANs) and private and public wide area networks (WANs). The servers are used to deploy various applications and to manage data storage and transactional processes. Generally, these servers will include stand-alone servers and/or higher density rack-mounted servers, such as 2U and 1U servers.
Recently, a new server configuration has been introduced that provides unprecedented server density and economic scalability. This server configuration is known as a “blade server.” A blade server employs a plurality of closely-spaced “server blades” (blades) disposed in a common chassis to deliver high-density computing functionality. Each blade provides a complete computing platform, including one or more processors, memory, network connection, and disk storage integrated on a single system board. Meanwhile, other components, such as power supplies and fans, are shared among the blades in a given chassis and/or rack. This provides a significant reduction in capital equipment costs when compared to conventional rack-mounted servers.
Generally, blade servers are targeted towards two markets: high-density server environments under which individual blades handle independent tasks, such as web hosting; and scalable compute cluster environments. A scalable compute cluster (SCC) is a group of two or more computer systems, also known as compute nodes, configured to work together to perform computationally intensive tasks. By configuring multiple nodes to work together to perform a computational task, the task can be completed much more quickly than if a single system performed the task. In theory, the more nodes that are applied to a task, the quicker the task can be completed. In reality, the number of nodes that can effectively be used to complete the task depends on the application used.
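The gap between the theoretical and the practical benefit of adding nodes can be illustrated with Amdahl's law, a standard model that is not part of the discussion above and is used here only as an assumption. It treats a task as having a fixed fraction of work that can be spread across nodes and a remainder that cannot. The function name and the sample fractions in the following minimal sketch are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Estimated speedup over a single node under Amdahl's law.

    parallel_fraction: share of the task (0.0 to 1.0) that can be divided
                       among nodes; an assumed, application-specific value.
    nodes:             number of compute nodes applied to the task.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Compare a fully divisible task with one that is 90% divisible.
    for nodes in (1, 2, 16, 128, 1024):
        ideal = amdahl_speedup(1.0, nodes)
        partial = amdahl_speedup(0.9, nodes)
        print(f"{nodes:5d} nodes: ideal {ideal:8.1f}x, 90% parallel {partial:5.2f}x")
```

Under this model, a task that is only 90% divisible can never run more than ten times faster regardless of how many nodes are added, which is one way to read the statement that the useful node count depends on the application.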
A typical SCC is built using Intel®-based servers running the Linux operating system and cluster infrastructure software. These servers are often referred to as commodity off-the-shelf (COTS) servers. They are connected through a network to form the cluster. An SCC normally needs anywhere from tens to hundreds of servers to be effective at performing computationally intensive tasks. Fulfilling this need to group a large number of servers in one location to form a cluster is a perfect fit for a blade server. The blade server chassis design and architecture provide the ability to place a massive amount of computer horsepower in a single location. Furthermore, the built-in networking and switching capabilities of the blade server architecture enable individual blades to be added or removed, allowing optimal scaling for a given task. With such flexibility, blade server-based SCCs provide a cost-effective alternative to other infrastructures for performing computational tasks, such as supercomputers.
Under current architectures, there is no scheme that enables efficient firmware updates for clustered computer infrastructures, such as blade server environments. As discussed above, each blade in a blade server is enabled to provide full platform functionality, and is thus able to operate independently of other blades in the server. Within this context, each server blade employs its own firmware for performing initialization operations and providing operating system (OS) run-time support for accessing various platform hardware and peripheral devices. Accordingly, in order to update firmware for multiple blade servers, it is necessary to perform an update process on each individual blade. When considering that a single rack may hold upwards of 300 server blades, it is readily apparent that updating firmware for blade servers is very time-consuming and expensive.
Firmware can be updated in one of two ways. If the firmware is stored on a read-only (i.e., non-writable) device, such as a conventional ROM (read-only memory), the only way to update the firmware is to replace the firmware storage device. This technique is highly disfavored by most end users, as well as vendors, since it requires someone to replace one or more ROM chips on a motherboard or option ROM chips on an add-in card. System firmware may also be updated during operating system “runtime” operations, that is, while the computer is running subsequent to an OS boot. Traditionally, runtime firmware updating has required direct hardware access and special OS-specific device/component/platform support. This typically requires that the OEM (original equipment manufacturer) or IHV (independent hardware vendor) write an update driver for every target operating system for use by the corresponding system device, component, or platform. Furthermore, the update driver usually must be included as part of the set of drivers that may be used under a given operating system, creating a headache for both the OEM/IHV and the OS vendor.