As computing needs have increased, the need for more efficient use of space in data centers has driven the development of denser computing solutions. In addition, the need for easier setup, management, and maintenance has influenced new server designs. Both of these data center needs have driven the development of a new class of server systems known as blades. Blades have been made feasible by technology advances allowing complex computer systems performing server functions to be designed in a very dense package. Several of these blade servers may then be inserted into an enclosure infrastructure through which the blade servers are able to share common power supply and ground busses, as well as communications busses over a backplane. This enclosure is housed within a standard server rack. A blade infrastructure allows for increased server density and simplifies the cabling and management of the blade servers as compared to standard rack-mount servers. Many applications that are computationally intensive or that manage large databases or mass storage arrays are addressed using a significant number of servers in parallel. Blades are useful for such applications. One of the challenges of managing a large number of servers, whether blade-based as described above or rack-based servers coupled together over a network such as a local area network (LAN), is maintaining software/firmware code revisions within those servers as each individual system receives hardware and/or firmware updates.
Those tasked with the management of large groups of servers employ a wide range of different policies for maintaining system code versions/revisions for each server. One such firmware component that must be managed is the system firmware. Some users may wish to always upgrade each blade to the very latest revision of the system firmware. Others may seek to standardize on a particular revision of the system firmware that they believe is the most reliable for their particular application environment, notwithstanding that one or more new versions/revisions may have been released for the blade server type(s) that they are employing. No matter what software upgrade policy a manager prefers to implement, there are always going to be cases where the system firmware or other software code residing with a server must be upgraded. For a particular server type, a newer firmware version/revision may be required to support a new stepping (i.e. a new mask revision) of its processor or chipset. These hardware updates are almost always accompanied by a required update in the firmware code, which is typically backwards compatible with earlier versions of the processor or chipset. In addition, there may be firmware code updates that are required because they address some critical systems operation issue, and should be made to all of the blades of that type in which previous versions/revisions of the firmware lacking those critical updates currently reside.
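The interaction between a manager's preferred policy and the updates that are required regardless of policy can be illustrated with a minimal sketch. It is illustrative only: the names Blade, target_revision, and needs_update, and the representation of a revision as a (major, minor) pair, are assumptions made for this example and do not describe any vendor's tooling. Downgrades are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical representation: a firmware revision as (major, minor).
Revision = Tuple[int, int]


@dataclass
class Blade:
    name: str
    firmware: Revision      # revision currently flashed on the blade
    min_required: Revision  # lowest revision supporting this blade's hardware stepping


def target_revision(blade: Blade,
                    latest: Revision,
                    pinned: Optional[Revision] = None,
                    critical_floor: Optional[Revision] = None) -> Revision:
    """Return the revision the blade should run under the chosen policy.

    pinned=None models the "always upgrade to the latest" policy;
    otherwise the manager has standardized on the pinned revision.
    """
    preferred = latest if pinned is None else pinned
    # Required updates override preference: new hardware steppings and
    # critical fixes set a floor that the preferred revision cannot go below.
    floor = blade.min_required
    if critical_floor is not None:
        floor = max(floor, critical_floor)
    return max(preferred, floor)


def needs_update(blade: Blade,
                 latest: Revision,
                 pinned: Optional[Revision] = None,
                 critical_floor: Optional[Revision] = None) -> bool:
    """True if the blade's current firmware is older than its target revision."""
    return blade.firmware < target_revision(blade, latest, pinned, critical_floor)


older_hw = Blade("blade-1", firmware=(1, 2), min_required=(1, 0))
newer_hw = Blade("blade-2", firmware=(1, 2), min_required=(1, 4))
```

In this sketch, a manager who has standardized on revision (1, 2) would leave blade-1 untouched, yet blade-2 would still need an update because its newer stepping requires at least (1, 4).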
Typical regimes employed today for managing system firmware and other software versioning for blade servers follow the same methodology commonly employed for stand-alone rack and tower servers. Each blade contains a revision of the firmware/software code. When upgrading the firmware code for a blade, the same techniques are used as those commonly used for flashing stand-alone servers. Each server system has its code image updated independently by flashing the flash ROM component with the new version/revision to overwrite the old version/revision. Standard mechanisms and policies for maintaining and updating firmware revisions are applied, albeit manually, in this process. In this case, the individual blades of a blade infrastructure provide no specific features for maintaining firmware/software code revisions beyond those that are commonly provided for stand-alone servers.
Mechanisms have been developed by server system vendors in an effort to simplify the upgrade of system ROM firmware or other software revisions for users. For example, Online Flash and mechanisms for mass deployment through Systems Insight Manager are provided by Hewlett Packard Company and can be used to simplify the process of upgrading system firmware. Online Flash mechanisms allow the system firmware to be updated while running the operating system, but require the system be rebooted after the flash process. Mass deployment mechanisms can be employed to update a large number of servers. Scripts can be set up so that many servers are upgraded without manually touching each one. Nevertheless, these mechanisms still require the user to perform a substantially manual process of identifying which servers should be upgraded and configuring the individual servers, including blade servers sharing a common infrastructure, to upgrade their individual versions of the system ROM firmware or other software.
A more recently proposed mechanism for managing system ROM firmware revisions on the individual blades of a blade infrastructure is to store the only copy of the system ROM on the enclosure of the infrastructure. Each individual blade must still contain a ROM part (this could be a one-time programmable or flash part). Typically, this ROM part would still contain code that performs basic initialization of the system, including configuring the memory subsystem to allow the system to copy the system ROM from the enclosure into shadowed memory of each of the blades. A similar proposed implementation requires that the blade actually fetch code directly from the enclosure's flash ROM. The advantage of this technique is that it simplifies the management of system software versions to only one centralized copy of the software for each particular blade type supported by the enclosure.
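The centralized arrangement described above can be modeled with a short sketch. It is a simplified software illustration rather than a description of any actual hardware: the classes Enclosure and Blade, their methods, and the byte-array stand-in for shadowed memory are all hypothetical names introduced for this example.

```python
class Enclosure:
    """Holds the single system ROM image for each supported blade type."""

    def __init__(self):
        self._images = {}  # blade type -> image bytes (one copy per type)

    def store_image(self, blade_type: str, image: bytes) -> None:
        # Storing a new image replaces the only copy for that blade type.
        self._images[blade_type] = image

    def image_for(self, blade_type: str) -> bytes:
        return self._images[blade_type]


class Blade:
    """A blade whose on-board ROM part only performs basic initialization."""

    def __init__(self, blade_type: str, shadow_size: int):
        self.blade_type = blade_type
        # Stand-in for the shadowed memory set up by the on-board ROM part.
        self.shadow = bytearray(shadow_size)

    def boot(self, enclosure: Enclosure) -> int:
        """Copy the enclosure's image into shadowed memory; return bytes copied."""
        image = enclosure.image_for(self.blade_type)
        if len(image) > len(self.shadow):
            raise ValueError("system ROM image does not fit in shadowed memory")
        self.shadow[:len(image)] = image
        return len(image)


enclosure = Enclosure()
enclosure.store_image("type-a", b"ROM-v1")

blade_1 = Blade("type-a", shadow_size=16)
blade_2 = Blade("type-a", shadow_size=16)
blade_1.boot(enclosure)
blade_2.boot(enclosure)
```

Because the enclosure keeps exactly one image per blade type in this model, every blade of that type receives the same code at boot, and replacing the stored image changes what all of them run on their next boot.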
However, this simplification is not without its disadvantages. By restricting system software such as system ROM firmware to a single centralized copy for a given blade type in the infrastructure, the versioning process is not nearly as flexible as the manual techniques that are largely still deployed in the field. Because all blades within the infrastructure are constrained to only use the single copy of firmware code residing with the enclosure for their type, the updating of firmware on a blade-by-blade basis would not be permitted. Thus, the maintenance of system software updates using this proposed technique is significantly more restrictive than the update rules already commonly used. For example, a user may wish to observe a firmware update running on a limited number of blades before requiring that all blades of that type be updated. Moreover, it would not be possible to employ a blade that has a newer version of hardware (e.g. a new stepping of the processor for that blade type) if a newer version of the system firmware is required to support that newer hardware, at least until the enclosure's version is updated. Finally, implementing this technique would require a significant departure from the architecture (including both hardware and software) of current systems in the field. Thus, systems would have to be redesigned architecturally, including the development of new ASICs to support such a feature, and users would be required to learn new flash upgrade methods that are more restrictive than those they currently use.