1. Field of Art
This invention relates to a primary hardware controller configured as a routing module associated with a firmware update and, more particularly, to forwarding a firmware update to a redundant hardware controller in order to avoid interrupting the operations of the primary hardware controller.
2. Background Technology
In a server environment that uses redundant hardware controllers sharing a single point to point connection, one controller may be referred to as the primary hardware controller and the other may be referred to as the secondary hardware controller. The primary hardware controller and the secondary hardware controller may connect to a service processor via a shared serial connection.
The primary hardware controller performs all operations associated with the redundant hardware controllers while the secondary hardware controller remains idle. The dedicated point to point connection is configured to duplicate and propagate status information and state changes from the primary hardware controller to the secondary hardware controller. In other words, in a normal operating mode, all writes to the memory of the primary hardware controller are transmitted via the dedicated point to point connection to the memory of the secondary hardware controller, thereby keeping the secondary hardware controller's memory and registers up to date, so that the secondary hardware controller can immediately assume control when required.
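The mirroring behavior described above can be illustrated with a minimal Python sketch. The names `Controller`, `MirrorLink`, `write`, and `fail_over` are hypothetical and chosen only for illustration; memory is modeled as a dictionary, and the sketch is not an actual controller implementation.

```python
class Controller:
    """Hypothetical model of a hardware controller's memory and registers."""

    def __init__(self, name):
        self.name = name
        self.memory = {}


class MirrorLink:
    """Hypothetical stand-in for the dedicated point to point connection."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, address, value):
        # Every write to the primary is also propagated to the secondary,
        # so the secondary's state stays current.
        self.primary.memory[address] = value
        self.secondary.memory[address] = value

    def fail_over(self):
        # Swap roles: the former secondary already holds the current state.
        self.primary, self.secondary = self.secondary, self.primary
        return self.primary


primary = Controller("A")
secondary = Controller("B")
link = MirrorLink(primary, secondary)
link.write(0x10, "fan_speed=4200")
link.write(0x14, "power_mode=on")
new_primary = link.fail_over()
```

Because each write lands in both memories, the controller returned by `fail_over` can take over without a separate synchronization step.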
The shared serial connection is shared by the primary hardware controller and the secondary hardware controller. However, the shared serial connection to the service processor can generally be controlled by only one hardware controller at any given time. The hardware controller that actively controls the shared serial connection to the service processor is the primary hardware controller.
In a blade server environment, such as one using IBM® BladeCenter blade servers, the primary hardware controller and secondary hardware controller may be substantially similar to a BMC (Baseboard Management Controller), a specialized microcontroller in communication with the motherboard of many computers, especially servers. The service processor may be substantially similar to an MM (Management Module).
Slim, hot-swappable blade servers fit in a single chassis like books in a bookshelf, and each is an independent server with its own processors, memory, storage, network controllers, operating system, and applications. The blade server slides into a bay in a chassis and plugs into a mid- or backplane, sharing power, fans, floppy drives, switches, and ports with other blade servers. A blade server is designed to generate less heat and thus reduce energy costs. Because switches and power units are shared, space is freed up and blade servers enable higher computing resource density.
A blade server is sometimes referred to as a high-density server and is typically used in a cluster of servers dedicated to a single task, such as file sharing, web page serving and caching, SSL encrypting of web communication, transcoding of web page content for smaller displays, or streaming audio and video content. A blade server may have an operating system and the primary application program installed on the board.
The BMC manages the interface between the system management software of the MM and the server platform hardware via the shared serial connection. Different types of sensors built into a computer system report to the BMC on parameters such as temperature, cooling fan speeds, power mode, operating system (OS) status, etc. The BMC monitors the sensors and can send alerts to a system administrator via a communications network if any of the parameters do not remain within preset limits, indicating a potential failure of the system. The system administrator can also communicate with the BMC remotely to take such corrective actions as resetting or power cycling the system to get a hung OS running again. These abilities save on the total cost of ownership of a system.
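The limit checking described above can be sketched in a few lines of Python. The parameter names, the numeric limits, and the `check_sensors` function are assumptions made for illustration only; they are not taken from any actual BMC firmware.

```python
# Hypothetical preset limits: parameter -> (minimum, maximum)
LIMITS = {
    "temperature_c": (5.0, 85.0),
    "fan_rpm": (1000.0, 12000.0),
}


def check_sensors(readings, limits=LIMITS):
    """Return one alert string per reading that falls outside its limits."""
    alerts = []
    for name, value in readings.items():
        if name not in limits:
            continue  # no preset limit configured for this parameter
        low, high = limits[name]
        if not (low <= value <= high):
            alerts.append(f"{name}={value} outside [{low}, {high}]")
    return alerts


alerts = check_sensors({"temperature_c": 92.5, "fan_rpm": 4200.0})
```

In this sketch only the temperature reading exceeds its limit, so a single alert is produced; in a real system each alert would be forwarded to the system administrator over the communications network.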
One of the functions of the service processor or MM is to initiate flash updates on the primary hardware controller. To initiate the flash update, the service processor sends a flash start command to the primary hardware controller over the shared serial connection. Conventionally, all flash updates are downloaded over the shared serial connection, which is also referred to as the update connection. In response to receiving the flash start command from the service processor, the primary hardware controller stops executing routine hardware controller code and starts executing boot block code, which is a section of code that responds to and processes all flash commands associated with a flash update that are sent from the service processor. The primary hardware controller may complete or pause all current tasks, while all future tasks may be stacked up in a queue.
During the firmware overlay process, the update connection must remain owned exclusively by the target controller in order to transfer the firmware image. The update connection cannot be shared or interrupted during the download or overlay. To complicate matters, the primary hardware controller is not available to perform other tasks while the firmware image is downloading, since the firmware overlay process requires the primary hardware controller to execute the boot block code. Thus, with the primary hardware controller code halted, all hardware controller operations come to a halt.
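The conventional behavior described above can be modeled with a short Python sketch. Only the flash start command is named in the text; the `ConventionalController` class and the `flash_data` and `flash_end` methods are hypothetical names introduced for illustration, and the sketch makes no claim about real boot block code.

```python
from collections import deque


class ConventionalController:
    """Hypothetical model of a primary controller under a conventional flash update."""

    def __init__(self):
        self.mode = "routine"      # "routine" or "boot_block"
        self.pending = deque()     # tasks stacked up during the update
        self.completed = []
        self.image = bytearray()

    def handle_task(self, task):
        if self.mode == "boot_block":
            self.pending.append(task)   # routine code is halted; task waits
        else:
            self.completed.append(task)

    def flash_start(self):
        self.mode = "boot_block"        # stop routine code, run boot block code

    def flash_data(self, chunk):
        if self.mode != "boot_block":
            raise RuntimeError("flash data received outside of an update")
        self.image.extend(chunk)

    def flash_end(self):
        self.mode = "routine"
        while self.pending:             # drain the backlog in arrival order
            self.completed.append(self.pending.popleft())


ctrl = ConventionalController()
ctrl.handle_task("read_temperature")
ctrl.flash_start()
ctrl.flash_data(b"\x01\x02")
ctrl.handle_task("power_cycle_request")
queued_during_update = list(ctrl.pending)
ctrl.flash_end()
```

The task that arrives while the controller is in boot block mode is not serviced until the update ends, which is the availability gap the background describes.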
Consequently, during the firmware overlay process, the server device is left without an active hardware controller, which undermines the benefits of having redundant controllers. In addition to the loss of redundancy, server management uptime and hardware controller availability suffer because all tasks are queued until the firmware overlay process is complete.
In the continually evolving information age, one thing remains a constant: the need for 100% availability of mission-critical data and applications. Whether for stock markets, corporate payroll, e-commerce, enterprise databases, medical records, internet banking, or reasons of national security, the availability of these mission-critical resources depends directly on system uptime. Currently, there are no conventional procedures in place to suitably resolve the loss of hardware controller redundancy during the firmware overlay process.
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method that overcome the limitations of conventional flash update methods. In particular, such an apparatus, system, and method would beneficially reduce administrative workloads, thereby lowering total cost of ownership. The apparatus, system, and method would also beneficially minimize administrative intervention and maintain high availability to mission-critical data and applications.