1. Technical Field
The present invention relates in general to the field of computers, and in particular to multiple blade servers housed in a server chassis. Still more particularly, the present invention relates to a method and system for automatically recovering a failed flash of a blade service processor.
2. Description of the Related Art
Server blade computers offer high-density server boards (blades) in a single chassis (blade chassis). A typical server blade computer is illustrated in FIG. 1, identified as server blade chassis 102. Server blade chassis 102 includes multiple hot-swappable server blades 104a-n. There are typically fourteen server blades 104 in server blade chassis 102. The operations of server blades 104 are coordinated by logic identified as a management module 108, which typically includes a processor for controlling input/output (I/O) functions, interfacing with a network 106 (such as the Internet or a Local Area Network), and allocating jobs and data to the different server blades 104.
Another function of management module 108 is to program Flash Read Only Memory (Flash Memory) in server blades 104. This flash operation updates firmware in the server blade 104, resulting in optimized operation. However, since server blades 104 are hot-swappable, there is usually nothing to prevent an engineer from unwittingly removing a server blade 104 from a mid-plane or back-plane (not shown) of server blade chassis 102 while the server blade 104 is in the middle of a flashing operation, which can take several minutes. When the partially flashed server blade 104 is re-installed into server blade chassis 102 or another chassis, it will often malfunction. Once the server blade 104 has been re-installed into server blade chassis 102, its self-diagnostic logic will recognize that the flash operation failed to fully execute. The server blade 104 will often be so crippled, however, that it does not know its bus address or physical location within server blade chassis 102, and is thus unable to advise management module 108 of the aborted flash.
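By way of illustration only, one way that self-diagnostic logic might recognize an incompletely executed flash operation is sketched below. The sketch assumes a firmware image that ends in a four-byte CRC-32 trailer written last, and erased flash cells that read back as 0xFF. The trailer layout and the names build_image, flash, and flash_is_complete are hypothetical and are not taken from any actual server blade 104 or management module 108.

```python
import zlib
from typing import Optional

ERASED = 0xFF  # assumption: erased flash cells read back as 0xFF


def build_image(payload: bytes) -> bytes:
    """Append a CRC-32 trailer so the image can be verified after flashing."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def flash(region: bytearray, image: bytes, stop_after: Optional[int] = None) -> None:
    """Erase the region, then copy the image into it.

    stop_after simulates the blade being removed after that many bytes
    have been written; None means the flash runs to completion.
    """
    region[:] = bytes([ERASED]) * len(region)
    limit = len(image) if stop_after is None else min(stop_after, len(image))
    region[:limit] = image[:limit]


def flash_is_complete(region: bytearray, image_len: int) -> bool:
    """Self-diagnostic check: does the stored payload match its trailer?"""
    payload = bytes(region[: image_len - 4])
    trailer = bytes(region[image_len - 4 : image_len])
    return zlib.crc32(payload) == int.from_bytes(trailer, "big")


if __name__ == "__main__":
    region = bytearray(2048)
    image = build_image(bytes(range(256)) * 4)

    flash(region, image)                   # uninterrupted flash
    print(flash_is_complete(region, len(image)))

    flash(region, image, stop_after=300)   # blade pulled mid-flash
    print(flash_is_complete(region, len(image)))
```

A check of this kind tells the server blade only that its own firmware is incomplete. It does nothing to restore the bus address or location information that the server blade would need in order to report the failure.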
Similarly, if the flashing operation fails even though server blade 104 is never removed from server blade chassis 102, management module 108 will likely not know of the failure, because server blade 104 will again be unable to notify management module 108 of the problem.
What is needed, therefore, is a method and system for enabling a server blade to communicate with a management module in a server blade chassis after a failed flash operation has left the server blade not knowing its location in the server blade chassis.