Nonuniform memory access (NUMA) systems typically have one or more cells that are logically grouped into partitions. Each cell may have one or more processors.
Each processor may have its own local memory. The local memory of each processor can form static or dynamic connections with the local memory of other processors. Thus, the processors and memories can perform processing independently while still being able to communicate with each other. This enables a NUMA system to have a large number of processors that communicate with each other while keeping congestion low on the buses connecting the shared memory.
The cells and processors in each partition are connected via a high speed interconnect. The high speed interconnect may be used to transfer data between cells. However, the high speed interconnect must be enabled for a cell before the cell can transmit and receive data over it.
The NUMA system may also include a manageability system. The manageability system has a slow speed interconnect over which messages and other data can be communicated between the cells and shared memory even when the high speed interconnect is not enabled.
The manageability system typically includes a processor configured to monitor system status, such as the health of the system, to direct data between the cells, and to configure portions of the NUMA system. For example, the manageability system typically controls partitioning of the cells.
A cell manageability subsystem enables each cell to transmit and receive messages to and from other cells via the manageability system. The cell manageability subsystem enables the cells in a partition to protect themselves against messages arriving from cells outside of the partition or from other unauthorized cells.
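The partition-based protection described above can be illustrated with a minimal sketch. It assumes that each cell keeps a set of cell identifiers belonging to its partition and drops any message whose sender is not in that set. The class name, its fields, and the `receive` method are hypothetical and are not taken from any actual manageability firmware.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class CellManageabilitySubsystem:
    """Hypothetical per-cell message filter (illustrative only)."""
    cell_id: int
    # Assumption: partition membership is known to the cell as a set of IDs.
    partition_cells: Set[int] = field(default_factory=set)
    inbox: List[Tuple[int, bytes]] = field(default_factory=list)

    def receive(self, sender_id: int, payload: bytes) -> bool:
        """Accept a message only if the sender belongs to this cell's partition."""
        if sender_id not in self.partition_cells:
            return False  # message from outside the partition is rejected
        self.inbox.append((sender_id, payload))
        return True


subsystem = CellManageabilitySubsystem(cell_id=0, partition_cells={0, 1, 2})
accepted = subsystem.receive(1, b"status")      # sender is in the partition
rejected = subsystem.receive(7, b"unexpected")  # sender is outside the partition
```

In this sketch only the message from cell 1 reaches the inbox. A real subsystem would likely also authenticate the sender rather than trust a bare identifier.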
Each cell has firmware. Firmware is code that executes on behalf of a computing system. Some firmware operates from the time of power-up until the time a processor boots. Other firmware includes real-time code that operates various peripherals or otherwise continues processing for the computing system. Firmware may, for example, test the health of the computing system on which it resides and/or load an operating system onto that computing system. This firmware may reside within one or more non-volatile memory parts, such as flash memory, battery-backed memory, read only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), and/or electrically erasable programmable ROM (EEPROM).
In some cases, firmware must be loaded to one or more cells to replace errored firmware. Errored firmware includes firmware that is not a desired version, non-existent firmware, firmware that is corrupt, firmware for which one or more cells have mismatched versions, and/or firmware in other errored conditions. A desired version of firmware may be a latest version of firmware.
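The errored conditions listed above can be expressed as a set of simple checks. The following is a minimal sketch, assuming that each firmware image carries a version string and a CRC-32 checksum recorded when the image was written. The `FirmwareImage` type, the function names, and the use of CRC-32 are assumptions made for illustration, not details from the described system.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FirmwareImage:
    """Hypothetical firmware image record (illustrative only)."""
    version: str
    payload: bytes
    checksum: int  # assumption: CRC-32 of payload stored at write time


def firmware_errors(image: Optional[FirmwareImage], desired_version: str) -> List[str]:
    """Return the errored conditions that apply to one cell's firmware."""
    if image is None:
        return ["missing"]  # non-existent firmware
    errors = []
    if zlib.crc32(image.payload) != image.checksum:
        errors.append("corrupt")
    if image.version != desired_version:
        errors.append("undesired_version")
    return errors


def versions_mismatched(images: Dict[int, Optional[FirmwareImage]]) -> bool:
    """True if the cells that have firmware do not all share one version."""
    versions = {img.version for img in images.values() if img is not None}
    return len(versions) > 1
```

A cell whose list of errors is non-empty, or a partition for which `versions_mismatched` returns true, would be a candidate for a firmware load under the definition above.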
Typically, firmware enables the high speed interconnect for the cells in a partition so the cells can transmit and receive data over the high speed interconnect. However, this typically occurs late in the firmware processing, such as in normal system firmware processing. Thus, if a cell has errored firmware, the normal system firmware does not operate, and the high speed interconnect is not enabled for that cell. As a result, those cells that have errored firmware will not be able to connect to the high speed interconnect and will not become part of the running partition.
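The ordering problem described above can be shown with a small sketch of a boot sequence. It assumes a simplified two-stage flow in which the slow speed manageability link is available from power-up and the high speed interconnect is enabled only once normal system firmware runs. The function name and the state fields are hypothetical.

```python
from typing import Dict


def boot_cell(firmware_ok: bool) -> Dict[str, bool]:
    """Simulate a simplified cell boot and report which links are usable."""
    state = {
        "manageability_link": True,  # assumption: available from power-up
        "high_speed_link": False,
        "in_partition": False,
    }
    # Early firmware stage: errored firmware stops the flow here.
    if not firmware_ok:
        return state
    # Late stage (normal system firmware): the interconnect is enabled
    # and the cell joins the running partition.
    state["high_speed_link"] = True
    state["in_partition"] = True
    return state


healthy = boot_cell(firmware_ok=True)
errored = boot_cell(firmware_ok=False)
```

In this sketch the errored cell is left with only the manageability link, which is why a replacement firmware image would have to travel over the slow path.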
Since the high speed interconnect is not enabled for cells having errored firmware, the firmware needed to update these cells to an error free version or other desired version is currently loaded to the cells using the manageability system via the slow speed interconnect. In other instances, an offline diagnostics application is used to update or otherwise load firmware to the cells of the system. However, both processes are cumbersome and take a significant period of time to complete. In some instances, the system may be offline for thirty minutes or more during the firmware loading process. These cumbersome processes waste time and resources.
Thus, new systems and methods are needed to enable loading firmware to the system in a more timely and resource efficient manner. The systems and methods of the present invention enable loading firmware to a high availability system quickly with a savings in time and system resources.