1. Technical Field
The present invention relates in general to digital computers, and in particular to multi-node computer systems. Still more particularly, the present invention relates to a method and system for booting up and configuring multi-node computer systems using a scalability management module.
2. Description of the Related Art
Digital computers, and particularly servers, are often multi-node computers configured as logical partitions, such as multi-node computer 100 depicted in FIG. 1. Exemplary multi-node computer 100 has four nodes 102. Each node 102 includes two sets of processors 106, labeled “0” to “7,” that typically are sets of four or more processors functioning together as a single coordinated processing unit. Each processor 106 is connected to other processors 106 in other nodes 102 by hardware scalability cables 114, and to other processors 106 within the same node 102 via a service processor 112.
In FIG. 1, boot node 108 is a node 102 that has assumed the role of the boot node for multi-node computer 100. As such, boot node 108 configures the logical partition of nodes defining multi-node computer 100. That is, using a menu in a setup utility in Basic Input/Output System (BIOS) 110, boot node 108 gathers and stores in non-volatile random access memory (NVRAM) 116 the Internet Protocol (IP) information that is specific to each service processor 112 in each node 102. Boot node 108 then communicates with each service processor 112 in multi-node computer 100 at its IP address to complete the configuration (memory allocation, processor allocation, etc.) of multi-node computer 100.
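The two-step flow just described (gather the service-processor IP information into NVRAM, then contact each address to finish configuration) can be illustrated with a minimal Python sketch. All names here (`Node`, `BootNode`, `gather`, `configure`) and the 192.0.2.x addresses are hypothetical and chosen only for illustration; no network traffic is performed, and the sketch does not represent actual BIOS or service-processor code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node 102, reachable through its service processor's IP address."""
    node_id: int
    service_processor_ip: str
    configured: bool = False


@dataclass
class BootNode:
    """Hypothetical model of boot node 108; `nvram` stands in for NVRAM 116."""
    nvram: dict = field(default_factory=dict)

    def gather(self, nodes):
        # Step 1: record the service-processor IP of every node in the partition.
        for node in nodes:
            self.nvram[node.node_id] = node.service_processor_ip

    def configure(self, nodes):
        # Step 2: "contact" each stored IP and mark the matching node configured.
        # A real system would allocate memory and processors at this point.
        by_ip = {node.service_processor_ip: node for node in nodes}
        for ip in self.nvram.values():
            by_ip[ip].configured = True
        return [node.node_id for node in nodes if node.configured]


# Four nodes, as in FIG. 1; the 192.0.2.x addresses are illustrative only.
nodes = [Node(i, f"192.0.2.{10 + i}") for i in range(4)]
boot = BootNode()
boot.gather(nodes)
configured_ids = boot.configure(nodes)
```

In this sketch the boot node holds the complete address table itself, which mirrors the centralized arrangement of FIG. 1.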
RXE (Remote Expansion Enclosure) 118 is a “dumb” Input/Output (I/O) expansion unit which contains additional Peripheral Component Interconnect (PCI) slots. While a separate RXE 118 may be coupled to each node 102/108, typically each partition (multi-node computer 100) shares one or more (typically two) RXEs 118 for optimum resource utilization.
If configuration of multi-node computer 100 is desired to be handled remotely, then a system administrator communicates with boot node 108 via a logic identified as remote manager 120, which is typically a computer.
The architecture illustrated in FIG. 1 is highly rigid. If a scalability cable 114 should fail, then the serial connection/communication among nodes 102 and boot node 108 is lost. If a node 102 or boot node 108 should fail or be pulled out of multi-node computer 100 for maintenance or resource re-allocation, then the scalability cables 114 must be physically disconnected from the failed node and reconnected to a replacement node, and a Setup menu in BIOS 110 must be re-entered to include the replacement node's IP address in the partitioning menus. The new partition information is then rebroadcast to all of the existing nodes in the multi-node computer 100. Further, each node 102, and especially boot node 108, must maintain a large amount of code to handle the partition configuration of multi-node computer 100. Finally, to remotely configure multi-node computer 100, the remote manager 120 must be directly connected to the boot node 108, which means that either 1) only one particular node can ever be the boot node, or 2) every node must be connected to the remote manager 120.
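The bookkeeping half of the node-replacement procedure above (update the partition table with the replacement node's IP address, then rebroadcast it) might be sketched as follows. The function `replace_node`, the dictionary layout, and the addresses are assumptions made for illustration; the physical re-cabling step has no software counterpart here, and the sketch assumes the replacement node is also among the recipients of the rebroadcast.

```python
def replace_node(partition, failed_id, new_id, new_ip):
    """Return (updated partition table, ids of the nodes to notify).

    `partition` maps a node id to its service-processor IP address.
    The input table is left unmodified.
    """
    if failed_id not in partition:
        raise KeyError(f"node {failed_id} is not in the partition")
    # Drop the failed node, then add the replacement node's address.
    updated = {nid: ip for nid, ip in partition.items() if nid != failed_id}
    updated[new_id] = new_ip
    # Assumption: every node in the updated table receives the rebroadcast.
    recipients = sorted(updated)
    return updated, recipients


# Illustrative four-node partition; node 2 fails and node 4 replaces it.
partition = {0: "192.0.2.10", 1: "192.0.2.11", 2: "192.0.2.12", 3: "192.0.2.13"}
updated, recipients = replace_node(partition, failed_id=2, new_id=4, new_ip="192.0.2.14")
```

Even in this simplified form, every replacement forces a full table update and a notification to each node, which is the rigidity the paragraph above describes.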
Thus there is a need for an external scalability management module that will ease user installation and configuration while providing independent nodes the ability to join a processor partition without the joining node being “aware” of the node/cable topology in the partition.