1. Technical Field
The present invention relates in general to the field of computers, and in particular to multi-node computers. Still more particularly, the present invention relates to a method and system for removing a node, or a sub-node, from the multi-node computer after transferring the contents of the node's system memory to a remote node's back-up dynamic memory.
2. Description of the Related Art
A multi-node computer is made up of multiple nodes, each having its own processor or set of processors. Typically, the multiple nodes work in a coordinated fashion under the direction of a primary supervisory service processor in one of the nodes. An example of a multi-node computer is shown in FIG. 1 as multi-node computer system 100. Each node 106 includes multiple sub-nodes 102. Each sub-node 102 includes a processor 108, which is typically multiple processors acting in a coordinated manner. Each sub-node 102 has two modules of system memory 104, which are volatile memory chips, typically mounted on either a single in-line memory module (SIMM) or a dual in-line memory module (DIMM). As shown in FIG. 1, these memory modules are assigned to Port 0 and Port 1, and have sequential memory addresses, shown in the example of sub-node 102a as addresses associated with the first two gigabytes of memory (dynamic memory 104a) and the next sequential two gigabytes of memory (dynamic memory 104b).
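The sequential addressing described for sub-node 102a can be illustrated with a minimal sketch. It assumes, for illustration only, that the sub-node's address range starts at zero and that each module holds exactly two gigabytes; the names `port_for_address`, `MODULE_SIZE`, and `base` are hypothetical and do not appear in FIG. 1.

```python
GIB = 1 << 30            # bytes in one gigabyte (binary)
MODULE_SIZE = 2 * GIB    # assumed size of each memory module, per the FIG. 1 example


def port_for_address(addr, base=0, module_size=MODULE_SIZE):
    """Map an address in one sub-node's range to (port, offset within module).

    Port 0 holds the first module's worth of addresses and Port 1 the next
    sequential range. `base` is an assumed starting address for the sub-node.
    """
    offset = addr - base
    if not 0 <= offset < 2 * module_size:
        raise ValueError("address outside this sub-node's system memory")
    # Quotient selects the port; remainder is the offset inside that module.
    return divmod(offset, module_size)
```

Under these assumptions, `port_for_address(3 * GIB)` resolves to Port 1 at an offset of one gigabyte, since the address falls in the second sequential two-gigabyte range.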
The system memory configuration shown in FIG. 1 does not provide for redundancy. Thus, if a node 106, a sub-node 102, or even one module of memory 104 should fail, or if a node 106 or sub-node 102 is suddenly taken off line from multi-node computer system 100, the data in the failed/removed node's memory cannot be recovered.
To address the problem of data loss from a dynamic memory failure in a sub-node, FIG. 2 depicts a prior art solution involving local back-up memory. Each node 208 in multi-node computer system 200 includes sub-nodes 202, each having a processor 210. Each sub-node 202 has a primary dynamic memory 204 and a local back-up memory 206, which stores an exact copy of the contents of primary dynamic memory 204, typically at the same memory addresses. Such a system affords some degree of data protection, since failure of either primary dynamic memory 204 or local back-up memory 206 allows a sub-node 202 to continue to operate using the local memory that did not fail. However, if the entire sub-node 202 should fail or be suddenly pulled off-line from multi-node computer system 200, such as in a “hot-swap,” then the data in the failed/removed sub-node 202 is lost to the multi-node computer system 200.
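The behavior of the local back-up arrangement of FIG. 2 can be modeled with a short sketch. This is a simplified illustration under stated assumptions, not the prior art implementation: the class `MirroredSubNode` and its methods are hypothetical, memory is modeled as a dictionary keyed by address, and a failed module is modeled as `None`.

```python
class MirroredSubNode:
    """Toy model of a sub-node with primary memory and a local back-up copy."""

    def __init__(self):
        self.primary = {}    # stands in for primary dynamic memory 204
        self.backup = {}     # stands in for local back-up memory 206
        self.online = True

    def _require_online(self):
        if not self.online:
            raise RuntimeError("sub-node is off-line; its data is lost")

    def write(self, addr, value):
        """Store the value at the same address in every surviving module."""
        self._require_online()
        for module in (self.primary, self.backup):
            if module is not None:
                module[addr] = value

    def read(self, addr):
        """Read from the first local module that has not failed."""
        self._require_online()
        for module in (self.primary, self.backup):
            if module is not None:
                return module[addr]
        raise RuntimeError("both local memories have failed")

    def fail_primary(self):
        """Model a failure of the primary memory module only."""
        self.primary = None

    def remove(self):
        """Model failure or hot-swap removal of the entire sub-node."""
        self.online = False
        self.primary = None
        self.backup = None
```

In this model, a read still succeeds after `fail_primary()` because the back-up copy is local and intact, whereas any read after `remove()` raises an error: both copies resided in the removed sub-node, which is the limitation noted above.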
Thus, there is a need for a method and system that permits removal of a node or sub-node from a multi-node computer system while retaining the system memory data from the node or sub-node being removed, preferably without reducing the total memory size of the multi-node computer system.