1. Technical Field
The disclosure and claims herein generally relate to multi-node computer systems, and more specifically relate to managing persistent memory in the nodes of a multi-node computer system such as a massively parallel supercomputer.
2. Background Art
Supercomputers and other multi-node computer systems continue to be developed to tackle sophisticated computing jobs. One type of multi-node computer system is a massively parallel computer system. A family of such massively parallel computers is being developed by International Business Machines Corporation (IBM) under the name Blue Gene. The Blue Gene/L system is a high-density, scalable system in which the current maximum number of compute nodes is 65,536. Each Blue Gene/L node consists of a single ASIC (application specific integrated circuit) with two CPUs and memory. The full computer is housed in 64 racks or cabinets with 32 node boards in each rack.
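The system dimensions given above imply a few derived figures that are not stated directly. The following is a minimal arithmetic sketch, assuming the compute nodes are distributed evenly across racks and node boards; the constant and variable names are illustrative only.

```python
# Figures stated in the text for a full Blue Gene/L system.
TOTAL_NODES = 65_536
RACKS = 64
BOARDS_PER_RACK = 32
CPUS_PER_NODE = 2

# Derived figures, assuming an even distribution of nodes (an assumption).
node_boards = RACKS * BOARDS_PER_RACK          # total node boards in the system
nodes_per_rack = TOTAL_NODES // RACKS          # compute nodes housed in one rack
nodes_per_board = TOTAL_NODES // node_boards   # compute nodes on one node board
total_cpus = TOTAL_NODES * CPUS_PER_NODE       # CPUs across the whole system

print(node_boards, nodes_per_rack, nodes_per_board, total_cpus)
```

Under that assumption, each node board carries 32 compute nodes and each rack carries 1,024.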
Computer systems such as Blue Gene have a large number of nodes, each with its own processor and local memory. The local memory is allocated by a translation look-aside buffer (TLB) that provides virtual-to-physical address translation. The TLB contains a number of pointers to memory segments, where the segments may be 1 MB (megabyte), 16 MB, 256 MB, etc. In the typical prior art system, when an application is installed, the TLB and all the local memory are cleared for the new application.
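The translation and clearing behavior described above can be illustrated with a small software model. This is a simplified sketch only, not the Blue Gene hardware TLB: the class names `TlbEntry` and `Tlb`, the methods `map_segment`, `translate`, and `clear`, and the example addresses are all hypothetical, and a linear search stands in for the hardware lookup.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024  # one megabyte in bytes


@dataclass
class TlbEntry:
    """One pointer to a memory segment (hypothetical layout)."""
    virt_base: int  # first virtual address of the segment
    phys_base: int  # first physical address of the segment
    size: int       # segment size in bytes, e.g. 1 MB, 16 MB, 256 MB


class Tlb:
    """Toy model of a TLB holding a list of segment mappings."""

    def __init__(self) -> None:
        self.entries: List[TlbEntry] = []

    def map_segment(self, virt_base: int, phys_base: int, size: int) -> None:
        # Record a new virtual-to-physical segment mapping.
        self.entries.append(TlbEntry(virt_base, phys_base, size))

    def translate(self, vaddr: int) -> Optional[int]:
        # Find the segment containing vaddr and apply its offset.
        for e in self.entries:
            if e.virt_base <= vaddr < e.virt_base + e.size:
                return e.phys_base + (vaddr - e.virt_base)
        return None  # no mapping covers this address

    def clear(self) -> None:
        # Models the prior art behavior: every mapping is dropped
        # when a new application is installed.
        self.entries.clear()


tlb = Tlb()
tlb.map_segment(0x0000_0000, 0x1000_0000, 16 * MB)
print(hex(tlb.translate(0x1234)))  # address inside the 16 MB segment
tlb.clear()
print(tlb.translate(0x1234))       # mapping is gone after the clear
```

In this model, clearing the TLB makes the data of the previous application unreachable even if the underlying memory were left intact, which is why data must be re-loaded for the next application in the prior art approach.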
A multi-node computer system is often called upon to perform complex computing tasks on data stored in each node's local memory. Data for an application is typically loaded from a bulk storage unit such as a hard disk drive. The data created by a first application may be needed by a second application, or the same file system data may be used by subsequent applications. When the first application is complete, the data is typically saved to a data storage device, and then the data is re-loaded after the second application is loaded on the node. In a parallel computer system, re-loading data on all the nodes for each new application consumes a significant portion of computer system resources.
Without a way to effectively create and manage persistent memory, parallel computer systems will continue to suffer from reduced efficiency of the computer system.