Over the years, as the internet has expanded and computers have multiplied, the need for clustered computing, such as High Performance Computing (HPC), has increased. Clustered computing involves multiple compute nodes, usually a server grid, that work together to achieve a common task. For example, several (typically hundreds of) compute nodes may be clustered to share the load of serving a high-traffic website. Traditionally, two different approaches have been used to allocate memory among the various compute nodes of a cluster.
The first approach involves physically installing a fixed amount of memory in each node, or “brick.” This approach results in several inefficiencies. For example, the memory in the bricks cannot be dynamically reallocated. Instead, if it is desired to change the amount of memory in a brick, an administrator must physically remove the brick from the cluster, open it, and add or remove memory modules. Because the memory cannot be dynamically reallocated, each brick will likely have to be over-provisioned to ensure optimal operation. Notwithstanding the ability to physically add memory to a particular compute node, each compute node nonetheless has a limited number of physical banks for holding memory modules. Thus, in order to meet their needs, some users are forced to pay large markups for higher-capacity memory chips. Moreover, many cluster applications (e.g., data mining, web search, biometrics, etc.) have large, mostly read-only data sets, and in today's clusters there is a great deal of data duplication among the nodes; it would therefore be desirable to share a read-only data set among the nodes. However, this is not possible when the memory is private to each node.
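The kind of sharing described above is, in fact, routine within a single node. The following sketch (standard-library Python; the file name and contents are hypothetical, not from the original text) shows how a read-only data set mapped with `mmap` can be backed by a single physical copy regardless of how many readers map it; the inefficiency noted above is that no analogous mechanism exists across bricks whose memory is private.

```python
import mmap
import os
import tempfile

# Create a sample "data set" file standing in for a large,
# mostly read-only data set (e.g., a search index).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"mostly read-only data set " * 1000)

# Two independent read-only mappings of the same file: the OS can back
# both with the same physical pages because neither mapping may write.
with open(path, "rb") as f1, open(path, "rb") as f2:
    m1 = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
    m2 = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ)
    # Both mappings observe identical data without duplicating it.
    identical = bytes(m1[:26]) == bytes(m2[:26]) == b"mostly read-only data set "
    m1.close()
    m2.close()

os.remove(path)
```

Across separate bricks, by contrast, each node must hold its own full copy of the data set in its locally installed memory.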
A second approach to memory allocation involves sharing a pool of memory among the compute nodes. This approach is often used when several processes are working on subdivisions of the same problem and all see a single area of memory. In this approach, when one processor or a group of processors wants to work on a separate task, a region of the memory may be designated for it, though the other processors are still able to see and access that region. This approach is not without pitfalls, however. When multiple nodes access the same area of memory, cache coherency becomes a significant issue. The cache coherency problem arises because each CPU in the cluster has its own cache. Since the data in a processor's cache corresponds to data in memory, the cache must be updated whenever the corresponding region of memory changes. In other words, these systems are designed with the assumption that several CPUs are performing various tasks among themselves; one node cannot modify a region of memory without ensuring that the other nodes are aware of the change. Thus, cache coherency operations must be performed: the processor making a change must notify the other processors in the cluster. It follows that valuable resources are wasted performing these cache coherency operations.
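The overhead described above can be sketched with a toy model. The following is a hypothetical write-invalidate scheme (an illustration only, not any real coherency protocol such as MESI): every write must notify all other caches so that no stale copy survives, so coherency traffic grows with both the number of writes and the number of processors.

```python
class Cache:
    """A toy per-CPU cache holding copies of shared-memory addresses."""
    def __init__(self):
        self.lines = {}  # address -> cached value

class ToyCluster:
    """Illustrative write-invalidate model: each write forces every other
    cache to drop its copy, costing (n_cpus - 1) messages per write."""
    def __init__(self, n_cpus):
        self.memory = {}
        self.caches = [Cache() for _ in range(n_cpus)]
        self.coherency_messages = 0

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache.lines:            # miss: fetch from memory
            cache.lines[addr] = self.memory.get(addr)
        return cache.lines[addr]

    def write(self, cpu, addr, value):
        self.memory[addr] = value
        self.caches[cpu].lines[addr] = value
        # Notify every other CPU so a stale cached copy is never read.
        for other, cache in enumerate(self.caches):
            if other != cpu:
                self.coherency_messages += 1   # one message per peer
                cache.lines.pop(addr, None)    # invalidate stale copy

cluster = ToyCluster(n_cpus=8)
for i in range(100):
    cluster.write(cpu=i % 8, addr=0, value=i)
# 100 writes to a shared address x 7 peers = 700 invalidation messages.
```

Even in this simplified model, a hundred writes to one shared address by eight processors generate seven hundred coherency messages, which illustrates how these operations consume resources that could otherwise be spent on useful work.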