1. Field of the Invention
The present invention relates generally to memory management in computing systems.
2. Description of the Background Art
A multiprocessing system is a computer system that has more than one processor and is typically designed for use as a high-end workstation or file server. The performance of a multiprocessing system is not necessarily improved by simply increasing the number of processors, because adding processors beyond a certain saturation point merely increases communication bottlenecks and thereby limits the overall performance of the system. Thus, although conceptually simple, the implementation of a parallel computing system is in fact very complicated, involving tradeoffs among single-processor performance, processor-to-processor communication performance, ease of application programming, and cost.
Parallel computing embraces a number of computing techniques that can be generally referred to as “multiprocessing” techniques. There are many variations on the basic theme of multiprocessing. In general, these variations differ in how independently the various processors operate and in how the workload is distributed among them.
Two common multiprocessing architectures are symmetric multiprocessing (“SMP”) systems and distributed memory systems. One characteristic distinguishing the two is the use of memory. In an SMP system, high-speed electronic memory may be accessed, i.e., shared, by all the CPUs in the system. In a distributed memory system, none of the electronic memory is shared among the processors. In other words, each processor has direct access only to its own associated fast electronic memory, and must access memory associated with any other processor by sending requests over an electronic interconnect using a software protocol. There are also some “hybrid” multiprocessing systems that try to combine the advantages of SMP and distributed memory systems.
SMP systems can be much faster, but at higher cost, and cannot practically be built to contain more than a modest number of CPUs, e.g., a few tens. Distributed memory systems can be cheaper and can be scaled arbitrarily, but program performance can be severely limited by the performance of the interconnect employed, since the interconnect (for example, Ethernet) can be several orders of magnitude slower than access to local memory.
In a hybrid system, multiple CPUs are usually grouped, or “clustered,” into cells. These cells may be referred to as SMP nodes. Shared memory is distributed across the SMP nodes, with each SMP node including at least some of the shared memory. The shared memory within a particular node is “local” to the CPUs within that node and “remote” to the CPUs in the other nodes. One drawback of hybrid systems is the bottleneck encountered in retrieving data. Because of the hardware involved and the way it operates, data transfer between a CPU and local memory can be, for example, 10 to 100 times faster than data transfer between the CPU and remote memory. Consequently, exposing the maximum available performance to the application programmer is a challenging problem. This problem is exacerbated by the fact that most parallel programming applications are developed for either pure SMP systems or pure distributed memory systems.
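As a rough illustration of why the local/remote gap matters, the following Python sketch computes average memory access time under a simple weighted-average model: a fraction f of accesses go to remote memory that is r times slower than local memory, giving a slowdown of 1 + f(r - 1) relative to an all-local workload. The model and the function names `effective_access_time` and `slowdown` are illustrative assumptions, not a description of any particular system; the model ignores caching, contention, and bandwidth limits.

```python
def effective_access_time(t_local: float, ratio: float, remote_fraction: float) -> float:
    """Average access time under a hypothetical two-level (local/remote) model.

    t_local: latency of one local-memory access (any time unit).
    ratio: how many times slower a remote access is than a local one.
    remote_fraction: fraction of accesses that go to remote memory, in [0, 1].
    Caching, contention, and bandwidth effects are deliberately ignored.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1")
    t_remote = ratio * t_local
    # Weighted average of local and remote latencies.
    return (1.0 - remote_fraction) * t_local + remote_fraction * t_remote


def slowdown(ratio: float, remote_fraction: float) -> float:
    """Slowdown relative to an all-local workload: 1 + f * (ratio - 1)."""
    return effective_access_time(1.0, ratio, remote_fraction)


if __name__ == "__main__":
    # Sweep the 10x-100x range mentioned in the text over a few remote fractions.
    for ratio in (10.0, 100.0):
        for f in (0.0, 0.01, 0.1, 0.5):
            print(f"ratio={ratio:>5.0f}  remote={f:4.2f}  slowdown={slowdown(ratio, f):6.2f}x")
```

Under this model, at a 100x local/remote ratio, even 1% remote accesses nearly double the average access time (a slowdown of about 1.99x), which is why data placement becomes a first-order concern on hybrid systems.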
One specific type of hybrid multiprocessing system uses a cache-coherent non-uniform memory access (ccNUMA) architecture. A typical ccNUMA design uses several SMP systems as computing nodes and connects them with a cache-coherent switch that supports shared memory across all the processors.