Many high-performance parallel computer systems are built as a number of nodes interconnected by a general interconnection network (e.g., a crossbar or a hypercube), where each node contains a subset of the processors and memory in the system. While the memory in the system is distributed, several of these systems (called NUMA systems, for Non-Uniform Memory Architecture) support a shared memory abstraction in which all the memory in the system appears as a single large memory common to all processors.
These systems have to address the problem of where to place physical pages within the distributed memory system, because only the local memory is close to each processor. Any memory that is not local to the processor is considered remote memory. Remote memory has a longer access time than local memory, and different remote memories may have different access times. With multiple processors sharing memory pages and a finite-size memory local to each processor, some percentage of the physical pages required by each processor will be located in remote physical memory. The chances that a physical page required by a processor is in local memory can be improved by using static page placement of physical memory pages.
Static page placement attempts to locate each physical memory page in the memory that causes the highest percentage of memory accesses to be local. Optimal physical memory page placement reduces the average memory access time and reduces the bandwidth consumed in the processor interconnect between processor nodes, within each of which memory access time is uniform (uniform memory access, or UMA). The static page placement schemes include Don't Care, Single Node, Line Interleaved, Round Robin, First Touch, and Optimal, among others, which are well known to those skilled in the art.
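As an illustration only, two of the named schemes can be sketched as follows. The function names, the `(node, page)` trace format, and the locality metric are assumptions introduced for this sketch and are not taken from any particular system.

```python
def round_robin_placement(num_pages, num_nodes):
    """Assign page i to node i mod num_nodes, ignoring who uses it."""
    return [page % num_nodes for page in range(num_pages)]


def first_touch_placement(accesses, num_pages):
    """Place each page on the node that accesses it first.

    `accesses` is a hypothetical trace of (node, page) tuples in time order.
    Pages that are never accessed keep the placeholder home None.
    """
    home = [None] * num_pages
    for node, page in accesses:
        if home[page] is None:
            home[page] = node
    return home


def local_fraction(accesses, home):
    """Fraction of accesses in the trace that hit the page's home node."""
    local = sum(1 for node, page in accesses if home[page] == node)
    return local / len(accesses)


# Hypothetical trace: node 1 uses page 0, node 0 uses page 1.
trace = [(1, 0), (1, 0), (0, 1), (0, 1)]
rr_home = round_robin_placement(num_pages=2, num_nodes=2)
ft_home = first_touch_placement(trace, num_pages=2)
```

For this trace, Round Robin places every page on the node that does not use it, while First Touch makes every access local. On other traces the ordering can reverse, which is why no single static scheme is best for all applications.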
Dynamic page placement may be used after the initial static page placement to replicate or migrate a memory page, either to correct the initial placement or to change the page's location in response to changes in the particular application's access patterns to that page. The multi-processor's operating system may be involved in the decision and in the copying or movement of the physical page.
A replication is the copying of a physical page so that two or more processors have a local copy of the page. As long as the memory accesses are reads, multiple copies of data can be allowed without causing coherence difficulties. As soon as a write to the page is sent to the memory system, either all but one copy of the page must be removed or an update coherence algorithm must be in place to make sure all copies of the page hold the same data.
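The first of these two options, removing all but one copy on a write, can be sketched as below. The class, its method names, and the rule for choosing which copy survives are assumptions made for illustration; an update-based coherence scheme would instead propagate the written data to every copy.

```python
class ReplicatedPage:
    """Tracks which nodes hold a copy of one physical page (hypothetical model)."""

    def __init__(self, home_node):
        self.copies = {home_node}

    def replicate_to(self, node):
        """Give `node` its own local copy; safe while accesses are reads."""
        self.copies.add(node)

    def is_local(self, node):
        return node in self.copies

    def write(self, writer_node):
        """Collapse to a single copy before the write proceeds.

        Assumption: keep the writer's copy if it has one, otherwise keep
        the copy on the lowest-numbered node.
        """
        keeper = writer_node if writer_node in self.copies else min(self.copies)
        self.copies = {keeper}
        return keeper


page = ReplicatedPage(home_node=0)
page.replicate_to(2)           # nodes 0 and 2 can now both read locally
surviving = page.write(2)      # a write from node 2 removes the copy on node 0
```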
A page migration is the movement of a physical memory page to a new location. The migration is usually permanent and does not require special handling as is required for writes to replicated pages.
An approach to dynamic page placement is described in the paper by Ben Verghese, Scott Devine, Anoop Gupta, and Mendel Rosenblum, “Operating System Support for Improving Data Locality on CC-NUMA Compute Servers”, In ASPLOS VII, Cambridge, Mass., 1996.
To track the changes in the application's access patterns to a memory page, a history needs to be maintained for every page in memory. A set of counters is located close to the memory system for each physical page in memory, with one counter in the set for every UMA cell in the multi-processor system. Whenever a processor within a UMA cell generates a memory access, the counter corresponding to that page and that UMA cell is incremented.
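A minimal software model of this counter arrangement might look like the following. The class and method names are hypothetical, and `busiest_cell` is an added helper showing how such a history could be consulted; it is not part of the scheme described above.

```python
class PageAccessCounters:
    """One counter per (physical page, UMA cell) pair."""

    def __init__(self, num_pages, num_cells):
        # counts[page][cell] = accesses to `page` issued from `cell`
        self.counts = [[0] * num_cells for _ in range(num_pages)]

    def record_access(self, page, cell):
        """Increment the counter for the accessed page and the issuing cell."""
        self.counts[page][cell] += 1

    def busiest_cell(self, page):
        """Hypothetical helper: the cell with the most accesses to `page`.

        Ties resolve to the lowest-numbered cell.
        """
        row = self.counts[page]
        return max(range(len(row)), key=row.__getitem__)


counters = PageAccessCounters(num_pages=4, num_cells=3)
for cell in (2, 2, 0, 2):
    counters.record_access(page=1, cell=cell)
```

Here page 1 is accessed three times from cell 2 and once from cell 0, so its history suggests that cell 2 is the better home for the page.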
There are two main locations for the counters: within the memory itself, or in a separate hardware structure such as the memory controller or the directory controller. Placing the counters within the memory has the advantage of keeping the cost down by using the existing DRAM in memory, and the number of counters is automatically scaled with the installation of more memory. Unfortunately, this placement has the disadvantage of halving the memory bandwidth because of the accessing and updating of the counters. Placing the counters outside of memory adds a significant amount of hardware to the system because the hardware must be designed for the maximum amount of installable memory and also for the minimum physical page size.
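The hardware cost of the second option can be estimated with simple arithmetic: the number of pages is the maximum installable memory divided by the minimum page size, and each page needs one counter per UMA cell. The sketch below uses a hypothetical function name, and the example figures (64 GiB, 4 KiB pages, 8 cells, 16-bit counters) are assumptions chosen only to show the scale involved.

```python
def counter_storage_bytes(max_memory_bytes, min_page_bytes, num_cells, counter_bits):
    """Storage needed for one counter per (page, UMA cell) pair, in bytes."""
    num_pages = max_memory_bytes // min_page_bytes
    total_bits = num_pages * num_cells * counter_bits
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed example configuration, not taken from any specific machine.
needed = counter_storage_bytes(
    max_memory_bytes=64 * 2**30,  # 64 GiB maximum installable memory
    min_page_bytes=4 * 2**10,     # 4 KiB minimum page size
    num_cells=8,                  # 8 UMA cells
    counter_bits=16,              # 16-bit counters
)
```

Under these assumptions the counters alone occupy 256 MiB of dedicated storage, and the structure must be sized this way even when less memory is installed or larger pages are in use.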
The problems noted above have prevented the widespread use of dynamic page placement, and almost no systems in existence use this technique. A solution that would allow the counters to be placed within the memory controller while consuming less space, and that would remove the constraint of providing enough counters for the maximum allowable memory and the smallest page size, is necessary before dynamic page placement becomes feasible in real-world computer architectures.