1. Technical Field
The present disclosure relates to non-uniform memory access data processing systems in general, and in particular to a method for preserving memory affinity in a non-uniform memory access data processing system.
2. Description of Related Art
Generally speaking, the performance of a computer system largely depends on the execution speed of system and application programs. During program execution, both instructions and data need to be fetched from a system memory. While the frequency of memory access has been greatly reduced via the utilization of a cache hierarchy, system memory accesses after cache misses still account for a significant portion of program execution time.
The disparity between processor speed and memory access time continues to increase even with various improvements in computer hardware technology. While program execution time decreases as processor frequency increases, as expected, the number of processor cycles needed to retrieve data from a system memory effectively increases, because memory latency measured in absolute time stays roughly constant. For example, when the clock frequency of a processor is doubled, the execution time of an integer instruction is likely to be reduced by half, but the number of processor clock cycles needed to access memory may actually be doubled. In addition, memory speed has not kept up with processor clock speed. For example, processor clock speed has increased about 60% to 100% from one processor generation to the next, while memory speed has increased only about 25% within the same time frame.
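By way of illustration only, the arithmetic behind the doubling of memory access cycles can be sketched as follows. The sketch assumes a fixed memory latency in nanoseconds; the function name and the 100 ns and 1 GHz/2 GHz figures are hypothetical values chosen for the example and are not taken from any particular system.

```python
def memory_access_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Processor cycles that elapse during one memory access of fixed latency.

    One cycle lasts 1 / clock_ghz nanoseconds, so the cycle count is
    latency_ns * clock_ghz.
    """
    return latency_ns * clock_ghz


# Hypothetical figures, chosen only for illustration.
LATENCY_NS = 100.0

for clock_ghz in (1.0, 2.0):
    cycles = memory_access_cycles(LATENCY_NS, clock_ghz)
    print(f"{clock_ghz:.1f} GHz: {cycles:.0f} cycles per memory access")
```

Under these assumed figures, the same 100 ns access costs twice as many cycles at 2 GHz as at 1 GHz, even though the memory itself is no slower.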
One way to shorten memory access time is to place the system memory physically as close to the processors as possible. In a large server system, however, the form factor makes it difficult to position all of the system memory in ideal proximity to every processor, which leads to varying latencies for accessing different regions of the system memory. Thus, large server systems tend to use a distributed memory model known as non-uniform memory access (NUMA). One challenge for a NUMA computer system is to maintain high memory affinity between memory and the processors on which threads/processes are being executed. High memory affinity implies that the blocks or pages of the system memory that are used by a processor are positioned in a memory region close to that processor.
Currently, an operating system can start a program with high memory affinity by allocating newly accessed pages in a local memory affinity domain, i.e., in a local memory or a memory having minimal latency. This strategy, however, cannot cope with changes in memory affinity stemming from certain operations initiated by the operating system.
For example, for load balancing purposes, processes may have to be migrated from heavily utilized processors to less utilized ones. Also, in order to decrease power consumption, processor folding operations can be utilized to force process migration so that some processors can be freed and powered down when the system load decreases. Process migration can also occur when the system load increases, which may result in processor unfolding to spread the increased workload across more processors. All of these dynamically occurring process migrations can cause a loss in memory affinity, which can lead to various degrees of performance degradation due to an increase in remote memory accesses.
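The loss of affinity described above can be illustrated with a toy model, given here as a sketch only. It assumes a simple first-access allocation rule and two hypothetical latency values; the class, function, and constant names are invented for this example and do not correspond to any operating system interface.

```python
from dataclasses import dataclass, field

LOCAL_NS = 100   # assumed latency of a page in the local affinity domain
REMOTE_NS = 200  # assumed latency of a page in a remote affinity domain


@dataclass
class Process:
    domain: int                                # domain of the processor running the process
    pages: dict = field(default_factory=dict)  # page id -> domain holding the page

    def touch(self, page: int) -> int:
        """Access a page, allocating it in the current local domain on first access."""
        home = self.pages.setdefault(page, self.domain)
        return LOCAL_NS if home == self.domain else REMOTE_NS


def average_latency(proc: Process, pages) -> float:
    """Mean access latency over one pass through the given pages."""
    return sum(proc.touch(p) for p in pages) / len(pages)


proc = Process(domain=0)
working_set = range(8)

before = average_latency(proc, working_set)  # pages are allocated locally in domain 0
proc.domain = 1                              # process migrated, e.g. by load balancing or folding
after = average_latency(proc, working_set)   # same pages are now remote

print(f"before migration: {before} ns, after migration: {after} ns")
```

In this model every page of the working set becomes remote once the process moves, while any page first accessed after the move is again allocated locally.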
One prior art solution for preserving memory affinity is to ban process migration completely. This strategy can certainly reduce the likelihood of losing memory affinity, but at the expense of forgoing the flexibility of the system to perform proper load balancing and/or processor folding. Importantly, even with this drastic measure, a system still may not be able to cope with a shift of memory affinity due to dynamically changing access patterns. This can happen, for example, when a page is shared by processors from multiple affinity domains, and a different processor becomes the dominant source of access to the page at different computational phases.
Another prior art solution is to migrate pages along with a migrating process. The difficulty with this solution is that it is not known which pages should be migrated with the process, and sometimes the wrong pages are migrated, which actually reduces memory affinity system-wide. This problem is particularly severe for pages that are shared among processes migrating to different computing resources.
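The effect of migrating the wrong pages can be sketched with a small, simplified scenario. The scenario is hypothetical: process A migrates to domain 1 while process B remains in domain 0, page "private_a" is used only by A, and page "shared" is used mostly by B. The access counts and all names are assumptions made for illustration.

```python
def local_fraction(page_home: dict, accesses: list) -> float:
    """Fraction of accesses that hit a page located in the accessor's own domain.

    accesses holds (accessor_domain, page, count) tuples.
    """
    local = sum(count for domain, page, count in accesses if page_home[page] == domain)
    total = sum(count for _, _, count in accesses)
    return local / total


# Accesses after migration: A runs in domain 1, B remains in domain 0.
accesses = [
    (1, "private_a", 2),  # A accessing its private page
    (1, "shared", 1),     # A accessing the shared page
    (0, "shared", 8),     # B accessing the shared page (dominant user)
]

placements = {
    "no page migration": {"private_a": 0, "shared": 0},
    "migrate all pages touched by A": {"private_a": 1, "shared": 1},
    "migrate only A's private page": {"private_a": 1, "shared": 0},
}

for name, page_home in placements.items():
    print(f"{name}: {local_fraction(page_home, accesses):.2f} of accesses are local")
```

With these assumed counts, moving every page touched by A lowers the local fraction from 8/11 to 3/11, because the shared page is pulled away from its dominant user, whereas moving only the private page raises it to 10/11.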
Consequently, it would be desirable to provide an improved method for preserving memory affinity in a NUMA computer system.