1. Technical Field
The present invention relates in general to data processing and, in particular, to memory management in a data processing system having a global address space.
2. Description of the Related Art
It is well known in the computer arts that greater computer system performance can be achieved by harnessing the processing power of multiple individual processing units. Multi-processor (MP) computer systems can be designed with a number of different topologies, some of which may be better suited than others for particular applications depending upon the performance requirements and software environment of each application. One common MP computer architecture is a symmetric multi-processor (SMP) architecture in which multiple processing units, each supported by a multi-level cache hierarchy, share a common pool of resources, such as a system memory and input/output (I/O) subsystem, which are often coupled to a shared system interconnect. Such computer systems are said to be symmetric because all processing units in an SMP computer system ideally have equal access latencies to the shared system memory.
Although SMP computer systems permit the use of relatively simple inter-processor communication and data sharing methodologies, SMP computer systems have limited scalability. In other words, while performance of a typical SMP computer system can generally be expected to improve with scale (i.e., with the addition of more processing units), inherent interconnect, memory, and input/output (I/O) bandwidth limitations prevent significant advantage from being obtained by scaling an SMP beyond an implementation-dependent size at which the utilization of the shared resources is optimized. Thus, many SMP architectures suffer to a certain extent from bandwidth limitations, especially at the system memory, as the system scale increases.
An alternative MP computer system topology known as non-uniform memory access (NUMA) has also been employed to address limitations to the scalability and expandability of SMP computer systems. A conventional NUMA computer system includes a switch or other global interconnect to which multiple nodes, which can each be implemented as a small-scale SMP system, are connected. Processing units in the nodes enjoy relatively low access latencies for data contained in the local system memory of their nodes, but suffer significantly higher access latencies for data contained in the system memories in remote nodes. Thus, access latencies to system memory are non-uniform. Because each node has its own resources, NUMA systems have potentially higher scalability than SMP systems.
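The non-uniform latency described above can be illustrated with a minimal sketch. The names (`NumaSystem`, `home_node`, `access_latency_ns`), the contiguous per-node partitioning of the real address space, and the two latency values are all assumptions made for illustration only; they are not taken from any particular NUMA implementation.

```python
from dataclasses import dataclass

# Hypothetical latencies in nanoseconds, chosen only for illustration.
LOCAL_LATENCY_NS = 100
REMOTE_LATENCY_NS = 300


@dataclass(frozen=True)
class NumaSystem:
    """Toy NUMA model: each node owns one contiguous slice of real memory."""

    num_nodes: int
    bytes_per_node: int  # size of each node's local system memory

    def home_node(self, real_address: int) -> int:
        """Return the node whose local system memory holds the address."""
        if not 0 <= real_address < self.num_nodes * self.bytes_per_node:
            raise ValueError("real address out of range")
        return real_address // self.bytes_per_node

    def access_latency_ns(self, requesting_node: int, real_address: int) -> int:
        """Local accesses are cheap; accesses to a remote node cost more."""
        if self.home_node(real_address) == requesting_node:
            return LOCAL_LATENCY_NS
        return REMOTE_LATENCY_NS


system = NumaSystem(num_nodes=4, bytes_per_node=1024)
local = system.access_latency_ns(requesting_node=0, real_address=10)
remote = system.access_latency_ns(requesting_node=0, real_address=2048)
```

In this sketch a processing unit in node 0 sees the lower latency only for addresses in its own node's slice, which is the sense in which access latencies are non-uniform.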
Regardless of whether an SMP, NUMA or other MP data processing system architecture is employed, it is typical that each processing unit accesses data residing in memory-mapped storage locations (whether in physical system memory, cache memory or another system resource) by utilizing real addresses to identify the storage locations of interest. An important characteristic of real addresses is that there is a unique real address for each memory-mapped physical storage location.
Because the one-to-one correspondence between memory-mapped physical storage locations and real addresses necessarily limits the number of storage locations that can be referenced by software, the processing units of most commercial MP data processing systems employ memory virtualization to enlarge the number of addressable locations. In fact, the size of the virtual memory address space can be orders of magnitude greater than the size of the real address space. Thus, in conventional systems, processing units internally reference memory locations by virtual (or effective) addresses and then perform virtual-to-real address translations (often via one or more intermediate logical address spaces) to access the physical memory locations identified by the real addresses.
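A virtual-to-real address translation of the kind described above can be sketched as a single-level page table lookup. This is a simplified illustration under stated assumptions: the page size, the `translate` function, the `PageFault` exception, and the dictionary-based page table are hypothetical, and the sketch omits the intermediate logical address spaces that real systems often use.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


class PageFault(Exception):
    """Raised when a virtual page has no real page mapped to it."""


def translate(virtual_address: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address to a real address.

    page_table maps virtual page numbers to real page numbers. The offset
    within the page is carried over unchanged.
    """
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    try:
        real_page = page_table[virtual_page]
    except KeyError:
        raise PageFault(f"no mapping for virtual page {virtual_page:#x}") from None
    return real_page * PAGE_SIZE + offset


# A sparse table: widely separated virtual pages map onto a small set of
# real pages, which is how a large virtual space can sit over a small real one.
page_table = {0x10: 0x2, 0x7FFFF: 0x3}
real_address = translate(0x10 * PAGE_SIZE + 5, page_table)
```

Because only mapped virtual pages consume entries, the virtual address space in this sketch can be far larger than the real address space it is backed by.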
Subject to synchronizing primitives and software-controlled virtual memory attributes, each of the processing units in a typical MP system can generally independently read, modify, and store data corresponding to any memory-mapped storage location within the system. Consequently, in order to ensure correctness, coherency protocols are typically employed to provide all processing units in the MP system a common view of the contents of memory. As is well known in the art, coherency protocols, whether fully distributed or directory-based, employ a predetermined set of cache states in all the cache memories of the MP system, as well as specified messaging between the various controllers of the cache memories and system memories in the MP system in order to maintain coherency. While the implementation of a coherency protocol permits all processing units in an MP system to concurrently process a common data set defined by a range of real addresses, the coherency communication required by the coherency protocol can limit the scalability of the MP system by consuming bandwidth on the system interconnects.
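To make the notion of cache states and coherency messaging concrete, the following is a minimal sketch of a simplified MSI (Modified, Shared, Invalid) protocol for a single cache line. MSI is used here only as a well-known textbook example and is not asserted to be the protocol of any system discussed above. The class and method names are hypothetical, and counting one message per miss or upgrade is an assumption that ignores responses and write-backs.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class CoherentLine:
    """Tracks the state of one cache line in each cache of a toy MP system."""

    def __init__(self, num_caches: int) -> None:
        self.states = [State.INVALID] * num_caches
        self.messages = 0  # coherency requests placed on the interconnect

    def read(self, cache: int) -> None:
        if self.states[cache] is not State.INVALID:
            return  # hit: no coherency traffic needed
        self.messages += 1  # read request visible to the other caches
        for other, state in enumerate(self.states):
            if state is State.MODIFIED:
                # The previous owner supplies the data and downgrades.
                self.states[other] = State.SHARED
        self.states[cache] = State.SHARED

    def write(self, cache: int) -> None:
        if self.states[cache] is State.MODIFIED:
            return  # already the exclusive owner: no traffic needed
        self.messages += 1  # request that invalidates all other copies
        for other in range(len(self.states)):
            if other != cache:
                self.states[other] = State.INVALID
        self.states[cache] = State.MODIFIED


line = CoherentLine(num_caches=3)
line.read(0)   # cache 0 obtains a shared copy
line.read(1)   # cache 1 obtains a shared copy
line.write(0)  # cache 0 upgrades; cache 1's copy is invalidated
line.read(2)   # cache 0 downgrades to shared; cache 2 obtains a copy
```

Every miss or upgrade in this sketch increments `messages`, which mirrors the point made above: the traffic needed to keep a common view of memory grows with sharing and consumes interconnect bandwidth.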