1. Field of the Invention
This invention relates to computing systems and, more particularly, to memory allocation within a distributed shared memory system.
2. Description of the Relevant Art
Computer systems that include one or more banks of memory may use different architectures to organize and access that memory. Some computer systems include, for each of one or more processors, a single dedicated bank of memory that is accessible only by that processor. In these distributed configurations, memory access times may be highly predictable, as the dedicated memory bank may respond according to uniform memory access times. In such configurations, no other processors (or their processes) may be able to access the dedicated bank of memory, so the local processor may have complete control over the memory accesses for its processes. However, such configurations may not provide flexibility in the amount of memory available to any one process, because each processor can access only its own local memory.
Other computer systems are configured to include a single memory space that is shared between two or more processors. While this configuration may allow flexibility for each processor to address different amounts of memory for different processes, it may not efficiently scale to large systems. For example, in a computer system including two processors, if both processors need to access the shared memory at the same time, one processor may sit idle while waiting for a turn to access data, negatively impacting system performance. The problem may be compounded when more processors are included in the system.
Some computer systems are configured to include features of both a shared memory architecture and a dedicated memory architecture, in what is called a Distributed Shared Memory (DSM) system. In DSM systems, a separate (local) memory may be provided for each processor, but each of the processors may also be able to access non-local memory, such as a shared block of main memory. Some DSM systems are page-based systems, in which a linear memory space is distributed between processors based on one or more fixed memory partitions, such as a page size. Other DSM systems are object-based systems, in which processes on multiple machines share an abstract memory space filled with shared objects.
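By way of illustration only, the page-based distribution described above can be sketched as a mapping from a linear address to the processor whose local memory holds the corresponding page. The page size, the node count, the round-robin interleaving policy, and the function names below are assumptions introduced for this sketch; they are not taken from any particular DSM system.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical value)
NUM_NODES = 4     # assumed number of processors/nodes (hypothetical value)


def page_number(address: int) -> int:
    """Return the index of the fixed-size page that contains the address."""
    return address // PAGE_SIZE


def home_node(address: int) -> int:
    """Return the node assumed to hold the page.

    Pages are interleaved round-robin across nodes; this policy is an
    assumption of the sketch, not a requirement of page-based DSM.
    """
    return page_number(address) % NUM_NODES
```

Under these assumptions, consecutive pages of the linear address space land on consecutive nodes, so any processor can determine which node holds a given address with simple arithmetic.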
Some DSM systems employ a non-uniform memory access or non-uniform memory architecture (NUMA). Under NUMA, the memory access time for any given access depends on the location of the accessed memory relative to the processor. In such systems, the processor can typically access its own local memory, such as its own cache memory, faster than non-local memory. In these systems, non-local memory may include one or more banks of memory shared between processors and/or memory that is local to another processor.
In a NUMA shared memory multiprocessor computer system, each processor, on behalf of some process, may from time to time need to allocate some memory. If sufficient local memory is available, the processor may allocate local memory to the process. If not, the processor may need to allocate non-local memory. In general, if the processor is able to allocate nearby memory, according to the system configuration, the latency of accesses to that memory may be reduced and the performance of the system may be increased. In conventional systems, a centralized scheme, in which a single processor is responsible for memory allocations for all processors, may be used to allocate nearby non-local memory to a processor, but such a scheme may lack the ability to efficiently scale to large systems.
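The allocation preference described above (local memory first, then the nearest non-local memory) can be illustrated with the following minimal sketch. The distance matrix, its values, and the `allocate` function are hypothetical and serve only to make the policy concrete. The sketch models the selection policy alone and does not address which processor carries it out.

```python
from typing import List, Optional

# Hypothetical distance matrix: DISTANCE[i][j] is the relative cost for
# node i to access memory attached to node j (smaller means closer).
DISTANCE = [
    [10, 20, 30, 30],
    [20, 10, 30, 30],
    [30, 30, 10, 20],
    [30, 30, 20, 10],
]


def allocate(requester: int, size: int, free_bytes: List[int]) -> Optional[int]:
    """Reserve `size` bytes on the closest node that has enough free memory.

    Returns the index of the chosen node, or None if no node can satisfy
    the request. `free_bytes` is updated in place on success.
    """
    # Visit nodes in order of increasing distance from the requester. The
    # requester's own node has the smallest distance, so local memory is
    # always tried first.
    candidates = sorted(range(len(free_bytes)),
                        key=lambda node: DISTANCE[requester][node])
    for node in candidates:
        if free_bytes[node] >= size:
            free_bytes[node] -= size
            return node
    return None
```

For example, with `free_bytes = [100, 500, 500, 500]`, a request from node 0 for 50 bytes is satisfied locally, whereas a subsequent request for 200 bytes falls through to node 1, the nearest node with sufficient free memory under the assumed distances.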