In a typical single-processor computing environment, memory is allocated dynamically as the processor needs it: the processor requests an allocation of the desired size and receives it. Memory allocation in a multiple-processor computing environment is more difficult. Memory may be shared by all the processors, yet specific portions of it may be local to one processor and remote to the others. Local memory is accessed more quickly than remote memory, and the local memory of a given processor is remote memory to the other processors. It is therefore desirable to allocate duplicates of memory, that is, one memory location for each processor, rather than having all the processors share the same allocation of memory.
One approach to allocating memory within a multiple-processor computing environment is to allocate a large section of local memory for each processor at boot time, and then hand out that memory dynamically as needed. For static allocations, the simplest technique is to group all per-processor variables together, and then at boot time allocate a copy of that group for each processor within memory that is local to the processor. Each processor need only maintain a single offset to locate its copy of a given variable within its local memory.
For example, if there are five kilobytes of static per-processor data, the system may allocate five kilobytes of memory per processor at boot time using standard dynamic allocation techniques, taking each allocation from the range of memory closest to that processor, in other words, from memory local to that processor. Each processor then stores the offset between the static area and its allocated per-processor data, so that it can find its copy of any variable in the static area by adding that offset to the variable's address in the static area.
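The boot-time scheme described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not an implementation from any particular system: the names `NCPUS`, `struct percpu_area`, `percpu_boot_init`, and `percpu_ptr` are hypothetical, and a single array stands in for system memory, with slot 0 playing the role of the static area and the remaining slots playing the role of each processor's local memory. Real node-local allocation is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS     4      /* hypothetical processor count */
#define AREA_SIZE 5120   /* five kilobytes of static per-processor data */

/* Hypothetical group of all per-processor variables. */
struct percpu_area {
    long counter;                                  /* example variable */
    unsigned char rest[AREA_SIZE - sizeof(long)];  /* remaining data   */
};

/* Stand-in for system memory: slot 0 is the static area, and slots
 * 1..NCPUS stand in for memory local to each processor. */
static struct percpu_area memory[NCPUS + 1];

/* The single offset each processor maintains, in bytes. */
static ptrdiff_t cpu_offset[NCPUS];

struct percpu_area *static_area(void)
{
    return &memory[0];
}

/* Boot time: give every processor its own copy of the static area and
 * record the distance from the static area to that copy. */
void percpu_boot_init(void)
{
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        struct percpu_area *copy = &memory[cpu + 1];
        *copy = *static_area();
        cpu_offset[cpu] = (char *)copy - (char *)static_area();
    }
}

/* Translate the address of a variable in the static area into the
 * address of the given processor's copy: one addition, no table walk. */
void *percpu_ptr(void *static_addr, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return (char *)static_addr + cpu_offset[cpu];
}
```

With this layout, a processor reaches its own copy of `counter` through `percpu_ptr(&static_area()->counter, cpu)`. The same per-processor offset works for every variable in the group, which is what makes the scheme cheap, and also what makes it hard to extend to memory allocated later.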
Such an addressing scheme, however, can become problematic after boot time, once all of the initially allocated memory has been handed out to the processors. Additional allocations are difficult to make in a way that still allows the same addressing scheme, a general pointer plus an offset specific to each processor, to be employed. For the existing offsets to remain valid, each new allocation would have to sit at the same position relative to every processor's area. Available memory may be fragmented or otherwise limited, so that reproducing the exact same layout for each processor across allocations is at best difficult, and more than likely impossible.
This problem can be solved by allocating memory for all the processors from a single contiguous section of memory, so that the same pointer plus per-processor offset can still be used. However, the memory allocated to each processor is then no longer local to the processor that uses it. Allocating memory that is local to each processor may instead require more complex addressing schemes, which can add undesired overhead and difficulty to the memory allocation process. A common solution is to generate a new table of per-processor offsets with each allocation, which wastes memory and requires an extra memory load to dereference each per-processor pointer.
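The per-allocation table approach mentioned above can be sketched as follows, again only as an illustration under assumptions. The names `struct percpu_handle`, `percpu_alloc`, `percpu_get`, and `percpu_free` are hypothetical. For portability, the sketch stores a pointer per processor rather than an offset, and it uses `calloc` in place of a node-local allocator; the cost being illustrated, one table per allocation and one extra load per access, is the same either way.

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS 4   /* hypothetical processor count */

/* Every dynamic per-processor allocation carries its own table with
 * one entry per processor. This table is the memory overhead. */
struct percpu_handle {
    void *copy[NCPUS];
};

/* Allocate one zeroed copy of `size` bytes for each processor.
 * Returns NULL on failure or if size is zero. */
struct percpu_handle *percpu_alloc(size_t size)
{
    if (size == 0)
        return NULL;

    struct percpu_handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;

    for (int cpu = 0; cpu < NCPUS; cpu++) {
        /* A real system would allocate from memory local to `cpu`. */
        h->copy[cpu] = calloc(1, size);
        if (h->copy[cpu] == NULL) {
            while (cpu-- > 0)          /* undo the earlier copies */
                free(h->copy[cpu]);
            free(h);
            return NULL;
        }
    }
    return h;
}

/* Look up a processor's copy. Reading h->copy[cpu] is the extra memory
 * load that must happen before the data itself can be touched. */
void *percpu_get(const struct percpu_handle *h, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return h->copy[cpu];
}

void percpu_free(struct percpu_handle *h)
{
    if (h == NULL)
        return;
    for (int cpu = 0; cpu < NCPUS; cpu++)
        free(h->copy[cpu]);
    free(h);
}
```

Compared with a single offset per processor, each copy here can be placed wherever memory happens to be free, but every allocation pays for a table with one entry per processor, and every access goes through that table first.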