Memory is generally allocated to a program during execution from a pool of memory called a heap. Garbage collection is a form of automatic memory management for programs. During execution, the garbage collector attempts to identify memory allocated to objects that are no longer in use by the program, so that the memory may be deallocated (also referred to as “reclaimed”). An object is in use by the program, or reachable, if the object can be accessed (also referred to as “reached”) from the program's current state. Since the precise allocation of space in the heap to objects is not known in advance, the memory allocated to an object cannot be accessed via an address known ahead of time. Rather, the program accesses the memory indirectly by utilizing references. An object is reachable if it is referenced by a local variable or parameter in a currently invoked function, by a global variable, or by another reachable object. The garbage collector deallocates memory allocated to objects that are no longer reachable. It must not deallocate memory occupied by objects that are still reachable.
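The reachability rule described above may be illustrated with a minimal sketch. The sketch is not drawn from any particular garbage collector; the function names, the dictionary representation of references, and the sample heap are assumptions made purely for illustration.

```python
from collections import deque


def reachable_objects(roots, references):
    """Return the set of object ids reachable from the roots.

    roots: ids referenced by local variables, parameters of currently
           invoked functions, or global variables (hypothetical input).
    references: dict mapping an object id to the ids it references.
    """
    seen = set(roots)
    pending = deque(seen)
    while pending:
        obj = pending.popleft()
        # An object referenced by a reachable object is itself reachable.
        for child in references.get(obj, ()):
            if child not in seen:
                seen.add(child)
                pending.append(child)
    return seen


def collect(heap, roots, references):
    """Split a hypothetical heap into (survivors, reclaimed)."""
    live = reachable_objects(roots, references)
    survivors = {obj for obj in heap if obj in live}
    reclaimed = set(heap) - survivors
    return survivors, reclaimed


# Illustrative heap: "d" and "e" reference each other but no root reaches them.
heap = {"a", "b", "c", "d", "e"}
references = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
roots = ["a"]

survivors, reclaimed = collect(heap, roots, references)
```

In this sketch the objects "d" and "e" are reclaimed even though each is referenced by the other, because neither is referenced, directly or indirectly, by a root.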
For non-uniform memory access (NUMA) computing devices, the cost of memory accesses by processing units (as well as hardware threads, cores, and so on) is not constant. Rather, in such computing devices, the cost of a memory access depends on whether the memory is local or remote to a particular processing unit. A first memory may be classified as local to a particular processing unit, and a second memory may be classified as remote to that processing unit, when the particular processing unit is able to access the first memory faster than the second memory.
An “lgroup” (locality group) is a group of processing units (and/or hardware threads, cores, and so on) and memory in a NUMA computing device for which all memory accesses are local. Memory access from a processing unit in one lgroup to memory of another lgroup would result in a remote, and hence slower, access. Lgroups may correspond to a single processing unit socket and the memory attached to it; multiple processing units and memories attached to a single printed circuit board, when the computing device includes multiple printed circuit boards each with one or more processing units and memories; multiple computing devices arranged in a cloud computing configuration; and so on.
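The relationship between lgroups and access cost may be illustrated with a minimal model. The class and function names, the resource labels, and the relative cost values are hypothetical and chosen only for illustration; actual local and remote costs depend on the hardware.

```python
from dataclasses import dataclass

# Hypothetical relative costs; real ratios vary by machine.
LOCAL_COST = 1.0
REMOTE_COST = 2.0


@dataclass(frozen=True)
class Lgroup:
    """A group of processing units and memories with local access."""
    name: str
    processing_units: frozenset
    memories: frozenset


def lgroup_of(resource, lgroups):
    """Return the lgroup containing a processing unit or memory."""
    for group in lgroups:
        if resource in group.processing_units or resource in group.memories:
            return group
    raise KeyError(resource)


def is_local(unit, memory, lgroups):
    """An access is local when unit and memory share an lgroup."""
    return lgroup_of(unit, lgroups) == lgroup_of(memory, lgroups)


def access_cost(unit, memory, lgroups):
    """Return the modeled cost of one access by unit to memory."""
    return LOCAL_COST if is_local(unit, memory, lgroups) else REMOTE_COST


# Illustrative two-socket layout: each socket and its memory form one lgroup.
lgroups = [
    Lgroup("lgroup0", frozenset({"cpu0", "cpu1"}), frozenset({"mem0"})),
    Lgroup("lgroup1", frozenset({"cpu2", "cpu3"}), frozenset({"mem1"})),
]

accesses = [("cpu0", "mem0"), ("cpu0", "mem1"), ("cpu2", "mem1")]
total_cost = sum(access_cost(u, m, lgroups) for u, m in accesses)
```

Under this model, the access by "cpu0" to "mem1" crosses lgroups and is charged the remote cost, while the other two accesses stay within an lgroup and are charged the local cost.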
Maximizing local memory accesses (i.e., keeping memory accesses within an lgroup as much as possible) and minimizing remote memory accesses may improve overall system performance and efficiency. Typically, approaches to maximizing local memory accesses and minimizing remote memory accesses in NUMA computing devices deal with optimizing accesses by application threads. Overall system performance and efficiency may also be improved by maximizing local memory accesses and minimizing remote memory accesses for garbage collector threads.