System-level management of computer memory is referred to as memory management. Memory management provides ways to dynamically allocate portions of memory to programs and to reclaim memory that programs no longer need. Memory requests are satisfied by allocating portions from a pool of memory called the heap or free store. The heap is normally associated with a program's memory space. Off-heap memory is an additional memory area allocated outside the heap.
Virtual machines, like the Java® Virtual Machine (JVM), the .NET Common Language Runtime (CLR) or the Erlang Virtual Machine (BEAM), provide one or more options for automatic resource management and collection, referred to as Garbage Collection.
A common approach to Garbage Collection is to store created objects in different areas according to their current age, the so-called Generational Garbage Collection. As objects mature, they are moved from young generation areas, sometimes through intermediate regions, to an old generation space. The approach is based on the observation that most objects in today's applications have a very short lifetime.
When dead objects are about to be cleaned up, the Garbage Collector has to walk through all known and reachable (still living) objects and mark them. After the marking phase, either the living objects are evacuated to another space (for young and intermediate spaces) or all non-marked memory positions are wiped clean. The time needed to walk the objects grows with the number of reachable objects, so it is recommended to keep that number manageable. As a result, state-of-the-art Garbage Collectors are generally limited to managing 4 GB of memory (with pauses of no longer than 100 milliseconds).
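The marking phase described above can be illustrated with a minimal sketch. It is not the algorithm of any particular virtual machine; the `Node` type, the `mark` method, and the root list are hypothetical names chosen for illustration. The sketch walks the object graph from a set of roots and records every reachable object exactly once, so the work done depends on the size of the live set rather than on the amount of garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class MarkSketch {
    /** Hypothetical heap object: a name plus its outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Returns the set of nodes reachable from the given roots (the "marked" set). */
    static Set<Node> mark(List<Node> roots) {
        // Identity-based set: objects are distinguished by reference, as in a real heap.
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {          // first visit only, so cycles terminate
                pending.addAll(n.refs);   // follow outgoing references
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        Node dead = new Node("dead");
        a.refs.add(b); b.refs.add(c); c.refs.add(a); // a cycle of live objects
        dead.refs.add(a);                            // unreachable: nothing points to it
        Set<Node> live = mark(List.of(a));
        System.out.println("live objects: " + live.size()); // prints "live objects: 3"
    }
}
```

Everything outside the returned set is garbage. A cache holding millions of long-lived entries makes the returned set, and therefore each marking pass, correspondingly large.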
Moving objects involves costly copying of memory areas. Wiping out unused memory areas eventually results in fragmentation, in which free memory is scattered across many small chunks. These chunks are generally too small to store further objects unless multiple chunks are combined in a compaction operation. Compaction is time consuming and may result in the violation of latency guarantees.
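The fragmentation problem can be shown with a toy model. The sketch below is only an illustration under simplifying assumptions: the heap is a hypothetical array of equally sized slots, `true` means a slot is occupied, and `compact` merely models the result of sliding live slots together (a real collector also has to copy the objects and update every reference to them, which is where the cost arises).

```java
import java.util.Arrays;

final class FragmentationSketch {
    /** Number of free slots in the toy heap. */
    static int freeSlots(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) free++;
        }
        return free;
    }

    /** Length of the largest contiguous run of free slots. */
    static int largestFreeRun(boolean[] used) {
        int best = 0, run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Models compaction: all live slots end up at the start of the heap. */
    static boolean[] compact(boolean[] used) {
        boolean[] out = new boolean[used.length];
        Arrays.fill(out, 0, used.length - freeSlots(used), true);
        return out;
    }

    public static void main(String[] args) {
        // Every other slot has been wiped: half the heap is free, but in single-slot chunks.
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println(freeSlots(heap));               // prints 4
        System.out.println(largestFreeRun(heap));          // prints 1
        System.out.println(largestFreeRun(compact(heap))); // prints 4
    }
}
```

In this model a request for two contiguous slots fails before compaction even though four slots are free, and succeeds afterwards.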
Automatic resource management of this type is utilized when caching data in-memory. Caches keep cached elements with a fairly long lifetime, and the number of cached elements massively exceeds the number of typical application objects. The Garbage Collector has to walk these long-lived objects on every collection cycle, which slows down the application.
Current workarounds are based on acquiring native memory from the operating system into the virtual machine's memory space and having the application manage this region itself, without interaction from the Garbage Collector. This approach is often referred to as Off-Heap, in contrast to On-Heap objects managed by Garbage Collection. Off-Heap processing allows for the management of huge memory spaces. As used herein, a huge memory space ranges from 200 GB to tens of TB. Since Garbage Collection constraints are thought to limit On-Heap objects to no more than 4 GB, Off-Heap processing is deemed to be the only practical approach to managing huge memory spaces.
In the case of Off-Heap processing, cached elements are stored in a custom memory space and the references to those elements are removed from the heap. The Garbage Collector therefore no longer knows about these objects and does not visit them when searching for living objects. To be efficient, this approach requires objects with a known lifecycle behavior. A drawback of this approach is that most virtual machines do not offer direct support for it. Rather, code has to call into native code and often has to convert internal data types into values that the operating system understands.
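As a rough illustration of the Off-Heap workaround on the JVM, the following sketch stores cached values in native memory obtained through `java.nio.ByteBuffer.allocateDirect`. The class `OffHeapCacheSketch` and its `put` and `get` methods are hypothetical and are not taken from any existing cache product. The sketch is append-only, never reclaims space, and keeps a small On-Heap index of offsets; a production design would need its own allocator and eviction policy. It also shows the conversion step mentioned above: each value has to be serialized to raw bytes on the way in and rebuilt as an object on the way out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class OffHeapCacheSketch {
    // Native memory region; its contents are not traced by the Garbage Collector.
    private final ByteBuffer region;
    // Small On-Heap index: key -> {offset, length} of the value inside the region.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapCacheSketch(int capacityBytes) {
        this.region = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Serializes the value to bytes and appends it to the native region. */
    void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > region.remaining()) {
            throw new IllegalStateException("off-heap region is full");
        }
        int offset = region.position();
        region.put(bytes);
        // Overwriting a key leaves the old bytes in place; this sketch never reclaims them.
        index.put(key, new int[] {offset, bytes.length});
    }

    /** Copies the bytes back onto the heap and rebuilds the value, or returns null. */
    String get(String key) {
        int[] loc = index.get(key);
        if (loc == null) {
            return null;
        }
        byte[] bytes = new byte[loc[1]];
        ByteBuffer view = region.duplicate(); // independent position, same memory
        view.position(loc[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapCacheSketch cache = new OffHeapCacheSketch(1024);
        cache.put("greeting", "hello");
        System.out.println(cache.get("greeting")); // prints "hello"
    }
}
```

Every read in this sketch pays for a copy and a decode, and the application rather than the virtual machine is responsible for the lifecycle of the stored bytes.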
Therefore, it would be desirable to avoid Off-Heap processing of huge memory spaces.