A well-known performance gap exists between microprocessor speed and memory performance. Microprocessor clock speeds double every few years, while memory speeds may hardly improve. A microprocessor might operate at GHz clock speeds, while the random access memory (RAM) used by that microprocessor has clock speeds at least an order of magnitude slower. Consumers can intuitively appreciate that the performance gap affects mass storage memory, like hard drives and CD-ROM storage, but the gap affects faster memories like RAM and cache as well. Moreover, the performance gap stems not only from clock speed, but also from latency problems and memory stalls.
Computer systems use multiple memory levels to address the performance gap. The closer a level is to the registers, the lower the latency with which it can provide the registers with data. Level 1 cache memory is a relatively small and very fast memory, typically found on a microprocessor chip, that stores low-level instructions and data. Level 2 cache memory is a larger memory that may also be found on the microprocessor. More levels of cache memory are also possible. These memories are typically much smaller than RAM, but much faster.
Unfortunately, Level 1 and Level 2 cache memories are plagued with latency problems and memory stalls while waiting for data from RAM. Larger cache memories, for example, experience greater read and write latencies, more data translation lookaside buffer (DTLB) misses, and greater cache miss information. The DTLB is coupled to a cache and used to assist in locating data in higher levels of memory, such as the cache.
Various techniques have been developed to improve memory performance. Examples include prefetching data, multithreading code, dynamic instruction scheduling, speculative code execution, and cache-conscious data placement. These solutions attempt to address the memory latency problems. Other solutions attempt to address memory allocation problems. Garbage collection algorithms, for example, have been designed to reclaim unused memory regions within a heap and organize existing memory objects in a more efficient manner. More importantly, they relieve the programmer of managing the reclamation of unused memory.
There are a number of garbage collection techniques, e.g., copy garbage collection, mark-sweep garbage collection, generational garbage collection, and sliding compaction. Sliding compaction is a popular garbage collection technique in which live memory objects are rewritten over the dead spaces in the memory heap, retaining the allocation order. The technique is particularly useful for object-oriented applications, such as those written in C# or Java, as well as for frameworks like the various .Net frameworks (originally developed by Microsoft Corporation of Redmond, Wash.) used in server-based environments.
Garbage collection schemes search the memory heap for areas that are unreachable and therefore reusable. Garbage collectors that fragment memory by limiting where an object can be allocated harm object allocation times and may lead to more DTLB misses. With sliding compaction, the number of DTLB entries needed to support the working set of the code is reduced because the live objects in the managed heap are brought closer together. A useful characteristic of sliding compaction is that it preserves the spatial order in which the objects were placed before compaction commenced, while also eliminating the intervening dead spaces. Thus, spatial locality is actually improved due to in-place compression. Fewer DTLB misses mean fewer CPU stalls, and code speed is enhanced. Cache misses may also be reduced because of the reduction in dead spaces.
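The sliding behavior described above can be illustrated with a minimal Python sketch over a simulated heap. It is an illustration under stated assumptions, not the method of any particular collector: the `Obj` record, the `slide_compact` function, and the list-of-slots heap are hypothetical names introduced here, liveness is assumed to have been determined already by a separate marking phase, and the reference fix-up that a real collector must perform after moving objects is omitted.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical record describing one object in the simulated heap."""
    name: str
    addr: int   # start offset of the object in the heap
    size: int   # number of heap slots the object occupies
    live: bool  # assumed to be set by an earlier marking phase


def slide_compact(heap, objects):
    """Slide live objects toward address 0, preserving their address order.

    Mutates `heap` in place and returns (surviving objects, first free address).
    Updating references between objects is deliberately left out of this sketch.
    """
    free = 0          # next destination address for a live object
    survivors = []
    for obj in sorted(objects, key=lambda o: o.addr):
        if not obj.live:
            continue  # dead space is overwritten by later live objects
        if obj.addr != free:
            # The right-hand slice is copied before assignment, so an
            # overlapping source and destination range is handled safely.
            heap[free:free + obj.size] = heap[obj.addr:obj.addr + obj.size]
        survivors.append(Obj(obj.name, free, obj.size, True))
        free += obj.size
    # Everything from `free` to the end of the heap is now reusable.
    for i in range(free, len(heap)):
        heap[i] = None
    return survivors, free


if __name__ == "__main__":
    # Lowercase letters mark dead objects, uppercase letters mark live ones.
    heap = list("AAbbbCCCCdE")
    objects = [
        Obj("A", 0, 2, True),
        Obj("b", 2, 3, False),
        Obj("C", 5, 4, True),
        Obj("d", 9, 1, False),
        Obj("E", 10, 1, True),
    ]
    survivors, free = slide_compact(heap, objects)
    print(heap)
    print([(o.name, o.addr) for o in survivors], free)
```

In this example the live objects A, C, and E end up at addresses 0, 2, and 6, in their original relative order, and the four slots starting at address 7 form one contiguous free region. Packing the live data into fewer pages in this way is what reduces the number of DTLB entries the working set needs.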
Yet, despite its performance advantages, sliding compaction is quite expensive in comparison to some other garbage collection routines, imposing significant space and time overheads on all phases of garbage collection. These problems are exacerbated with large heap sizes. Even incremental sliding compaction, i.e., sliding only a portion of the heap during a given garbage collection cycle, is unable to get to the problem areas quickly enough, as many memory regions must wait numerous collection cycles before being managed.
In the end, memory latency and stalls place a high tax on current memory management techniques. The amount of time software code expends on memory management, no matter the technique, is great. Problematic memory regions must be identified each time the code is executed, and reclamation of the memory spaces within these regions, especially for larger heaps, is too imprecise for efficient code implementation.