The growing disparity between processor speeds and memory speeds is well known. Many applications are written in languages that execute in environments that perform memory management, using techniques such as garbage collection. Such languages include, but are not limited to, C# and Java. Applications written in these languages tend to have large, dynamic working sets of memory pages with poor data locality. Poor data locality can cause applications to perform poorly and to scale poorly as processor speeds increase.
Larger and multi-level caches help hide memory latency to a certain extent. However, cache memory is expensive, and on-chip caches (e.g., the L1 cache, ITLB, and DTLB) are therefore not likely to grow at the same rate as the workloads of modern applications. Further, hardware prefetching techniques can sometimes reduce memory latencies, but prefetching for irregular data accesses is difficult when serial dependencies (e.g., pointer indirection) preclude timely materialization of prefetch addresses: the address of the next access is not known until the preceding load has completed.
Consequently, there has been interest in improving the data locality of applications using software techniques. Both static and dynamic techniques have been investigated and reported in recent literature. Static techniques rely on ahead-of-time program analysis, typically using profile data to co-locate objects based on reference locality, or inject prefetch instructions at compile time to hide memory latencies. The main advantage of these approaches is that there is no runtime overhead; however, they may suffer from the usual limitations of static approaches (e.g., difficulty handling dynamically loaded assemblies and classes, the cost of whole-program analysis for just-in-time compilers, and difficulty dealing with the changing phases of a program). Among dynamic techniques, some garbage collection (GC) based systems employ a copying mechanism to reorganize allocated objects at run time, whether or not those objects were recently accessed. However, GC has been utilized primarily for reclaiming memory, and any improvement in spatial locality has been achieved passively, as a side effect of compacting or reorganizing the heap to reclaim space.
Other GC-based approaches use instrumentation to collect profile information at run time, but the profiling overhead of these techniques has been prohibitively high.