Dynamic tools and other systems that operate at runtime often employ software code caches to store frequently executed sequences of translated or instrumented code for use on subsequent executions, thereby avoiding the overhead of re-translation. While caches can improve performance, their size must be carefully managed to avoid occupying too much memory and ultimately degrading performance. They also must be kept consistent with their corresponding original application code. Both tasks are complicated by the presence of multiple threads.
Any code caching system that targets multithreaded applications faces a choice: increase memory usage by giving each thread a private cache, or increase the complexity of cache management by sharing one code cache among all threads. Some systems, notably certain simulators and emulators that model a single processor, opt not to support multiple threads at all. Those that support multiple threads but choose thread-private caches enjoy straightforward cache management, synchronization, and scratch space, and work well on applications with little code sharing among threads, such as interactive desktop programs. However, as discussed further below (e.g., Section 1), modern server applications share significant amounts of code among threads. Thread-private caches duplicate that shared code in every thread's cache, use prohibitive amounts of memory, and consequently perform poorly on these programs.
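The memory trade-off between the two designs can be made concrete with a small calculation. The following is a minimal sketch, not a measurement: the function names, the block addresses, and the byte sizes are all hypothetical, and it assumes each thread's executed code can be summarized as a mapping from application address to translated size.

```python
def private_cache_bytes(per_thread_blocks):
    """Total footprint when every thread caches its own copy of each block it runs.

    per_thread_blocks: list of dicts, one per thread, mapping a (hypothetical)
    application address to the size in bytes of its translated code.
    """
    return sum(sum(blocks.values()) for blocks in per_thread_blocks)


def shared_cache_bytes(per_thread_blocks):
    """Total footprint when each distinct block is stored once for all threads."""
    unique = {}
    for blocks in per_thread_blocks:
        unique.update(blocks)  # the same address from another thread is not stored twice
    return sum(unique.values())


def make_server_like_workload(num_threads):
    """Illustrative workload: three 100-byte blocks run by every thread,
    plus one 50-byte block that only that thread runs."""
    common = {0x1000: 100, 0x2000: 100, 0x3000: 100}
    workload = []
    for tid in range(num_threads):
        blocks = dict(common)
        blocks[0x9000 + tid] = 50  # thread-specific block, distinct address per thread
        workload.append(blocks)
    return workload
```

Under these assumptions, the private footprint grows with the number of threads multiplied by the amount of shared code, while the shared footprint grows only with the thread-specific code.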
Existing systems that use thread-shared caches typically solve the thorny problem of evicting code from the cache with a brute-force solution: suspend all other threads, or otherwise force them out of the cache immediately. Because every eviction then stalls the other threads, this solution requires that cache management be kept to a minimum, which may not be practical for applications that incur many cache invalidations. Suspension also scales poorly on multiprocessor machines, where it prevents concurrent execution. Among other disadvantages, these shortcomings limit the applicability of such systems in production environments.
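The brute-force approach can be illustrated with a simplified model. This is a sketch under stated assumptions, not the mechanism of any particular system: the class `StopTheWorldCache` and its methods are hypothetical, and cooperative blocking on a condition variable stands in for real thread suspension. It shows why eviction serializes execution: no thread may run cached code while an eviction is in progress.

```python
import threading


class StopTheWorldCache:
    """Hypothetical shared code cache whose eviction waits for every thread to leave."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inside = 0         # number of threads currently executing cached code
        self._flushing = False   # True while an eviction is in progress
        self.fragments = {}      # hypothetical: application address -> translated code

    def enter(self):
        """Called before a thread starts executing cached code."""
        with self._cond:
            while self._flushing:   # eviction in progress: the thread must wait
                self._cond.wait()
            self._inside += 1

    def leave(self):
        """Called when a thread stops executing cached code."""
        with self._cond:
            self._inside -= 1
            self._cond.notify_all()

    def evict_all(self):
        """Empty the cache. The caller must not itself be inside the cache."""
        with self._cond:
            while self._flushing:       # allow only one eviction at a time
                self._cond.wait()
            self._flushing = True       # from now on, enter() blocks
            while self._inside > 0:     # wait until every thread has left
                self._cond.wait()
            self.fragments.clear()
            self._flushing = False
            self._cond.notify_all()     # let waiting threads back in
```

In this model every call to `evict_all` stops all threads from making progress in the cache until the eviction completes, which is the scaling limitation described above.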