Future computing machines are likely to include more processor cores, which will make multi-threaded programs more commonplace. However, developers of multi-threaded programs and hardware will need to consider carefully how thread interaction on such machines affects memory usage.
More particularly, the memory performance of a multi-threaded program depends primarily on three factors: the shared cache, shared data, and the interleaving of threads with respect to data access. In general, a shared cache is a dynamic space in which cache blocks are fetched and replaced in response to accesses by different threads. Performance depends on the location of the accesses, as well as on the access rate and the amount of data accessed. With respect to cache usage, threads interact positively when one thread brings shared data into the cache and one or more other threads subsequently use it. Threads interfere negatively with one another when their non-shared accesses contend for shared cache resources, for example when one thread's blocks evict blocks that another thread will reuse.
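To make the positive and negative interactions concrete, the following is a minimal Python sketch that simulates a single fully associative LRU cache shared by two threads. The `SharedLRUCache` class, the `run` helper, and the access traces are hypothetical simplifications for illustration only; they are not the model of this description, and real shared caches are typically set-associative and may use other replacement policies.

```python
from collections import OrderedDict

class SharedLRUCache:
    """Hypothetical fully associative LRU cache shared by all threads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> id of the thread that fetched it

    def access(self, thread_id, block):
        """Classify one access as 'hit', 'shared_hit', or 'miss'."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            # A hit on a block fetched by another thread is a positive interaction.
            return "hit" if self.blocks[block] == thread_id else "shared_hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = thread_id
        return "miss"

def run(trace, capacity):
    """Replay a trace of (thread_id, block) pairs and count each outcome."""
    cache = SharedLRUCache(capacity)
    counts = {"hit": 0, "shared_hit": 0, "miss": 0}
    for thread_id, block in trace:
        counts[cache.access(thread_id, block)] += 1
    return counts

# Positive interaction: both threads read the same 4 blocks, uniformly interleaved.
shared = [(tid, b) for b in range(4) for tid in (0, 1)]
print(run(shared, capacity=4))   # {'hit': 0, 'shared_hit': 4, 'miss': 4}

# A single thread reusing its 4-block working set fits in the cache.
solo = [(0, b) for _ in range(2) for b in range(4)]
print(run(solo, capacity=4))     # {'hit': 4, 'shared_hit': 0, 'miss': 4}

# Negative interference: two private 4-block working sets contend for 4 slots.
private = [(tid, 100 * tid + b) for _ in range(2) for b in range(4) for tid in (0, 1)]
print(run(private, capacity=4))  # {'hit': 0, 'shared_hit': 0, 'miss': 16}
```

In this toy setting, each thread alone would hit on half of its accesses, but when two private working sets share the cache every access misses, while truly shared data lets the second thread hit on every block the first thread fetched.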
Cache interleaving refers to the way in which the threads' accesses to the cache are interleaved over the course of their execution. For example, threads with uniform interleaving alternate their cache usage evenly, whereas threads that carry out asymmetrical tasks produce irregular (non-uniform) interleaving.
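The difference between uniform and non-uniform interleaving can be illustrated with a small Python sketch that merges per-thread access traces into a single access order seen by the shared cache. It assumes, purely for illustration, that asymmetry between tasks shows up as different numbers of consecutive accesses (burst lengths) per round-robin turn; the `interleave` function and its `bursts` parameter are hypothetical and not part of this description.

```python
from typing import List, Sequence, Tuple

def interleave(traces: Sequence[Sequence[int]], bursts: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge per-thread block traces into one (thread_id, block) access order.

    On each round-robin turn, thread t issues up to bursts[t] consecutive accesses.
    Equal bursts model uniform interleaving; unequal bursts model a non-uniform one.
    """
    if len(bursts) != len(traces) or any(b < 1 for b in bursts):
        raise ValueError("need one positive burst length per thread")
    positions = [0] * len(traces)  # next unissued access in each thread's trace
    merged: List[Tuple[int, int]] = []
    while any(pos < len(trace) for pos, trace in zip(positions, traces)):
        for tid, trace in enumerate(traces):
            end = min(positions[tid] + bursts[tid], len(trace))
            merged.extend((tid, block) for block in trace[positions[tid]:end])
            positions[tid] = end
    return merged

traces = [[0, 1, 2, 3], [10, 11, 12, 13]]
print(interleave(traces, bursts=[1, 1]))
# [(0, 0), (1, 10), (0, 1), (1, 11), (0, 2), (1, 12), (0, 3), (1, 13)]
print(interleave(traces, bursts=[2, 1]))
# [(0, 0), (0, 1), (1, 10), (0, 2), (0, 3), (1, 11), (1, 12), (1, 13)]
```

The same per-thread traces yield different shared-cache access orders depending on the interleaving, which is why interleaving matters alongside the cache and the data themselves; the merged order could be replayed against any shared-cache simulator.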
The performance of applications running on multicore processors is thus significantly affected by on-chip caches. However, exhaustive testing of various applications on machines with many cores (e.g., 32, 64, 128 or more cores) is not always feasible, because machines with fewer cores (e.g., 4-core or 8-core machines) are far more readily available in test environments than the larger, expensive multicore machines used in an enterprise's commercial operations. An accurate cache locality model for multi-threaded applications, one that quantifies how concurrent threads interact with the memory hierarchy and how their data usage affects the efficiency and scalability of a system, is therefore very useful for evaluating software and hardware design decisions and for improving scheduling at the application, operating system, virtual machine, and hardware levels.