New processor architectures are being developed to improve the performance of an important class of modern memory-intensive applications, such as web services, application servers, and online transaction processing systems. These applications are notorious for causing frequent processor stalls and run at processor pipeline utilizations of less than 20 percent. One recently developed architecture now in commercial use is the multithreaded (MT) processor. MT processors typically use on-chip processor cache memories to reduce memory latency, and are further designed to hide the memory latency incurred on cache misses by running multiple instruction streams in parallel. In particular, an MT processor has multiple thread contexts and interleaves the execution of instructions from different threads. As a result, if one thread blocks on a memory access, the other threads remain active and can make forward progress. The MT processor architecture has quickly become so popular that the majority of newly released processors are multithreaded.
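The latency-hiding effect of interleaving can be illustrated with a toy cycle-level model. This sketch is not taken from the source and does not describe any particular processor: it assumes a hypothetical single-issue pipeline that selects one ready thread per cycle in round-robin order, and hypothetical threads that each issue `run_length` instructions and then stall for `miss_latency` cycles on a cache miss. The names `Thread` and `utilization` and the parameter values (5 and 20) are illustrative choices; the values were picked only so that a single thread reaches 20 percent pipeline utilization.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    """Hypothetical thread: issues `run_length` instructions, then its last
    instruction misses in the cache and it stalls for `miss_latency` cycles."""
    run_length: int
    miss_latency: int
    issued_since_miss: int = 0
    stall_until: int = 0  # first cycle at which the thread may issue again

    def ready(self, cycle: int) -> bool:
        return cycle >= self.stall_until

    def issue(self, cycle: int) -> None:
        self.issued_since_miss += 1
        if self.issued_since_miss == self.run_length:
            # Assumed: the last instruction of each run misses in the cache.
            self.issued_since_miss = 0
            self.stall_until = cycle + 1 + self.miss_latency


def utilization(num_threads: int, run_length: int, miss_latency: int,
                cycles: int = 10_000) -> float:
    """Fraction of cycles in which a single-issue pipeline issues an
    instruction, choosing among ready threads in round-robin order."""
    threads = [Thread(run_length, miss_latency) for _ in range(num_threads)]
    next_thread = 0
    busy = 0
    for cycle in range(cycles):
        # Scan thread contexts starting from the round-robin pointer.
        for offset in range(num_threads):
            idx = (next_thread + offset) % num_threads
            if threads[idx].ready(cycle):
                threads[idx].issue(cycle)
                busy += 1
                next_thread = (idx + 1) % num_threads
                break
        # If no thread is ready, the pipeline idles for this cycle.
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 4, 32):
        print(f"threads={n}: {utilization(n, run_length=5, miss_latency=20):.3f}")
    # threads=1: 0.200
    # threads=4: 0.541
    # threads=32: 1.000
```

Under these assumptions, adding thread contexts raises utilization because cycles that a stalled thread would waste are filled by ready threads. Strict round-robin interleaving also spreads each thread's run over more cycles, which is why four threads reach only about 54 percent rather than the 80 percent a simple slot count (4 × 5/25) would suggest.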
Understanding the relationship between the performance of the on-chip processor cache memories and the overall performance of the processor is critical for both hardware design and software optimization. For example, because an MT processor tolerates memory latency better, software designers may be able to place less emphasis on optimizing their applications for high cache hit rates. Similarly, if multithreaded processor throughput can be estimated dynamically from processor cache performance, the estimate can be used to schedule processes in a multiprocessor system.
Although considerable work has been done on conventional single-threaded processors to determine the relationship between the performance of on-chip processor cache memories and overall processor performance, this relationship is not well understood for the newer multithreaded processors.