Most modern computing systems, such as desktops and servers, include several features to improve the performance of the systems and of the applications running on them. These features can include multiprocessing, multiple independent caches, and simultaneous multithreading.
Multiprocessing is the use of two or more processors (central processing units, or CPUs) within a single computing system. Multiprocessing can be supported by including multiple physical processors in the computing system, or by including one or more multi-core processors in the computing system. A multi-core processor can be a single computing component with two or more independent physical processors (called “cores”). The cores can be the processing units that read and execute instructions. The multiple cores can run multiple instructions at the same time, increasing overall speed for applications that are amenable to multiprocessing (e.g., applications that can be run using parallel execution). Multiprocessing systems can further implement virtual processors, such that each physical processor appears as multiple virtual processors to an operating system in the computing system. The operating system can execute and schedule processes on the virtual processors as if each virtual processor were a physical processor. In fact, however, the virtual processors share one or more physical processors.
A cache can be a smaller, faster memory that stores copies of data from the most frequently used main memory locations. Caches can be used to reduce the average time to access memory. Multiple independent caches in a computing system usually include at least an instruction cache, a data cache, and a translation lookaside buffer (TLB). The instruction cache can be used to speed up executable instruction fetches. The data cache can be used to speed up data fetches and stores. The TLB can be used to speed up virtual-to-physical address translation for both executable instructions and data. A computing system can further improve performance by including multiple levels of cache, with small fast caches backed by larger slower caches. Multi-level caches generally operate by checking the smallest level 1 (L1) cache first. If the L1 cache hits, the processor proceeds at high speed. If the L1 cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
Most modern computing systems further support simultaneous multithreading in order to improve performance. Simultaneous multithreading can improve parallelization of computations (doing multiple tasks at once) on a processor by duplicating certain sections of the processor (e.g., the sections that store the architectural state) without duplicating other sections (e.g., the main execution resources). This usually allows the multithreading processor to appear as multiple “logical” processors (or virtual processors) to the host operating system, allowing the operating system to schedule multiple threads or processes simultaneously by allocating one or more processes to each logical processor. When execution resources would otherwise go unused by the process running on a logical processor, and especially when that logical processor is stalled (e.g., due to a cache miss, branch misprediction, or data dependency), the multithreading processor can use the execution resources to execute another scheduled process.
In a computing system with multiple levels of caches, multiprocessing, and simultaneous multithreading, the multiple logical (virtual) processors typically share the caches. However, this sharing may result in cache thrashing, depending on the applications executed on the multiple logical or virtual processors. Cache thrashing usually occurs when main memory is accessed in a pattern that leads to multiple main memory locations competing for the same cache lines, resulting in excessive cache misses and slower performance (longer latency).
Significant cache thrashing can occur in current multithreading computing systems that run cache-aggressive applications (e.g., older applications that were not written with multithreading in mind, applications that use compression algorithms, etc.).