Modern symmetric multiprocessor (SMP) systems incorporate multiple processor cores sharing a last-level cache (LLC). A cache is a high-speed data storage area adjacent to the processor core that holds a copy of recently or frequently accessed data stored in the main memory system of a computer. The term “processor core” is used herein to indicate an execution engine that may coexist with other processor cores on a single die. In modern multi-core processors, each core often has one or two levels of its own cache, and shares a second- or third-level cache (the LLC) with one or more other cores on the same die. However, there are also processors with multiple cores on separate dies that share an LLC on the main motherboard or within a processor package. Having more than one core allows more than one thread to execute simultaneously on a single computer system (and not just interleaved in time).
When a thread that is executing on one core of a processor fetches data, it first checks its local cache to see if the data is already present. When there are multiple levels of cache, the checks percolate through to the LLC if the earlier caches do not have the requested data. If the requested data is not in the LLC (an LLC “cache miss”), then the data is fetched from main memory, and a line is evicted from the LLC so that the newly fetched data can be made available in the LLC in case it is needed again. When the LLC is shared by a plurality of processor cores, the evicted data may have been placed there either by the same thread whose memory request resulted in the eviction, or by a different thread, possibly running on a different core. As a result, the execution of one thread on one core can adversely affect the execution of other threads running on the same or other cores that share the same LLC.
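The interference described above can be illustrated with a small simulation. The following is a minimal sketch only, not a model of any particular processor: it assumes a fully associative LLC with least-recently-used (LRU) replacement, a policy the text does not specify, and the names SharedLLC, access, and cross_thread_evictions are hypothetical.

```python
from collections import OrderedDict


class SharedLLC:
    """Toy shared last-level cache (assumed: fully associative, LRU)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        # Maps line address -> id of the thread that brought the line in.
        # Insertion order doubles as recency order (oldest first).
        self.lines = OrderedDict()
        self.cross_thread_evictions = 0

    def access(self, thread_id, address):
        """Return True on an LLC hit, False on an LLC miss."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        # Miss: the line is fetched from main memory. If the cache is
        # full, the least recently used line is evicted to make room.
        if len(self.lines) >= self.capacity:
            _, owner = self.lines.popitem(last=False)
            if owner != thread_id:
                self.cross_thread_evictions += 1
        self.lines[address] = thread_id
        return False


llc = SharedLLC(capacity_lines=4)
for addr in (0, 1):
    llc.access("A", addr)        # thread A brings in two lines
for addr in range(100, 104):
    llc.access("B", addr)        # thread B streams through four lines
evicted_by_b = llc.cross_thread_evictions
a_hit_after = llc.access("A", 0)  # thread A returns to its data

print(evicted_by_b, a_hit_after)
```

In this run, thread B's four accesses push both of thread A's lines out of the four-line cache, so thread A's later access to address 0 misses even though thread A itself did nothing to cause the eviction.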
CPU resources are generally allocated to a plurality of concurrently running threads that may execute interleaved on a single core or simultaneously on a plurality of different cores, or both. There are many existing scheduling algorithms in use, which generally attempt to provide some “fair” distribution of processor resources to each of the executing threads. In some cases, a CPU scheduling algorithm may take into consideration a “proportional share” of the scheduling resources, such that some processes are granted a greater-than-even share of processor resources. In a proportional fair scheduling policy, for example, a first thread may be given a proportional share of 800, and a second thread given a proportional share of 400, so that the ratio between the two is 2:1, and the first thread is given twice the resources (i.e., CPU execution time) of the second.
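The arithmetic of the proportional-share example can be sketched as follows. This is an illustration only: the function name proportional_allocation, the thread labels, and the 1200 ms scheduling interval are assumptions chosen for the example, and no particular scheduler is implied.

```python
def proportional_allocation(shares, total_time_ms):
    """Split total_time_ms among threads in proportion to their shares."""
    total_shares = sum(shares.values())
    return {
        name: total_time_ms * share / total_shares
        for name, share in shares.items()
    }


# Shares of 800 and 400, as in the example above; the 1200 ms interval
# is an assumed value chosen to make the arithmetic easy to follow.
alloc = proportional_allocation({"thread_1": 800, "thread_2": 400}, 1200)
print(alloc)
```

With these inputs the first thread receives 800 ms and the second 400 ms, preserving the 2:1 ratio of their shares.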
Contention for the LLC and other microarchitectural resources can undermine the fair distribution of processor resources among threads. Microarchitecture refers to a physical implementation of an instruction set architecture in a computer system. Microarchitectural resources include physical resources such as the cache, memory interconnects, and functional units. Contention for these resources results in delays in useful execution that one thread imposes on another. For example, because the execution of a first thread on a first core can displace data stored in an LLC that is shared with a second thread, and because a cache miss imparts a significant penalty in terms of the time it takes to fetch the data from main memory, the presence of a shared LLC can result in delays in the execution of the second thread caused by the first thread.
Further complexity is added to scheduling decisions when a computer has multiple processor “sockets” or “dies.” (The term “die” or “chip” refers to a single, typically silicon-based integrated circuit, whereas “processor” generally refers to a processor package that is typically removably mounted to a “socket” on a computer motherboard, such that there is a one-to-one relationship between processors and sockets, although a single package can contain multiple dies or chips.) Herein, the term “processor” or “central processing unit” (“CPU”) will refer to a die, chip, package, or socket having multiple processing cores that share an LLC. Threads may be transferred from one processor to another and between cores of a single processor. The decision of which thread to transfer between processors should be made intelligently, based on the expected behavior of the thread and its impact on microarchitectural resources, including the LLC of each processor.