As the microprocessor industry continues to improve the performance of central processing units (CPUs), more emphasis is being placed on designs supporting greater degrees of parallelism in CPUs, as well as multiple CPUs on a single chip. This emphasis is due, at least in part, to an increased need for thread-level parallelism. As is well known in the art, multiple applications may execute in parallel on a multi-tasking operating system. Furthermore, each of these applications may be further divided into multiple threads of execution. Each thread may also be referred to as a “process” or “task.” A highly parallel system is able to execute potentially many threads concurrently, and thereby improve system performance.
However, threads in such a system may contend for access to memory. Memory in computer systems is typically hierarchical, with small amounts of fast memory located near the CPU(s) in a cache, while a larger amount of slower memory is available in main memory (e.g., RAM) and an even larger amount of yet slower memory is available in secondary storage (e.g., a disk drive). A thread may require memory to hold its instructions and data. Instructions are the actual microprocessor codes that a CPU will execute on behalf of a thread. The set of all instructions that comprise an executable program is sometimes referred to as the program's “image.” Data is the memory that a thread uses during execution.
Given that a CPU can typically read or write cache memory in a small number of clock cycles, it is desirable to maintain a copy of a thread's instructions and data resident in the cache. However, when the cache is shared among all of the threads that are executing in the system, any one of these threads is unlikely to have all of its instructions or data cache-resident. This can lead to worst-case situations in which each thread that is task-switched into a CPU has no cache-resident instructions or data, because other thread(s) have used all of the available cache lines. The cache then must reload the appropriate instructions and data from slower main memory, which delays execution of the thread. This phenomenon is known as “thrashing” the cache.
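The worst case described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not a model of any particular hardware: it assumes a fully associative cache with least-recently-used (LRU) replacement, round-robin task switching, and threads that each touch every line of their working set once per time slice. The names `LRUCache` and `run`, and all sizes used, are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # tag -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, tag):
        if tag in self.lines:
            # Hit: mark the line as most recently used.
            self.lines.move_to_end(tag)
            self.hits += 1
        else:
            # Miss: the line must be reloaded from slower main memory.
            self.misses += 1
            if len(self.lines) >= self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[tag] = True


def run(num_threads, working_set, num_lines, slices):
    """Round-robin `num_threads` threads over `slices` time slices.

    Each thread touches every line of its own working set once per slice.
    Returns (hits, misses) for the shared cache.
    """
    cache = LRUCache(num_lines)
    for s in range(slices):
        tid = s % num_threads  # thread that is task-switched in
        for line in range(working_set):
            cache.access((tid, line))  # tag is private to the thread
    return cache.hits, cache.misses


# One thread whose working set fits the cache: misses only while warming up.
print(run(num_threads=1, working_set=8, num_lines=8, slices=10))
# Two threads sharing the same cache: each evicts the other's lines.
print(run(num_threads=2, working_set=8, num_lines=8, slices=10))
```

Under these assumptions, the single thread misses only during its first slice, whereas with two threads every access after a task switch is a miss, because the other thread has replaced all of the available cache lines in the meantime.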
In real-time computer systems, such as avionics command and control systems, critical threads may need to execute a certain number of times within a given time frame. When critical threads contend with one another or with other threads for cache space, the overall efficiency of the system is reduced. For example, the system might have to be configured to assume that a worst-case cache delay occurs each time a critical thread is task-switched into a CPU. It is desirable to allow a real-time system to operate correctly on inexpensive, off-the-shelf hardware. However, cache-thrashing of critical threads may result in the system being able to support fewer threads, or the system requiring faster and more expensive hardware.
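The cost of assuming a worst-case cache delay on every task switch can be made concrete with simple arithmetic. The sketch below is illustrative only: the function names, the frame length, the per-activation compute time, the working-set size, and the miss penalty are all hypothetical values chosen for the example, and the model ignores scheduling overhead and any overlap between memory access and computation.

```python
def activation_cost_ns(compute_ns, lines_to_reload, miss_penalty_ns):
    """Time for one activation of a critical thread, in nanoseconds.

    Assumes every one of `lines_to_reload` cache lines must be fetched
    from main memory before the thread can make progress.
    """
    return compute_ns + lines_to_reload * miss_penalty_ns


def max_threads(frame_ns, activations_per_frame, cost_ns):
    """Number of critical threads that fit in one time frame on one CPU."""
    return frame_ns // (activations_per_frame * cost_ns)


FRAME_NS = 10_000_000   # hypothetical 10 ms time frame
ACTIVATIONS = 10        # hypothetical required executions per frame
COMPUTE_NS = 100_000    # hypothetical compute time per activation

# Instructions and data already cache-resident: no reload cost.
warm = activation_cost_ns(COMPUTE_NS, lines_to_reload=0, miss_penalty_ns=100)
# Worst case: all 512 lines (hypothetical) reloaded at 100 ns each.
cold = activation_cost_ns(COMPUTE_NS, lines_to_reload=512, miss_penalty_ns=100)

print(max_threads(FRAME_NS, ACTIVATIONS, warm))  # threads supported, warm cache
print(max_threads(FRAME_NS, ACTIVATIONS, cold))  # threads supported, worst case
```

With these hypothetical figures, budgeting for the worst case on every activation reduces the number of critical threads one CPU can support from 10 to 6, which illustrates why the system may need either fewer threads or faster hardware.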
As is known in the art, the term “CPU” can refer to a single CPU core of a multi-CPU integrated circuit, or die. For purposes of simplicity, the term “CPU” shall include, but not be limited to, a CPU core that may operate in either a single core or a multi-core system.