The present invention relates to multi-threaded processors and multi-processor systems comprising shared resources.
In current multi-threaded processor cores, hardware resources are provided to execute multiple so-called threads of execution, referred to as threads for short, quasi-simultaneously. In each clock cycle, a hardware scheduler decides which instruction of which thread is to be issued into a main processor pipeline, where the instruction is then executed. This decision is based, for example, on the availability of instructions per (runnable) thread. Typical scheduling policies used in such a scheduler are Round Robin, where each thread executes an instruction in turn, Weighted Round Robin, or other priority-based algorithms, in which, for example, real-time threads of execution may enjoy a higher priority than non-real-time threads of execution.
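For illustration only, the per-cycle issue decision described above may be modeled in software as follows. This is a minimal sketch of a Weighted Round Robin policy and not the scheduler of any particular processor; the function name `weighted_round_robin`, the `weights` mapping and the `runnable` callback are assumptions introduced for this example. With all weights equal to one, the sketch reduces to plain Round Robin.

```python
def weighted_round_robin(weights, runnable, cycles):
    """Model which thread issues an instruction in each clock cycle.

    weights:  dict mapping thread id -> positive integer weight (assumed)
    runnable: callable (thread_id, cycle) -> bool, True if the thread has
              an instruction available in that cycle (assumed interface)
    cycles:   number of clock cycles to model
    Returns a list with the issuing thread id per cycle, or None if no
    thread was runnable in that cycle.
    """
    # Expand the weights into a fixed issue order, e.g. {0: 2, 1: 1} -> [0, 0, 1].
    order = [tid for tid, weight in weights.items() for _ in range(weight)]
    issued = []
    pos = 0  # position in the issue order at which the next search starts
    for cycle in range(cycles):
        choice = None
        # Look through at most one full round for a runnable thread.
        for step in range(len(order)):
            idx = (pos + step) % len(order)
            tid = order[idx]
            if runnable(tid, cycle):
                choice = tid
                pos = (idx + 1) % len(order)
                break
        issued.append(choice)
    return issued


def always_runnable(tid, cycle):
    """Assumed helper: every thread has an instruction ready in every cycle."""
    return True
```

For example, `weighted_round_robin({0: 2, 1: 1}, always_runnable, 6)` gives thread 0 two issue slots for every slot of thread 1, which is one way a real-time thread could be favored over a non-real-time thread.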
In multi-threaded processor cores or multi-threaded processors, the so-called thread context, for example a program counter or a so-called core register file, is implemented in hardware per thread, i.e. each thread has its own thread context. Other portions of such a processor, in contrast, are shared resources, i.e. elements used by two or more threads. Examples of such shared resources are a level 1 instruction cache (L1I$), a level 1 data cache (L1D$), the above-mentioned main pipeline, a load store unit (LSU) including Fill Store Buffers, a Multiply Divide Unit (MDU, a unit used for executing multiplications or divisions) or an Arithmetic Logic Unit (ALU). In such environments, it may happen that one thread of execution uses such shared resources to such an extent that other threads of execution are significantly slowed down. For example, in an embedded system where a non-real-time operating system like Linux runs on one thread of execution and a real-time operating system, for example a voice processor, runs on another thread of execution, situations may occur where the voice processor is not able to execute the number of instructions required to maintain real-time behavior. It should be noted that this situation may occur even though the above-mentioned scheduler assigns sufficient time slots for execution to the real-time thread of execution, as execution of instructions may be stalled due to occupied shared resources.
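The effect described above, namely that a thread may be stalled despite receiving sufficient issue slots, can be made concrete with a small software model. The following is a deliberately simplified sketch under stated assumptions and does not represent any actual hardware: two threads alternate issue slots in Round Robin fashion, both need a single shared unit (standing in for, e.g., an LSU or a Multiply Divide Unit), an operation of the real-time thread occupies the unit for one cycle, and an operation of the non-real-time thread occupies it for `nrt_op_latency` cycles. The function name `simulate_shared_unit` and its parameters are hypothetical.

```python
def simulate_shared_unit(cycles, nrt_op_latency):
    """Model two threads sharing one functional unit under Round Robin.

    Even cycles are issue slots of the non-real-time thread, odd cycles
    are issue slots of the real-time thread (an assumption of this model).
    Returns counters for the real-time thread only.
    """
    unit_busy_until = 0  # first cycle in which the shared unit is free again
    rt_slots = rt_issued = rt_stalled = 0
    for cycle in range(cycles):
        unit_free = cycle >= unit_busy_until
        if cycle % 2 == 0:
            # Non-real-time slot: start a long operation if the unit is free.
            if unit_free:
                unit_busy_until = cycle + nrt_op_latency
        else:
            # Real-time slot: the slot is granted, but the instruction can
            # only execute if the shared unit is not occupied.
            rt_slots += 1
            if unit_free:
                rt_issued += 1
                unit_busy_until = cycle + 1
            else:
                rt_stalled += 1
    return {"rt_slots": rt_slots, "rt_issued": rt_issued, "rt_stalled": rt_stalled}
```

In this model, the real-time thread is granted half of all issue slots regardless of `nrt_op_latency`. With a latency of one cycle it can use every slot, whereas with a latency of four cycles the shared unit is occupied in each of its slots and it makes no progress at all, although the scheduler itself behaves fairly.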
A somewhat similar situation may occur in multi-processor systems or systems with multiple processor cores, which also share some resources, for example a DRAM, a level 2 cache or a level 3 cache.