Modern multi-core processors provide multiple pipelines for running multiple applications and, as a result, often improve performance for a system that runs multiple tasks simultaneously. Unfortunately, these multi-core processors also require substantially more power and use more area than a comparable single-pipeline processor.
Prior art single-pipeline processors may allow multi-thread processing by employing an operating system to manage hardware resource usage and thread switching. However, a significant performance penalty is incurred each time the processor changes threads. Additional inefficiency occurs in a single-pipeline processor when a thread is initially allocated a block of execution cycles but is unable to execute consecutive instructions as scheduled because necessary component data is unavailable.
More recently, techniques for processing multiple threads on a single processor core have been developed that reduce the penalty for thread switching. However, in such systems the allocation of processing cycles is changed by the processor issuing instructions that change the cycle count for each thread, which may present various challenges with respect to response time, precision, and predictability.
For example, changing the cycle allocation could require up to one instruction per thread. If a master thread is the only thread capable of changing the cycle counts, it may take many (potentially hundreds of) cycles before the master thread can finish reprogramming them. Because multiple instructions may be required to change the cycle allocation, and the instruction sequence is not atomic (e.g., other threads may switch in while the master thread is changing the allocation), there may be rounds of imprecise allocation in which only some threads have received their new cycle counts.
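The effect of non-atomic reprogramming can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not a description of any particular processor: threads run round-robin in index order, the scheduler reads a thread's cycle count when that thread is switched in, and the master thread retires one reprogramming instruction per cycle it is granted. The function name `simulate_reprogramming` and its parameters are hypothetical.

```python
def simulate_reprogramming(old, new, master=0):
    """Return the cycle counts actually granted in each round while the
    master thread reprograms the allocation from `old` to `new`.

    Toy-model assumptions (not taken from any real hardware):
      * round-robin scheduling in thread-index order,
      * a thread's count is read when the thread is switched in,
      * the master retires one reprogramming instruction per cycle.
    """
    if len(old) != len(new):
        raise ValueError("old and new must list the same threads")
    if old[master] < 1 or new[master] < 1:
        raise ValueError("the master needs at least one cycle per round")

    alloc = list(old)
    # One instruction per thread whose count must change.
    pending = [t for t in range(len(old)) if old[t] != new[t]]
    rounds = []
    while True:
        observed = []
        for tid in range(len(alloc)):
            cycles = alloc[tid]          # count in effect at switch-in
            observed.append(cycles)
            if tid == master:
                for _ in range(cycles):  # one update per master cycle
                    if pending:
                        target = pending.pop(0)
                        alloc[target] = new[target]
        rounds.append(observed)
        if observed == list(new):        # new allocation fully in effect
            return rounds


old = [1, 4, 4, 4]   # thread 0 is the master
new = [1, 2, 6, 4]   # intended allocation, same total of 13 cycles
rounds = simulate_reprogramming(old, new)
for number, observed in enumerate(rounds, start=1):
    print(number, observed, "total:", sum(observed))
```

In this example the first round grants `[1, 2, 4, 4]`, which matches neither the old nor the intended allocation and is two cycles shorter than either. Under these assumptions, the number of such mixed rounds grows with the number of threads to be updated and shrinks with the number of cycles the master receives per round.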
Other inefficiencies arise from allocating cycles to a thread even if the thread is not currently executing a useful instruction. A thread in this situation may loop, wasting processing resources, until the thread is needed. In addition, it may be difficult for software to know exactly when the cycle reallocation needs to occur, so polling or other feedback techniques may need to be employed, further wasting processing resources. Moreover, because of the response-time challenges and the non-atomic nature of the instructions, accurately simulating worst-case behavior may become problematic, thereby sacrificing predictability of the system.
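As a rough illustration of the cost of granting cycles to threads that only loop, the following sketch counts the cycles per round that go to idle threads. The function name `wasted_cycles_per_round`, the `has_work` flags, and the example numbers are assumptions made for illustration only.

```python
def wasted_cycles_per_round(alloc, has_work):
    """Return (wasted, fraction): the cycles per round granted to threads
    that have no useful work, and their share of the whole round."""
    if len(alloc) != len(has_work):
        raise ValueError("alloc and has_work must list the same threads")
    wasted = sum(cycles for cycles, busy in zip(alloc, has_work) if not busy)
    total = sum(alloc)
    fraction = wasted / total if total else 0.0
    return wasted, fraction


# Hypothetical round: threads 2 and 3 are idle but still receive cycles.
wasted, fraction = wasted_cycles_per_round(
    [1, 4, 4, 4], [True, True, False, False]
)
print(wasted, round(fraction, 3))
```

Here 8 of the 13 cycles in each round are spent spinning, and any polling added to detect when a reallocation is needed would consume further cycles on top of that.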