Over the last several decades, transistors have been getting smaller and faster. For example, a typical consumer microprocessor today can contain over 100 million transistors on a die no larger than 100 square millimeters, while operating at clock speeds of around 3 GHz.
With this increasing miniaturization, heat and power budgets are becoming increasingly critical. As chipmakers have raised clock speeds, the peak power consumption of a microprocessor has recently climbed to well over 100 watts, and thermal density, now in excess of 100 W/cm², is approaching practical limits.
Another concern in current microprocessors is the relatively slow connection between the processor and main memory. A typical processor runs several hundred times faster than data can be fetched from memory, so the processor may wait an eternity, relatively speaking, for data to arrive.
One way to address this problem is to employ on-chip memory caches and instruction-level parallelism, keeping the processor busy on one set of instructions while other instructions wait for their data to arrive. However, even instruction-level parallelism is approaching its limits, because achieving a modest further improvement requires exponential growth in transistor count and power.
One solution to these problems is to exploit parallelism by dividing a processing chip into multiple cores. For example, a hypothetical notebook processor might have eight cores, and a program written for such a chip could present many threads of execution, each running simultaneously on a different core.
In such a multicore system, each core has its own local resources, such as register files, branch predictors, and local caches, while also sharing resources with other cores, such as on-die L3 caches, memory channels, and possibly shared functional units. These shared resources may need to be arbitrated not only in a fair and neutral way, as with balanced parallel software, but also in a biased manner, as when some cores run main user computation threads while other cores run lower-priority or “housekeeping” threads. It would therefore be advantageous to provide a mechanism that allows software to manage, influence, or bias the arbitration of such shared resources among a plurality of cores running threads with differing performance requirements.
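To make the difference between fair and biased arbitration more concrete, the following is a minimal sketch of a generic weighted round-robin arbiter. It assumes that software assigns each core an integer weight, and a core may then win up to that many grants per round. The class `WeightedRoundRobinArbiter`, its methods `grant` and `set_weight`, and the helper `simulate` are hypothetical names introduced only for illustration. This shows a well-known arbitration technique in general and is not presented as the specific mechanism described in this document.

```python
from collections import Counter


class WeightedRoundRobinArbiter:
    """Hypothetical arbiter that grants one shared resource per cycle,
    in proportion to software-assigned per-core weights."""

    def __init__(self, weights):
        # weights[i] = maximum grants core i may receive per round (assumption).
        if any(w < 1 for w in weights):
            raise ValueError("weights must be positive integers")
        self.weights = list(weights)
        self.credits = list(weights)  # grants remaining in the current round
        self.next_core = 0            # round-robin search starting point

    def set_weight(self, core, weight):
        # Software hook to bias arbitration, e.g. favor a main compute thread
        # over a housekeeping thread. Takes effect at the next credit refill.
        if weight < 1:
            raise ValueError("weight must be a positive integer")
        self.weights[core] = weight

    def grant(self, requests):
        """Return the index of the core granted this cycle, or None.
        requests[i] is True when core i is requesting the resource."""
        n = len(self.weights)
        if not any(requests):
            return None
        # Start a new round once every *requesting* core has used its credits,
        # so an idle core holding credits cannot starve the others.
        if all(self.credits[i] == 0 for i in range(n) if requests[i]):
            self.credits = list(self.weights)
        for offset in range(n):
            core = (self.next_core + offset) % n
            if requests[core] and self.credits[core] > 0:
                self.credits[core] -= 1
                # Keep serving this core while it has credits, then move on.
                self.next_core = core if self.credits[core] > 0 else (core + 1) % n
                return core
        return None


def simulate(weights, cycles):
    """Count grants per core when every core requests on every cycle."""
    arbiter = WeightedRoundRobinArbiter(weights)
    return Counter(arbiter.grant([True] * len(weights)) for _ in range(cycles))


if __name__ == "__main__":
    # Fair arbitration: two balanced worker cores each get 4 of 8 grants.
    print(simulate([1, 1], 8))
    # Biased arbitration: a main-computation core (weight 3) gets 6 of 8 grants,
    # while a housekeeping core (weight 1) still gets 2 and is never starved.
    print(simulate([3, 1], 8))
```

Setting all weights equal reproduces plain round-robin, which suits balanced parallel software. Raising one core's weight through a software-visible control biases the shared resource toward that core without completely locking out lower-priority threads.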