Due to limitations on clock frequency scaling, performance gains in future computer systems will come from power-efficient exploitation of concurrency. Consequently, the computer industry has migrated towards placing multiple processors on a single chip; such chips are known as chip multiprocessors (“CMPs”).
In a CMP, multiple identical stand-alone central processing units (CPUs) are placed on a single chip, and a fast, fine-grained communication mechanism may be used to combine CPUs to match the intrinsic parallelism of the application. That is, in CMPs built using the copy-exact approach, all CPUs on a CMP are identical, having exact copies of arithmetic logic units (ALUs), caches and pipelines. This approach minimizes the design complexity of CMPs, since only one CPU needs to be designed and is then instantiated multiple times.
However, the granularity (i.e., the issue width) and the number of processors on the chip in a CMP are fixed at design time based on the designers' best analyses of the desired workload mix and operating points. The issue width may refer to the maximum number of instructions that a given processor can issue in a given cycle. Because of these limitations, the CMP cannot efficiently handle changes in operating conditions, such as changes in the number and type of available threads or changes in the instruction streams over time. For example, not all of the processors in the CMP will be effectively utilized if there are not enough software threads at a given time or if those software threads do not contain enough complex computations at a given time. As a result, such a design is power inefficient.
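By way of illustration only, the underutilization described above may be sketched with a simple model. The function name `issue_slot_utilization`, the representation of each thread by an average number of instructions it can issue per cycle, and the rule that each thread occupies at most one processor are all assumptions made for this sketch and are not taken from any particular CMP design.

```python
from typing import Sequence


def issue_slot_utilization(num_cores: int, issue_width: int,
                           thread_ipc: Sequence[float]) -> float:
    """Return the fraction of issue slots used per cycle on a fixed CMP.

    Hypothetical model (an assumption for illustration): each thread runs
    on at most one core, and a thread able to issue `ipc` instructions per
    cycle uses min(ipc, issue_width) of that core's issue slots.
    """
    if num_cores <= 0 or issue_width <= 0:
        raise ValueError("num_cores and issue_width must be positive")

    # At most num_cores threads can run at once; pick the most demanding ones.
    running = sorted(thread_ipc, reverse=True)[:num_cores]

    # A core cannot issue more than issue_width instructions per cycle.
    used_slots = sum(min(ipc, issue_width) for ipc in running)

    total_slots = num_cores * issue_width
    return used_slots / total_slots


if __name__ == "__main__":
    # Four 2-wide cores, but only two threads are available.
    print(issue_slot_utilization(4, 2, [2.0, 2.0]))
    # Four 2-wide cores, four threads with little work per cycle.
    print(issue_slot_utilization(4, 2, [1.0, 1.0, 1.0, 1.0]))
    # One thread with high parallelism cannot spread beyond one 2-wide core.
    print(issue_slot_utilization(4, 2, [4.0]))
```

Under this model, a chip with a fixed number of processors and a fixed issue width leaves issue slots idle both when too few threads are available and when the available threads are either too simple or too wide for a single processor, which corresponds to the power inefficiency noted above.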
If, however, the appropriate amount of processing power could be dynamically allocated to handle changes in operating conditions, then performance and power efficiency could be greatly improved.