Microprocessors are increasingly being called on to support multiple cores on a single chip. To keep design efforts and costs down and to adapt to future applications, designers often try to design multiple core microprocessors that can meet the needs of an entire product range, from mobile laptops to high-end servers. This design goal presents a difficult dilemma to processor designers: maintaining the single-thread performance important for microprocessors in laptop and desktop computers while at the same time providing the system throughput important for microprocessors in servers. Traditionally, designers have tried to meet the goal of high single-thread performance using chips with single, large, complex cores. On the other hand, designers have tried to meet the goal of high system throughput by providing multiple, comparatively smaller, simpler cores on a single chip. Because, however, designers are faced with limitations on chip size and power consumption, providing both high single-thread performance and high system throughput on the same chip at the same time presents significant challenges. More specifically, a single chip will not accommodate many large cores, and small cores traditionally do not provide high single-thread performance.
One factor which strongly affects throughput is the need to execute instructions dependent on long-latency operations, such as the servicing of cache misses. Instructions in a processor may await execution in a logic structure known as a “scheduler.” In the scheduler, instructions with destination registers allocated wait for their source operands to become available, whereupon the instructions can leave the scheduler, execute and retire.
Like any structure in a processor, the scheduler is subject to area constraints and accordingly has a finite number of entries. Instructions dependent on the servicing of a cache miss may have to wait hundreds of cycles until the miss is serviced. While they wait, their scheduler entries are kept allocated and thus unavailable to other instructions. This situation creates pressure on the scheduler and can result in performance loss.
Similarly, pressure is created on the register file because the instructions waiting in the scheduler keep their destination registers allocated and therefore unavailable to other instructions. This situation can also be detrimental to performance, particularly in view of the fact that the register file may need to sustain thousands of instructions and is typically a power-hungry, cycle-critical, continuously clocked structure.