1. Technical Field
The present invention relates to computer systems and, in particular, to mechanisms for fetching instructions for execution by a processor in a computer system.
2. Background Art
Modern high-performance processors are designed to execute multiple instructions on each clock cycle. To this end, they typically include extensive execution resources to facilitate parallel processing of the instructions. The efficient use of these resources may be limited by the availability of instructions that can be executed in parallel. This availability is referred to as instruction level parallelism (ILP). Dependencies between instructions in a thread of instructions can limit ILP.
One strategy for increasing ILP is to allow a processor to execute instructions from multiple instruction threads simultaneously. By definition, instructions from different threads are independent, which allows them to execute in parallel, increasing ILP. A processor that supports concurrent execution of instructions from two or more instruction threads is referred to as a multi-threaded (MT) processor. An MT processor includes resources, such as data, state and control registers, to track the architectural states associated with the different instruction threads as they execute concurrently. In addition, operations such as instruction fetches and state updates are modified to accommodate concurrent handling of multiple instruction threads.
The fetch engine of a processor is responsible for providing instructions to the execution resources of the processor. The instruction fetch engine and execution resources are components of the front end and back end, respectively, of the processor's execution pipeline. The front and back ends often communicate through an instruction buffer or queue, which decouples their operations. For example, if the back end of the pipeline stalls, the front end may continue fetching instructions into the instruction queue. If the front end of the pipeline stalls, the back end of the pipeline may continue executing instructions accumulated in the instruction queue.
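The decoupling described above can be illustrated with a minimal sketch. It is not taken from any particular processor: the function name `simulate_pipeline`, the one-instruction-per-cycle rate, and the queue depth of eight are all assumptions made for illustration only.

```python
from collections import deque

def simulate_pipeline(cycles, front_stalls, back_stalls, queue_depth=8):
    """Hypothetical model: the front end fetches and the back end executes
    at most one instruction per cycle, linked only by a bounded queue.
    Returns (instructions executed, instructions left in the queue)."""
    queue = deque()
    executed = 0
    for cycle in range(cycles):
        # Back end drains the queue unless it is stalled or the queue is empty.
        if cycle not in back_stalls and queue:
            queue.popleft()
            executed += 1
        # Front end fills the queue unless it is stalled or the queue is full.
        if cycle not in front_stalls and len(queue) < queue_depth:
            queue.append(cycle)
    return executed, len(queue)

# Back end stalls on cycles 0-2 (queue fills), front end stalls on cycles 5-7
# (back end keeps executing from the accumulated instructions).
result = simulate_pipeline(10, front_stalls={5, 6, 7}, back_stalls={0, 1, 2})
```

In this sketch, the instructions accumulated during the back end stall are what allow execution to continue through the later front end stall.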
FIG. 1 is a block diagram of a conventional instruction fetch engine for a uni-threaded processor. For the disclosed fetch engine, a multiplexer (MUX) 110 selects an instruction pointer (IP) from one of several inputs 140(1)-140(m) and provides it to an instruction cache (I-cache) 120. If the IP hits in I-cache 120, the cache provides instructions from the associated entry to an instruction queue 130. The number of instructions provided on each clock interval depends on the particular processor used. For example, VLIW processors may provide blocks of two or more instructions from their I-caches during each clock interval.
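A single clock interval of such a fetch engine may be sketched as follows. The sketch is illustrative only: the function `fetch_cycle`, the dictionary used to stand in for the I-cache, and the sample addresses and instruction names are hypothetical and do not appear in FIG. 1.

```python
from collections import deque

def fetch_cycle(ip_inputs, select, icache, queue):
    """Hypothetical model of one clock interval: the MUX picks one IP,
    the I-cache is probed, and on a hit the block goes to the queue.
    Returns True on a hit and False on a miss."""
    ip = ip_inputs[select]      # MUX selects one of the candidate IPs
    block = icache.get(ip)      # I-cache lookup; None models a miss
    if block is None:
        return False            # miss handling is outside this sketch
    queue.extend(block)         # deliver the whole block to the queue
    return True

# Assumed cache contents: two-instruction blocks keyed by IP.
icache = {0x100: ["add", "load"], 0x108: ["mul", "store"]}
queue = deque()
hit = fetch_cycle([0x100, 0x200], 0, icache, queue)   # IP 0x100 is cached
miss = fetch_cycle([0x100, 0x200], 1, icache, queue)  # IP 0x200 is not
```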
FIG. 2 is a block diagram of a conventional instruction fetch engine 200 that has been modified for a multi-threaded processor. Fetch engine 200 includes IP MUXs 210(a) and 210(b), which provide IPs for their respective threads to an arbiter 250. Arbiter 250 forwards an IP to an I-cache 220, which provides a corresponding block of instructions to one of instruction queues 230(a) and 230(b). For example, arbiter 250 may provide IPs from the different threads on alternating clock intervals. One problem with fetch engine 200 is that the bandwidth to each of instruction queues 230(a) and 230(b) is, on average, half of the bandwidth to instruction queue 130 of uni-threaded processor 100. When queue 230(a) or 230(b) is empty, this reduction in bandwidth translates directly into a lower instruction throughput for the processor.
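The halving of per-queue bandwidth under alternating arbitration can be checked with a short sketch. The function `fetch_counts`, the block size of two instructions, and the assumption that every IP hits in the I-cache are illustrative choices, not details of fetch engine 200.

```python
def fetch_counts(cycles, threads=2, block_size=2):
    """Hypothetical model: a single-ported I-cache is shared by an arbiter
    that alternates between threads each clock interval.
    Returns the instructions delivered to each thread's queue."""
    delivered = [0] * threads
    for cycle in range(cycles):
        selected = cycle % threads           # arbiter alternates threads
        delivered[selected] += block_size    # assumes every IP hits
    return delivered

single = fetch_counts(100, threads=1)  # uni-threaded baseline
dual = fetch_counts(100, threads=2)    # two threads share one port
```

Under these assumptions, each queue in the two-thread case receives half the instructions that the single queue receives over the same number of clock intervals.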
An alternative to fetch engine 200 that has a smaller impact on instruction fetch bandwidth provides multiple ports on, for example, I-cache 220 and its various components (tag array, translation look-aside buffers). Multi-ported structures are considerably larger than single-ported structures, so the bandwidth gain may come at the cost of a significant increase in the die area of the processor.
The present invention addresses these and other issues associated with instruction fetching in multi-threaded processors.