This invention relates to the architecture and operation of an improved processor in which instructions for execution are fetched in a free-running manner.
Conventional processor designs commonly control instructions in three stages: fetch, issue, and execute. In the first stage, an instruction is fetched from memory at a location identified by a program counter, which points to the most recently fetched instruction and thereby allows the next instruction to be fetched. Following the fetch, the instruction is checked for possible data dependencies and, if it passes the test, the instruction and its operands are issued for execution. (A data dependency is a circumstance in which an instruction cannot be executed because data for the instruction is not yet available.) The instructions issued can be identified by an issue-virtual program counter. Once an instruction is issued, it is sent to the execution stage, where it produces a result that is written into either a register file or a memory, thereby altering the state of the processor. Another program counter, the update-virtual program counter, identifies the instruction that has just completed updating the state of the processor. These three program counters (fetch, issue-virtual and update-virtual) are traditionally synchronized. Thus, an instruction that is fetched is issued if its operands are available, and an instruction that is issued goes through the execution pipeline. At the end of the pipeline, the state of the processor is updated. The instructions are fetched, issued, and executed, and the processor state is updated, in strict sequential order as defined by the order of instructions in the program.
The three program counters (fetch, issue-virtual and update-virtual) in a traditional processor are linked together so that they point to successive adjacent instructions. Thus, at any time, the fetch, issue-virtual and update-virtual program counters in a conventional processor point to instructions N+2, N+1 and N, respectively.
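The lockstep relationship described above can be sketched in a few lines of Python. This is an illustrative model only: the class name `LockstepCounters` and its methods are hypothetical, and instructions are simply numbered 0, 1, 2, and so on in program order.

```python
from dataclasses import dataclass


@dataclass
class LockstepCounters:
    """Hypothetical model of the three linked program counters."""

    update_virtual: int = 0  # instruction N: just finished updating processor state

    @property
    def issue_virtual(self) -> int:
        # Always one instruction ahead of the update-virtual counter (N+1).
        return self.update_virtual + 1

    @property
    def fetch(self) -> int:
        # Always two instructions ahead of the update-virtual counter (N+2).
        return self.update_virtual + 2

    def step(self) -> None:
        # Advancing one counter advances all three, because they are linked.
        self.update_virtual += 1

    def as_tuple(self) -> tuple:
        return (self.fetch, self.issue_virtual, self.update_virtual)


pcs = LockstepCounters()
trace = []
for _ in range(3):
    trace.append(pcs.as_tuple())
    pcs.step()

print(trace)  # [(2, 1, 0), (3, 2, 1), (4, 3, 2)]
```

Deriving the fetch and issue-virtual values from a single stored counter is a modeling choice that makes the N+2, N+1, N relationship impossible to violate in the sketch. A real processor would hold three separate registers.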
More recent advanced processors include another element, called a register scoreboard, which checks whether the resources required by an instruction are available for the instruction to execute. If so, the instruction is issued even before the instruction in the execution stage has finished, which can result in out-of-order execution. The register scoreboard records (locks), at issue time, the resources that would be modified by the instruction. Any subsequent instruction that needs to access those resources cannot be issued until the instruction that locked them unlocks them by updating the resources and so notifying the processor.
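The locking behavior of a register scoreboard can be illustrated with a minimal Python sketch. The class `RegisterScoreboard`, its method names, and the register names are assumptions made for illustration. The sketch tracks only destination registers as locked resources and does not model any particular processor.

```python
class RegisterScoreboard:
    """Hypothetical scoreboard that locks registers an issued instruction will modify."""

    def __init__(self):
        self.locked = set()  # registers awaiting a result from an in-flight instruction

    def can_issue(self, sources, dest):
        # An instruction may issue only if it touches no locked register,
        # whether as a source operand or as its destination.
        return dest not in self.locked and not (set(sources) & self.locked)

    def issue(self, sources, dest):
        if not self.can_issue(sources, dest):
            return False  # the instruction must wait
        self.locked.add(dest)  # lock the resource at issue time
        return True

    def complete(self, dest):
        # The instruction has updated the register, so the lock is released.
        self.locked.discard(dest)


sb = RegisterScoreboard()
first = sb.issue(["r1", "r2"], "r3")        # issues and locks r3
independent = sb.issue(["r6", "r7"], "r8")  # issues while the first is still executing
dependent = sb.issue(["r3", "r4"], "r5")    # blocked: reads r3, which is locked
sb.complete("r3")                           # the first instruction writes r3
retried = sb.issue(["r3", "r4"], "r5")      # now allowed to issue
```

In this sketch the second instruction issues before the first has completed, which is the situation that permits out-of-order execution. The third instruction is held back until the lock on `r3` is released.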
These known processor designs operate with the disadvantage that any stop in the issue of instructions, typically due to resource dependency among instructions, will stop the instruction fetch. This stopping results in loss of performance because fewer instructions are issued for execution. The direct dependency between the issue-virtual and the fetch program counters in a conventional processor thus inhibits achievement of peak performance. This loss of performance is even more pronounced when multiple instructions are fetched simultaneously. In a traditional pipelined processor design, putting N pipelines in parallel so that N instructions can be fetched in every cycle does not increase performance by a factor of N, because there are interactions between every element in the matrix of pipelines and instructions, which increases the number of data dependency conflicts.
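The cost of coupling fetch to issue can be made concrete with a small Python sketch. The function `run_coupled_front_end` and the per-cycle stall pattern are hypothetical. The model assumes one instruction per cycle and does not attempt to represent multiple parallel pipelines.

```python
def run_coupled_front_end(issue_stalled):
    """Count fetched and issued instructions when fetch halts on every issue stall.

    issue_stalled holds one boolean per cycle: True means the issue stage is
    blocked in that cycle (for example, by a resource dependency).
    """
    fetched = 0
    issued = 0
    for stalled in issue_stalled:
        if stalled:
            continue  # issue is stopped, so the linked fetch counter stops too
        issued += 1
        fetched += 1
    return fetched, issued


# Hypothetical stall pattern over six cycles.
stalls = [False, True, True, False, False, True]
fetched, issued = run_coupled_front_end(stalls)
print(f"{fetched} instructions fetched in {len(stalls)} cycles")
```

Every stalled cycle in this model is a cycle in which no instruction is fetched, so fetch throughput can never exceed issue throughput.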
Some prior art processor designs include branch prediction. In such systems, when the processor reaches a branch instruction, a prediction is made as to the likely direction of execution. The processor then executes down the predicted path while it awaits validation of the first branch. If a second branch is reached before the first one is validated, the processor stops fetching instructions, degrading performance.
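The fetch stop at a second unvalidated branch can be sketched in Python as follows. The function `speculative_fetch`, the instruction mnemonics, and the `validation_latency` parameter are all assumptions made for illustration. In particular, the sketch measures the time to validate a prediction in instruction positions rather than in clock cycles.

```python
def speculative_fetch(stream, validation_latency):
    """Return the instructions fetched before fetch stops, or all of them.

    At most one branch prediction may be outstanding. A prediction made at
    position p is treated as validated once position p + validation_latency
    is reached (a simplifying assumption).
    """
    fetched = []
    pending_until = None  # position at which the outstanding prediction is validated
    for position, instr in enumerate(stream):
        if pending_until is not None and position >= pending_until:
            pending_until = None  # the earlier prediction has been validated
        fetched.append(instr)
        if instr == "branch":
            if pending_until is not None:
                break  # second branch reached while the first is unvalidated
            pending_until = position + validation_latency
    return fetched


stream = ["add", "branch", "load", "sub", "branch", "mul", "store"]
slow = speculative_fetch(stream, validation_latency=5)  # stops at the second branch
fast = speculative_fetch(stream, validation_latency=2)  # first branch validated in time
```

With the slow validation, fetching ends at the second branch and the last two instructions are never fetched. With the fast validation, the whole stream is fetched.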