1. Field of the Invention
The present invention relates in general to microprocessors and, more particularly, to a system, method, and microprocessor architecture providing pipeline resource tracking to avoid global pipeline stalls in a high-frequency processor.
2. Relevant Background
Early computer processors (also called microprocessors) included a central processing unit or instruction execution unit that executed only one instruction at a time. As used herein, the term processor includes complex instruction set computers (CISC), reduced instruction set computers (RISC), and hybrids. In response to the need for improved performance, several techniques have been used to extend the capabilities of these early processors, including pipelining, superpipelining, superscaling, speculative instruction execution, and out-of-order instruction execution.
Pipelined architectures break the execution of instructions into a number of stages where each stage corresponds to one step in the execution of the instruction. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction is finished executing. Pipelined architectures have been extended to "superpipelined" or "extended pipeline" architectures where each execution pipeline is broken down into even smaller stages (i.e., microinstruction granularity is increased). Superpipelining increases the number of instructions that can be executed in the pipeline at any given time.
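By way of illustration only, the throughput benefit of pipelining described above can be sketched with simple cycle arithmetic. The sketch assumes an idealized pipeline in which every stage takes one clock cycle and no stalls occur; the function names are hypothetical and are not taken from any particular processor design.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction must finish before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle.

    Assumption: ideal pipeline, one cycle per stage, no stalls.
    The first instruction takes num_stages cycles to complete; each
    later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5
    print("unpipelined:", unpipelined_cycles(n, k))
    print("pipelined:  ", pipelined_cycles(n, k))
```

Under these assumptions, the pipelined case approaches one completed instruction per cycle as the number of instructions grows, which is the effect the preceding paragraph describes.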
"Superscalar" processors generally refer to a class of microprocessor architectures that include multiple pipelines that process instructions in parallel. Superscalar processors typically execute more than one instruction per clock cycle, on average. Superscalar processors allow parallel instruction execution in two or more instruction execution pipelines. The number of instructions that may be processed is increased due to parallel execution. Each of the execution pipelines may have a different number of stages. Some of the pipelines may be optimized for specialized functions such as integer operations or floating point operations, and in some cases execution pipelines are optimized for processing graphics, multimedia, or complex math instructions.
The goal of superscalar and superpipelined processors is to execute multiple instructions per cycle (IPC). Instruction-level parallelism (ILP) available in programs written to operate on the processor can be exploited to realize this goal; however, exploiting this potential parallelism requires that instructions be dispatched for execution at a sufficient rate. Conditional branching instructions create a problem for instruction fetching because the instruction fetch unit (IFU) cannot know with certainty which instructions to fetch until the conditional branch instruction is resolved. Also, when a branch is detected, the target address of the instructions following the branch must be predicted to supply those instructions for execution.
Recent processor architectures use a branch prediction unit to predict the outcome of branch instructions, allowing the fetch unit to fetch subsequent instructions according to the predicted outcome. These instructions are "speculatively executed" to allow the processor to make forward progress while the branch instruction is being resolved.
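By way of illustration only, one widely known prediction scheme is a two-bit saturating counter. The text above does not specify any particular prediction mechanism, so the scheme, the class name, and the initial counter state below are assumptions chosen for this sketch.

```python
class TwoBitPredictor:
    """Two-bit saturating counter (an assumed, commonly taught scheme).

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """

    def __init__(self) -> None:
        self.counter = 1  # assumption: start in the weakly-not-taken state

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move toward 3 on a taken branch and toward 0 otherwise, saturating.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes) -> int:
    """Count how often the prediction differs from the resolved outcome."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # speculatively fetched instructions would be discarded
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A loop branch taken nine times, then not taken on loop exit.
    print(count_mispredictions([True] * 9 + [False]))
```

In this sketch, each misprediction corresponds to a case in which speculatively executed instructions would have to be discarded once the branch is resolved.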
In superscalar processors, multiple pipelines can simultaneously process instructions only when there are no data dependencies or resource (e.g., register) conflicts between the instructions in each pipeline. Data dependencies cause one or more pipelines to "stall" waiting for the dependent data to become available. This is further complicated in superpipelined processors where, because many instructions are simultaneously in each pipeline, the potential number of resource conflicts is large. Greater parallelism and higher performance are achieved by "out-of-order" processors that include multiple pipelines in which instructions are processed in parallel in any efficient order that takes advantage of opportunities for parallel processing that may be provided by the instruction code. However, out-of-order processors require even more resources to perform, for example, instruction scheduling, instruction renaming, execution, register renaming, and retirement functions. Each of these additional resources can create stalls when the resource is expended.
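By way of illustration only, the condition under which two instructions can be processed simultaneously may be sketched as a register-level check. The instruction representation and the function name are hypothetical; the sketch considers only read-after-write dependencies and shared destination registers, and ignores every other kind of resource.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination and several source registers."""
    dest: str
    srcs: Tuple[str, ...]


def can_issue_together(first: Instr, second: Instr) -> bool:
    """Return True if `second` may issue in the same cycle as `first`.

    Assumptions for this sketch:
      - a data dependency exists when `second` reads the register that
        `first` writes (read-after-write);
      - a register conflict exists when both write the same register.
    """
    data_dependency = first.dest in second.srcs
    register_conflict = first.dest == second.dest
    return not (data_dependency or register_conflict)


if __name__ == "__main__":
    a = Instr("r1", ("r2", "r3"))
    b = Instr("r4", ("r1", "r5"))  # reads r1, which a writes
    c = Instr("r6", ("r7", "r8"))  # independent of a
    print(can_issue_together(a, b))
    print(can_issue_together(a, c))
```

When the check fails, the later instruction must wait, which corresponds to the pipeline stall described above.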
Prior architectures rely on each functional unit to signal upstream functional units when resources are not available. The upstream units respond appropriately by, for example, stalling. Although this feedback system effectively throttles the processor when resources are expended, the inter-unit signaling requires additional circuitry and resources. Moreover, each functional unit must include mechanisms for signal generation, signal transmission, signal reception, and signal handling to effectively use the resource tracking information. These mechanisms make the processor design more complex and limit clock frequencies in high-speed processor designs. What is needed is a system, method, and processor architecture that avoids stall conditions while being compatible with high-speed design.
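By way of illustration only, the prior-art feedback scheme described above may be modeled as a downstream unit that asserts a stall signal whenever its finite resource is expended. The class name, the queue capacity, and the drain rate below are assumptions made for this sketch and do not describe any specific processor.

```python
class DownstreamUnit:
    """Hypothetical functional unit with a fixed number of queue entries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue = []

    def stall_signal(self) -> bool:
        # Feedback sent upstream: asserted when no entry is available.
        return len(self.queue) >= self.capacity

    def accept(self, instr) -> None:
        self.queue.append(instr)

    def drain_one(self) -> None:
        # Completing an instruction frees one entry.
        if self.queue:
            self.queue.pop(0)


def run(instructions, capacity: int, drain_every: int):
    """Return (total cycles, stalled cycles) for the upstream unit.

    Assumption: the downstream unit frees one entry every `drain_every`
    cycles, and the upstream unit tries to send one instruction per cycle.
    """
    unit = DownstreamUnit(capacity)
    pending = list(instructions)
    cycle = 0
    stalled_cycles = 0
    while pending:
        cycle += 1
        if cycle % drain_every == 0:
            unit.drain_one()
        if unit.stall_signal():
            stalled_cycles += 1  # upstream unit holds its instruction
        else:
            unit.accept(pending.pop(0))
    return cycle, stalled_cycles


if __name__ == "__main__":
    print(run(range(6), capacity=2, drain_every=2))
```

Even in this small model, the upstream unit must sample the stall signal on every cycle before it can proceed, which suggests why such inter-unit signaling adds circuitry and can constrain clock frequency.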