1. Field of the Invention
The present invention relates generally to high-performance operation within computer pipelines. More specifically, it provides a structure and method that dynamically shorten a pipeline under predetermined circumstances, thereby providing shorter latency for those circumstances and an overall improvement in processor performance.
2. Description of the Related Art
In the earliest electronic computers (e.g., in the era of von Neumann), the processor would execute one instruction at a time, from start to finish. The first “parallelism” technique to evolve in the following era was “pipelining.”
The processing of an instruction requires several steps. In general, these steps are the same steps for many different instructions, and the hardware that implements those steps is built to perform those steps the same way, regardless of the values of the data being operated on. In pipelining, the various steps are implemented “piecemeal,” exactly the way that an assembly-line works.
Each step is performed by a unique piece of logic circuitry, and the sequential steps are implemented by connecting those pieces of logic circuitry (called “pipeline segments”) together in sequence, and “insulating” those pipeline segments from each other by putting staging-latches between them. The computer pipeline is then a Finite-State Machine (FSM): the processor clock captures data in the staging-latches as the “state” of the processor on any cycle.
Over a sequence of clock cycles, a given instruction will enter the pipeline and will be processed piecemeal in each sequential pipeline stage as the clock ticks. This improves performance over the von Neumann era because a new instruction can be started on every cycle. The “state” of the FSM on any cycle then contains the partial results of many (sequential) instructions that are in various stages of processing as the pipeline flow progresses.
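The cycle-by-cycle behavior described above can be illustrated with a minimal sketch. It is not taken from the invention itself: the function name `simulate_pipeline`, the use of a plain list to stand in for the staging-latches, and the assumption that every stage takes exactly one cycle with no stalls are all illustrative choices.

```python
from typing import List, Optional


def simulate_pipeline(instructions: List[str],
                      num_stages: int) -> List[List[Optional[str]]]:
    """Return the staging-latch contents (the FSM "state") on each cycle.

    Hypothetical model: one list slot per pipeline segment, one cycle per
    stage, no stalls. Slot 0 is the first stage; None marks an empty stage.
    """
    latches: List[Optional[str]] = [None] * num_stages
    pending = list(instructions)
    history: List[List[Optional[str]]] = []

    # Keep clocking until every instruction has entered and drained out.
    while pending or any(slot is not None for slot in latches):
        # On each clock tick, every stage passes its contents to the next
        # stage, and a new instruction (if any remain) enters the first stage.
        incoming = pending.pop(0) if pending else None
        latches = [incoming] + latches[:-1]
        if any(slot is not None for slot in latches):
            history.append(list(latches))
    return history


if __name__ == "__main__":
    for cycle, state in enumerate(simulate_pipeline(["A", "B", "C"], 3), 1):
        print(cycle, state)
```

Under these assumptions, three instructions flowing through a three-stage pipeline occupy all three stages at once on the third cycle, which is the sense in which the state holds the partial results of several instructions.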
The overall latency through the pipeline is longer than the latency of a non-pipelined processor of the von Neumann era, since staging-latches have been added between the logical components of the machine. The instruction issue rate can be much higher, however, since there is no need to wait for the completion of each instruction before issuing the next.
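The trade-off between latency and issue rate can be made concrete with a small worked calculation. The model below is a simplifying assumption, not a figure from this disclosure: the logic delay is taken to divide evenly among the stages, each staging-latch adds a fixed overhead to the cycle time, and the numeric values (10 ns of logic, 5 stages, 0.5 ns per latch) are hypothetical.

```python
def pipelined_cycle_time(logic_delay: float, stages: int,
                         latch_overhead: float) -> float:
    """Cycle time when the logic is split evenly and each stage ends in a latch."""
    return logic_delay / stages + latch_overhead


def pipelined_latency(logic_delay: float, stages: int,
                      latch_overhead: float) -> float:
    """Time for one instruction to traverse every stage of the pipeline."""
    return stages * pipelined_cycle_time(logic_delay, stages, latch_overhead)


def pipelined_total_time(n: int, logic_delay: float, stages: int,
                         latch_overhead: float) -> float:
    """Time to finish n instructions when one new instruction issues per cycle."""
    cycles = stages + n - 1  # fill the pipeline once, then one result per cycle
    return cycles * pipelined_cycle_time(logic_delay, stages, latch_overhead)


def unpipelined_total_time(n: int, logic_delay: float) -> float:
    """Time to finish n instructions when each runs start to finish alone."""
    return n * logic_delay


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    T, S, D, N = 10.0, 5, 0.5, 100
    print("latency, unpipelined:", T)
    print("latency, pipelined:  ", pipelined_latency(T, S, D))
    print("total, unpipelined:  ", unpipelined_total_time(N, T))
    print("total, pipelined:    ", pipelined_total_time(N, T, S, D))
```

With these assumed values, a single instruction takes 12.5 ns in the pipeline versus 10 ns without it, while 100 instructions complete in 260 ns instead of 1000 ns.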
In pipelining, the flow for any instruction is generally the same as that for any other (similar) instruction, and all data being operated on (called “operands” herein) are operated on in the same way by the same circuitry. While this makes the processor's behavior very predictable, and (arguably) “simple” to design, the pipeline frequently does unnecessary work, precisely because all operands are treated the same.