Modern microprocessors are pipelined. That is, they operate on several instructions at the same time, within different blocks, or pipeline stages, of the microprocessor. Hennessy and Patterson define pipelining as “an implementation technique whereby multiple instructions are overlapped in execution” (John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 2nd edition, Morgan Kaufmann Publishers, San Francisco, Calif., 1996). They go on to provide the following excellent illustration of pipelining:
“A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe—instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line.”
Synchronous microprocessors operate according to clock cycles. Typically, an instruction passes from one stage of the microprocessor pipeline to another each clock cycle. In an automobile assembly line, if the workers in one stage of the line are left standing idle because they do not have a car to work on, then the production, or performance, of the line is diminished. Similarly, if a microprocessor stage is idle during a clock cycle because it does not have an instruction to operate on—a situation commonly referred to as a pipeline bubble—then the performance of the processor is diminished.
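The cost of a bubble can be made concrete with a small calculation. The following is a minimal sketch, assuming an idealized pipeline in which every instruction spends exactly one clock cycle in each stage; the function name and parameters are hypothetical and are used only for illustration.

```python
def pipeline_cycles(num_instructions, num_stages, bubbles=0):
    """Clock cycles needed to complete a run of instructions.

    Assumes an idealized pipeline: one cycle per stage, one instruction
    entering per cycle, and each bubble delaying completion by one cycle.
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to drain through the
    # pipe; each later instruction completes one cycle after the previous
    # one, unless a bubble leaves a stage idle in between.
    return num_stages + (num_instructions - 1) + bubbles


ideal = pipeline_cycles(100, 5)               # no idle stages
with_bubbles = pipeline_cycles(100, 5, 10)    # ten idle cycles
print(ideal, with_bubbles)
```

Under these assumptions, every bubble adds one full clock cycle to the total, which is why the upper stages of the pipeline are designed to keep the execution stages supplied.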
In the present context, the various stages of a microprocessor pipeline may be logically grouped into two portions. The top portion fetches and decodes instructions to provide to the bottom portion, which executes the instructions. The top portion typically includes an instruction fetcher for fetching program instructions from memory. Because the time required to fetch instructions from system memory is relatively large, the top portion also includes an instruction cache for caching instructions fetched from memory to reduce subsequent instruction fetch time. The primary job of the upper pipeline stages is to have instructions available when the execution stages are ready to execute them.
One means commonly employed in the upper pipeline stages to avoid causing bubbles in the execution stages is to read ahead in the program and fetch multiple program instructions into an instruction buffer. The instruction buffer provides instructions to the execution stages when they are ready to execute them. Instruction buffers are often arranged as first-in-first-out memories, or queues.
The instruction buffering technique is particularly advantageous when one or more of the instructions needed by the execution stages are not present in the instruction cache. In this situation, the impact of the missing cache line is reduced to the extent that the instruction buffer can continue to supply instructions to the execution stages while the memory fetch is performed.
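The effect of the buffer during a cache miss can be illustrated with a toy cycle-by-cycle model. This is only a sketch under stated assumptions: the execution stages consume one instruction per clock cycle, the fetcher can deliver up to two instructions per cycle when it hits in the cache and none while a miss is outstanding, and the names `count_bubbles`, `fetch_schedule`, and `buffer_capacity` are hypothetical.

```python
from collections import deque


def count_bubbles(fetch_schedule, buffer_capacity):
    """Count cycles in which the execution stages find the buffer empty.

    fetch_schedule[c] is the number of instructions the fetcher can
    deliver in cycle c (0 while a cache miss is being serviced).
    """
    buffer = deque()          # first-in-first-out instruction buffer
    bubbles = 0
    next_instruction = 0
    for available in fetch_schedule:
        # The execution stages take the oldest buffered instruction.
        if buffer:
            buffer.popleft()
        else:
            bubbles += 1      # nothing to execute this cycle
        # The fetcher refills the buffer, limited by the free entries.
        free_entries = buffer_capacity - len(buffer)
        for _ in range(min(available, free_entries)):
            buffer.append(next_instruction)
            next_instruction += 1
    return bubbles


# Three cycles of cache hits, a three-cycle miss, then hits again.
schedule = [2, 2, 2, 0, 0, 0, 2, 2]
print(count_bubbles(schedule, buffer_capacity=1))
print(count_bubbles(schedule, buffer_capacity=4))
```

In this model the first cycle is always a bubble because the buffer starts empty. With a one-entry buffer the execution stages also stall during the miss, whereas a four-entry buffer has read far enough ahead to cover it.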
The buffering technique is also useful in the situation where a branch instruction is present in the program. Modern microprocessors employ branch prediction logic to predict whether the branch instruction will be taken, and if so, to provide the target address of the branch instruction. If a branch instruction is predicted taken, instructions are fetched from the target address, rather than the next sequential fetch address, and provided to the instruction buffer.
Instruction buffering is also beneficial in situations in which some processing of the instructions must be performed before they can be provided to the execution stages. For example, in some processors the instruction set allows instructions to be a variable number of bytes in length. Consequently, the processor must decode a stream of instruction bytes and determine the type of each instruction in order to determine its length, because the beginning of each instruction is determined by the length of the preceding instruction. This process is commonly referred to as instruction formatting. Because instruction formatting requires some processing time, it is advantageous to format multiple instructions and buffer the formatted instructions in the upper portion of the pipeline so that formatted instructions are available to the execution stages when needed.
In addition to fetching instructions, the upper pipeline stages also generate information related to the fetched instructions, i.e., information besides the instruction bytes themselves, which the execution stages utilize when executing the instructions. An example is branch prediction-related information, which may be needed by the execution stages in order to update the branch prediction history or to correct for a mispredicted branch instruction. Another example is the length of the instruction, which must be determined in the case of processors that execute variable length instructions. The related information may be generated later than the clock cycle in which the instruction bytes are ready to be provided to the instruction buffer. However, the related information must be provided to the execution stages in sync with the instruction to which it is related.
One solution to this problem is to add another pipeline stage to give the related information time to be buffered and provided to the execution stages. However, this solution has the disadvantage of potentially diminishing performance. In particular, when a branch instruction is mispredicted, the pipeline stages above the mispredicted branch instruction must be flushed of their instructions, and instruction fetching must resume at the mispredicted branch. The greater the number of stages that must be flushed, the greater the likelihood that bubbles will be introduced into the execution stages of the microprocessor pipeline. Hence, it is desirable to keep the number of pipeline stages as small as possible. Thus, a better solution to the problem is needed.
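The performance cost of an added stage can be estimated with a rough model. The sketch below assumes a base rate of one clock cycle per instruction and assumes that each flushed stage costs one bubble on every misprediction. Both are simplifications, and the function name, parameters, and sample rates are hypothetical.

```python
def effective_cpi(branch_fraction, mispredict_rate, flushed_stages):
    """Estimate the average clock cycles per instruction.

    Assumes one cycle per instruction when no branch is mispredicted, and
    one lost cycle per flushed stage on each mispredicted branch.
    """
    penalty_per_instruction = branch_fraction * mispredict_rate * flushed_stages
    return 1.0 + penalty_per_instruction


# Illustrative values: 20% of instructions are branches, 10% of which
# are mispredicted.
before = effective_cpi(0.2, 0.1, flushed_stages=5)
after = effective_cpi(0.2, 0.1, flushed_stages=6)   # one stage added
print(before, after)
```

Under these assumptions, each additional stage above the execution stages raises the average cost of every instruction, even though only mispredicted branches pay the penalty directly.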