Modern microprocessors are pipelined. That is, they operate on several instructions at the same time, within different blocks or pipeline stages of the microprocessor. Hennessy and Patterson define pipelining as "an implementation technique whereby multiple instructions are overlapped in execution" (John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 2nd edition, Morgan Kaufmann Publishers, San Francisco, Calif., 1996). They go on to provide the following excellent illustration of pipelining:
"A pipeline is like an assembly line. In an automobile assembly line, there are many steps, each contributing something to the construction of the car. Each step operates in parallel with the other steps, though on a different car. In a computer pipeline, each step in the pipeline completes a part of an instruction. Like the assembly line, different steps are completing different parts of the different instructions in parallel. Each of these steps is called a pipe stage or a pipe segment. The stages are connected one to the next to form a pipe—instructions enter at one end, progress through the stages, and exit at the other end, just as cars would in an assembly line."
Synchronous microprocessors operate according to clock cycles. Typically, an instruction passes from one stage of the microprocessor pipeline to another each clock cycle. In an automobile assembly line, if the workers in one stage of the line are left standing idle because they do not have a car to work on, then the production, or performance, of the line is diminished. Similarly, if a microprocessor stage is idle during a clock cycle because it does not have an instruction to operate on—a situation commonly referred to as a pipeline bubble—then the performance of the processor is diminished.
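The cost of pipeline bubbles can be illustrated with a small idealized model. The following Python sketch is not part of the original description; it assumes an in-order pipeline in which one instruction enters per clock cycle, every instruction spends exactly one cycle in each stage, and each bubble occupies one issue slot between instructions. The function names and the example numbers are hypothetical.

```python
def total_cycles(num_instructions, num_stages, bubbles=0):
    """Clock cycles for an idealized in-order pipeline to finish all instructions.

    Assumption: one instruction enters per cycle unless a bubble takes the
    slot, and each instruction spends exactly one cycle in each stage.
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to pass through the pipe;
    # every later instruction, and every bubble, adds one more cycle.
    return num_stages + (num_instructions - 1) + bubbles


def instructions_per_cycle(num_instructions, num_stages, bubbles=0):
    """Average throughput of the idealized pipeline."""
    cycles = total_cycles(num_instructions, num_stages, bubbles)
    return num_instructions / cycles if cycles else 0.0


if __name__ == "__main__":
    # Hypothetical workload: 100 instructions on a 5-stage pipeline.
    print(total_cycles(100, 5))              # 104 cycles with no bubbles
    print(total_cycles(100, 5, bubbles=25))  # 129 cycles with 25 bubbles
    print(instructions_per_cycle(100, 5, bubbles=25))
```

In this model every bubble adds one full clock cycle to the run without completing any additional work, which is the sense in which an idle stage diminishes performance.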
Typically, the initial stages of a microprocessor pipeline fetch program instructions and provide them to logic that dispatches, or issues, the instructions to the stages of the pipeline that actually execute them, such as arithmetic logic units that perform operations like addition, subtraction, multiplication, and division. Modern microprocessors include multiple execution units that execute the program instructions, such as integer units, floating-point units, or SIMD (single instruction, multiple data) units. The dispatch logic determines which execution unit should receive each instruction and gathers the input data (operands) that the instruction needs to perform its specified operation and generate a result. If the operands for an instruction are not available, a bottleneck may develop at the instruction dispatcher: instructions back up behind the stalled instruction, even though they may be destined for a different execution unit or may not depend upon the stalled instruction. The result may be that the execution units sit idle with no instructions to execute.
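The bottleneck at the dispatcher can be sketched as a toy simulation. The following Python example is an illustration only, under simplifying assumptions that do not come from the original description: the dispatcher issues at most one instruction per clock cycle, strictly in program order, and each instruction is described by a hypothetical pair of its target execution unit and the cycle at which its operands become available.

```python
from collections import deque


def simulate_in_order_dispatch(instructions):
    """Simulate a strictly in-order, one-per-cycle instruction dispatcher.

    instructions: list of (unit_name, operands_ready_cycle) in program order.
    Returns (dispatch_events, stall_cycles), where dispatch_events is a list
    of (cycle, unit_name) and stall_cycles counts cycles with no dispatch.
    """
    queue = deque(instructions)
    cycle = 0
    dispatch_events = []
    stall_cycles = 0
    while queue:
        unit, ready_cycle = queue[0]
        if cycle >= ready_cycle:
            # Operands of the oldest instruction are available: issue it.
            queue.popleft()
            dispatch_events.append((cycle, unit))
        else:
            # The oldest instruction blocks everything behind it, even
            # instructions whose operands are ready and whose units are idle.
            stall_cycles += 1
        cycle += 1
    return dispatch_events, stall_cycles


if __name__ == "__main__":
    # Hypothetical program: the second instruction waits until cycle 10
    # for an operand; the instructions behind it are ready immediately.
    program = [("load", 0), ("integer", 10), ("floating-point", 0), ("simd", 0)]
    events, stalls = simulate_in_order_dispatch(program)
    print(events)
    print(stalls)
```

With this hypothetical program, the floating-point and SIMD instructions have their operands ready at cycle 0 but are not dispatched until cycles 11 and 12, because they sit behind the integer instruction that is waiting for its operand.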
Typically, operands are available relatively quickly, since most are obtained from registers within the microprocessor that can provide them within a single clock cycle. Other instructions, such as load instructions, specify their operands via memory addresses. Typically, the memory operands can be obtained from cache memories inside the microprocessor that can provide the operands within a few clock cycles. However, sometimes the data specified by a load instruction is not immediately available, for example when the specified data is missing from the cache. In modern microprocessors, the time required to fetch data from system memory into the microprocessor is between one and two orders of magnitude greater than the time required to fetch data from a register or cache within the microprocessor. The trend appears to be toward this ratio getting larger, thereby exacerbating the problem of load instructions causing pipeline stalls at the instruction dispatcher and creating pipeline bubbles. Therefore, what is needed is an apparatus and method for avoiding this problem.
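The effect of the latency gap between cache and system memory can be made concrete with the standard average-access-time estimate, hit time plus miss rate times miss penalty. The following Python sketch applies that estimate; the function name and all of the cycle counts and miss rates are hypothetical values chosen for illustration, not figures from the original description.

```python
def average_load_latency(hit_cycles, miss_rate, miss_penalty_cycles):
    """Expected cycles per load: hit time plus miss rate times miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical numbers: a 3-cycle cache hit and a 200-cycle penalty for
    # going to system memory, roughly two orders of magnitude slower.
    for miss_rate in (0.0, 0.02, 0.10):
        print(miss_rate, average_load_latency(3, miss_rate, 200))
```

Under these assumed numbers, a miss rate of only 2 percent more than doubles the expected load latency, from 3 cycles to about 7, and the gap widens as the miss penalty grows relative to the hit time.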