Referring to FIG. 1, a block diagram is shown of a four stage Single-Completion Instruction Pipeline of an early microprocessor 100. The pipeline stages include: 1) Fetch; 2) Arithmetic operation (ALU); 3) Memory access; and 4) Write back. In operation, the microprocessor fetches an instruction for execution in cycle 1, executes the instruction in cycle 2, performs a read or write to memory in cycle 3, and writes the result of the ALU operation (from cycle 2), or the memory read (from cycle 3), into its register file in cycle 4. If each pipeline stage requires one processor clock to complete its function, then in the four stage pipeline, an instruction requires four clocks to complete execution. In this example, the execution rate of the pipeline is one instruction every four clock cycles.
One skilled in the art of microprocessor design will recognize that one of the problems in the Single-Completion Instruction Pipeline of FIG. 1 is that in any given clock cycle, only one of the pipeline stages is being utilized for instruction execution. For example, during clock cycle 1, the Fetch stage is busy fetching an instruction for execution, but the ALU, Memory and Write stages are essentially idle. An idle processing stage is considered inefficient and therefore undesirable within a processing system.
A remedy for the idleness described above is shown in FIG. 2, to which attention is now directed. FIG. 2 includes a four stage pipeline microprocessor 200 similar to the one shown in FIG. 1. However, in the microprocessor 200, rather than waiting for an instruction to be completed before the next instruction can be fetched (four clock cycles), a new instruction is fetched each clock cycle. In this four stage pipeline, four instructions are in the process of executing simultaneously, one at each stage of the pipeline. While it still takes four clock cycles for the first instruction to be completed, a new instruction is completed every clock cycle thereafter. Thus, the idleness illustrated in FIG. 1 above has been removed, and the overall processing efficiency has been improved.
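By way of illustration only, the difference in throughput between the pipelines of FIG. 1 and FIG. 2 can be sketched as simple arithmetic. The function names below are hypothetical, and the sketch assumes, as in the text, that each stage takes exactly one clock and that no stalls occur.

```python
def single_completion_cycles(num_instructions: int, num_stages: int = 4) -> int:
    """Clocks needed when each instruction finishes before the next is fetched (FIG. 1)."""
    return num_instructions * num_stages


def overlapped_cycles(num_instructions: int, num_stages: int = 4) -> int:
    """Clocks needed when a new instruction is fetched every clock (FIG. 2)."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages clocks to complete;
    # each later instruction completes one clock after its predecessor.
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(n, single_completion_cycles(n), overlapped_cycles(n))
```

Under these assumptions, the per-instruction cost of the overlapped pipeline approaches one clock as the instruction count grows, while the single-completion pipeline stays at four clocks per instruction.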
A problem is created, however, in the parallel pipeline of FIG. 2 when an instruction depends on the completion or resolution of a previous instruction before it can begin. This problem is illustrated in FIG. 3, to which attention is now directed.
FIG. 3 illustrates a parallel pipeline microprocessor 300 such as the one shown in FIG. 2. However, in this Figure, Instruction 2 is dependent on the resolution of Instruction 1 before it can begin. For example, presume Instruction 1 is of the form: LOAD REG1, MEM. That is, Instruction 1 causes a value from memory location MEM to be retrieved from memory, and stored within a register REG1 within the microprocessor 300. Now, presume that Instruction 2 is of the form: ADD REG1,2. That is, Instruction 2 adds the contents of the REG1 register to the numeric value 2, and stores the result in the REG1 register. In the four stage pipeline 300, Instruction 1 does not retrieve the contents of memory location MEM until the end of clock cycle 3. And, the retrieved contents of memory location MEM are not stored into the register REG1 until the end of clock cycle 4. If Instruction 2 were allowed to proceed into the ALU stage of the pipeline in clock cycle 3, the contents of the REG1 register, to which the value of 2 is to be added, would not yet be updated with the contents from MEM. Thus, the result of the addition would either be unknown, or at the very least incorrect. In this example, the only way to make sure that Instruction 2 is executed using the correct contents of register REG1 is to hold or stall execution of Instruction 2 by at least two clock cycles.
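The two-cycle stall in this example can be checked with a small calculation. The sketch below is illustrative only: the function name and parameters are hypothetical, stages are numbered from 1 (Fetch = 1, ALU = 2, Memory = 3, Write = 4), and it assumes a register written at the end of one cycle can first be read in the following cycle.

```python
def required_stalls(write_stage: int, read_stage: int, distance: int = 1) -> int:
    """Stall cycles a dependent instruction needs in a simple in-order pipeline.

    write_stage: stage in which the first instruction updates the register
    read_stage:  stage in which the dependent instruction reads the register
    distance:    how many instruction slots after the first instruction the
                 dependent instruction is fetched (1 = immediately following)
    """
    # A producer fetched in cycle c writes at the end of cycle c + write_stage - 1.
    # A consumer fetched `distance` cycles later reads in cycle
    # c + distance + read_stage - 1, which must be strictly later than the write.
    return max(0, write_stage - read_stage + 1 - distance)


# FIG. 3: LOAD writes REG1 in stage 4, ADD reads REG1 in stage 2, back to back.
stalls_fig3 = required_stalls(write_stage=4, read_stage=2, distance=1)
print(stalls_fig3)
```

With these values the calculation yields two stall cycles, matching the delay stated above.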
What is shown in FIG. 3 is just one example of what is known as a pipeline hazard. In general, there are two types of pipeline hazards: 1) execution hazards; and 2) instruction hazards. Execution hazards are hazards created by the execution of one instruction, and seen by the execution of another instruction, such as shown in FIG. 3. Instruction hazards are those created by the execution of one instruction, and seen by the instruction fetch of another instruction. For example, a first instruction might update a TLB entry in a TLB table, and a second instruction would fetch an instruction using the updated TLB entry. If the second instruction attempted to fetch an instruction from the TLB entry prior to the update, it would be fetching an incorrect instruction. In either case, to ensure that all instructions execute properly within a pipelined microprocessor, it must be assured that an instruction that depends on the resolution of a previous instruction is either stalled or delayed, at least until the instruction on which it depends completes. The methodology utilized to ensure proper execution of dependent instructions is known as hazard protection, or hazard clearing.
Hazard protection is typically performed either in hardware, or in software. When hazard protection is provided in hardware, a portion of the microprocessor is dedicated to tracking each instruction to be executed for the purpose of detecting instruction dependencies. When an instruction dependency is detected, the hardware causes an interlock on the dependent instruction, thereby stalling the dependent instruction, until the instruction on which it depends completes execution. A benefit of designing a microprocessor to incorporate hardware hazard protection is that a software programmer is shielded from the intricacies associated with instruction execution. That is, the programmer does not have to worry about how many stages it takes for a first instruction to be resolved before starting a second dependent instruction. S/he can simply write the instructions in the order desired for execution, and trust that the hazard hardware in the microprocessor will ensure proper execution. A downside of providing hazard protection in hardware is that such hardware adds considerable complexity to the microprocessor, and that impacts both the design cost and ultimate cost of the microprocessor. In addition, design changes in the architecture that affect execution order, the number of stages in the pipeline, or execution timing, must be considered in the hazard hardware, thereby making design changes in the hazard hardware necessary. For many types of microprocessors, the additional complexity associated with providing hazard protection in hardware is considered inappropriate. For these microprocessors, hazard protection is typically provided via software.
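As a rough illustration of a hardware interlock, the sketch below models a per-register "ready" table that delays a dependent instruction until its source register has been written. It is a simplified software model under stated assumptions, not a description of any particular hardware: the names are hypothetical, instructions are (opcode, destination, sources) tuples, the stall is modeled as a delayed fetch, and the stage numbers follow the four stage pipeline of FIG. 3.

```python
def issue_cycles(program, write_stage=4, read_stage=2):
    """Return the fetch cycle of each instruction under a simple interlock."""
    ready = {}        # register name -> first cycle in which its new value can be read
    cycles = []
    next_fetch = 1
    for _op, dest, srcs in program:
        # Cycle in which this instruction would read its operands if not stalled.
        read_cycle = next_fetch + read_stage - 1
        # Interlock: wait until every source register has been written.
        for src in srcs:
            read_cycle = max(read_cycle, ready.get(src, 0))
        fetch = read_cycle - (read_stage - 1)
        cycles.append(fetch)
        if dest is not None:
            # Written at the end of cycle fetch + write_stage - 1,
            # so first readable in the following cycle.
            ready[dest] = fetch + write_stage
        next_fetch = fetch + 1
    return cycles


program = [
    ("LOAD", "REG1", ("MEM",)),   # Instruction 1
    ("ADD", "REG1", ("REG1",)),   # Instruction 2, depends on REG1
]
fetch_cycles = issue_cycles(program)
print(fetch_cycles)
```

In this model Instruction 2 is fetched in cycle 4 rather than cycle 2, which is the two-cycle stall of FIG. 3, without any change to the program itself.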
Software hazard protection places the burden of preventing hazards on the software programmer, or on the designer of the compiler used by the software programmer. To illustrate how a software programmer would resolve the hazard shown in FIG. 3, consider the following program:
LOAD REG1, MEM
NOP
NOP
ADD REG1, 2
A programmer with knowledge of the pipeline structure of the microprocessor 300 understands that Instruction 2 is dependent on the resolution of Instruction 1, and that it will take two additional clock cycles between Instructions 1 and 2 to resolve the dependency. S/he therefore inserts two NOP (no operation) instructions between Instructions 1 and 2. Alternatively, if the programmer utilized a compiler that was designed specifically for the microprocessor 300, s/he could trust that the compiler would detect the dependency between Instructions 1 and 2, and would insert the necessary number of NOP instructions between the two instructions. From the viewpoint of the microprocessor 300, it is simply fetching an instruction every clock cycle and passing the fetched instructions down the pipeline for execution. The microprocessor 300 has not needed any additional hardware to resolve the hazard, and yet the hazard has been prevented.
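The NOP insertion that a programmer or compiler would perform can be sketched as a simple pass over the instruction list. This is a minimal illustration, not an actual compiler: the tuple format (opcode, destination, sources) and the function name are hypothetical, only register dependencies of the kind shown in FIG. 3 are considered, and the stage numbers are those of the four stage pipeline (register written in stage 4, read in stage 2).

```python
NOP = ("NOP", None, ())


def insert_nops(program, write_stage=4, read_stage=2):
    """Pad a program with NOPs so no instruction reads a register too early."""
    # Minimum number of instruction slots between a producer and its consumer.
    min_distance = write_stage - read_stage + 1
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Look back only as far as a dependency could still be unresolved.
        for back, (_p_op, p_dest, _p_srcs) in enumerate(reversed(out), start=1):
            if back >= min_distance:
                break
            if p_dest is not None and p_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("LOAD", "REG1", ("MEM",)),   # Instruction 1
    ("ADD", "REG1", ("REG1",)),   # Instruction 2
]
padded = insert_nops(program)
for instruction in padded:
    print(instruction[0])
```

For the two-instruction program above, the pass places two NOPs between the LOAD and the ADD, reproducing the hand-written sequence.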
A problem with software hazard clearing is that it places the burden of understanding the nuances of instruction execution within a particular microprocessor implementation on either the programmer, or alternatively on the designer of the compiler for the microprocessor. While such a burden is ubiquitous within the field of modern microprocessors, it is nonetheless a significant problem. Not only must a programmer understand the implementation of the processor for which s/he is coding, s/he must also understand how much delay is associated with each instruction upon which other instructions depend. Within a deeply pipelined microprocessor (12 or more stages), the programmer must insert between 1 and 10 NOPs between dependent instructions, depending on how far the dependent instructions are separated within the program, and depending on how far the first instruction must proceed in the pipeline before it is resolved. To accurately code for a deeply pipelined microprocessor, a programmer must be very proficient in the implementation hazards of the processor.
An additional problem with using software hazard clearing is that once a program is developed for a microprocessor, it is unlikely that the program will operate on subsequent generations of the microprocessor without a significant rewrite of the program. For example, if a microprocessor advances from having a five stage pipeline, to having a twelve stage pipeline, it is unlikely that any of the hazard clearing methods used for the five stage pipeline will operate correctly in the twelve stage pipeline. This is true even though the software architectures (i.e., the instructions) of the five and twelve stage pipeline are identical.
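The portability problem can be illustrated by checking one NOP-padded program against two pipelines. The sketch below is hypothetical throughout: the function name, the instruction tuple format (opcode, destination, sources), and in particular the read and write stage numbers chosen for the five and twelve stage pipelines are assumptions made for the example, not values taken from any actual microprocessor.

```python
def first_hazard(program, write_stage, read_stage):
    """Return the index of the first instruction that would read a register
    before an earlier instruction has written it, or None if there is none."""
    # Minimum number of instruction slots between a producer and its consumer.
    min_distance = write_stage - read_stage + 1
    for i, (_op, _dest, srcs) in enumerate(program):
        # Only producers closer than min_distance can still be unresolved.
        for j in range(max(0, i - min_distance + 1), i):
            producer_dest = program[j][1]
            if producer_dest is not None and producer_dest in srcs:
                return i
    return None


# Program padded with two NOPs for the shallower pipeline.
padded = [
    ("LOAD", "REG1", ("MEM",)),
    ("NOP", None, ()),
    ("NOP", None, ()),
    ("ADD", "REG1", ("REG1",)),
]

# Assumed stage positions, for illustration only.
shallow = first_hazard(padded, write_stage=5, read_stage=3)
deep = first_hazard(padded, write_stage=12, read_stage=4)
print(shallow, deep)
```

Under these assumed stage positions, the same instruction sequence is hazard free on the shallower pipeline but not on the deeper one, even though the instructions themselves are unchanged.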
What has become apparent to the inventors of the present invention is the need for a hazard clearing mechanism that can be utilized by programmers, or designers of compilers, that can be implemented across multiple generations of a microprocessor architecture, that eliminates the need of rewriting hazard clearing code between processor generations, and that eases the burden on the programmer of understanding the nuances of particular microprocessor hazards.
What is also needed is a method and apparatus that allows a programmer to specify when a hazard should be cleared, without regard to the number of stages between the hazard and the dependent instruction.
Further, what is needed is a method and apparatus for hazard clearing that can be utilized in conjunction with hardware hazard tracking.