The present invention relates to methods and apparatus for processing pipeline instructions and, more particularly, to processing forward branch (or jump) instructions that effect a forward advancement in the sequential instructions of a processing pipeline by nullifying only those instructions already in the pipeline that are to be skipped.
Microprocessors execute software programs that include a plurality of instructions, such as ADD, LOAD, MOV, AND, OR, etc. Microprocessor instruction sets typically support so-called branch (or jump) instructions, which alter the instruction flow of the microprocessor by abruptly discontinuing a sequential flow of the instructions. This can involve branching to an altogether separate portion of the program, advancing one or more instructions ahead in the sequence, moving back one or more instructions in the sequence, etc.
FIG. 1 is a flow diagram illustrating the process flow through a plurality of instructions including a branch forward instruction and other instructions, i.e., instruction #1, instruction #2, instruction #3, and instruction #4. At action 10, the branch instruction is analyzed to determine where the process should branch. More particularly, an offset associated with the branch instruction is obtained and used to determine how many instructions in the sequence should be skipped. In this example, the offset is +3, indicating that the process flow should advance by three instructions (from the current instruction), thereby skipping instruction #1 and instruction #2. The advance of three instructions in the sequence is only carried out if a condition is met, such as whether a flag bit or the contents of a register is greater than, equal to, less than, etc. a particular value. Assuming that the condition is met (i.e., the condition is true), then the process flow at action 10 advances to instruction #3, which is a forward advancement of three through the sequential instructions. On the other hand, if the condition is not met (i.e., the condition is false), then the process flow advances to the next instruction in the sequence, which in this example is instruction #1.
The change in the instruction flow illustrated in FIG. 1 (i.e., skipping instruction #1 and instruction #2) when the condition is true is accomplished by modifying a program counter of the microprocessor, which points to an address in memory of the next instruction to be fetched, decoded, executed, etc. Thus, branch instructions are usually defined as follows:

BRANCH condition, offset
If condition = true, then PC <- PC + offset
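By way of illustration only, the branch definition above may be sketched as a small function. The function name next_pc is hypothetical, and the sketch assumes that the program counter counts instructions rather than bytes and that the offset is measured from the branch instruction itself, consistent with the +3 example of FIG. 1.

```python
def next_pc(pc: int, condition: bool, offset: int) -> int:
    """Return the program counter after executing BRANCH condition, offset.

    Assumptions (not stated in the text): pc is an instruction index, and
    the offset is relative to the branch instruction's own position.
    """
    if condition:
        # Condition true: PC <- PC + offset
        return pc + offset
    # Condition false: fall through to the next sequential instruction
    return pc + 1


# Branch at index 0, instruction #1 at index 1, ..., instruction #4 at index 4.
taken = next_pc(0, True, 3)       # lands on instruction #3
not_taken = next_pc(0, False, 3)  # lands on instruction #1
```

With the branch placed at index 0, a true condition yields index 3 (instruction #3) and a false condition yields index 1 (instruction #1), matching the two paths of FIG. 1.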
Reference is now made to FIG. 2, which is a sequence diagram illustrating how the instructions of the process flow of FIG. 1 are carried out in a single scalar processor. A single scalar processor is a processing pipeline in which a single instruction is dispatched into the pipeline at any particular cycle and only one instruction is capable of being executed in any given cycle. Each of the instructions, branch, instruction #1, instruction #2, instruction #3, and instruction #4, is shown opposite a plurality of sequentially disposed boxes, where each box represents an operation carried out by the pipeline in a particular stage or cycle. Eight sequential boxes are shown for each instruction and each set of eight boxes is offset by one cycle, which is consistent with the forward pipeline behavior of a single scalar processor pipeline.
The specific sequence in the pipeline is as follows: The branch instruction enters the pipeline at cycle 1 and a particular operation, A, is performed during that cycle, although for the purposes of this example the specifics of that operation are not important. At cycle 2, the branch instruction advances to the next stage of the pipeline where another operation, B, is carried out. Instruction #1 enters the first stage of the pipeline in cycle 2, where operation A is carried out on that instruction. In cycle 3, the branch instruction advances to a third stage in the pipeline, where a new operation, C, is carried out on that instruction. Instruction #1 advances to the second stage of the pipeline in cycle 3, where operation B is performed. Instruction #2 enters the first stage of the pipeline at cycle 3, where operation A is performed on that instruction.
In cycle 4, the branch instruction advances to a fourth stage of the pipeline, where a decode and dispatch operation, DD, is carried out. Instruction #1 advances to the third stage in the pipeline in cycle 4, where operation C is carried out on that instruction. Similarly, instruction #2 advances to the second stage of the pipeline and instruction #3 enters the first stage of the pipeline in cycle 4. In cycle 5, the branch instruction advances to a fifth stage in the pipeline where an execution operation, EX, is carried out. At the fifth cycle, instruction #1 has entered the fourth stage of the pipeline, instruction #2 has entered the third stage of the pipeline, instruction #3 has entered the second stage of the pipeline, and instruction #4 has entered the first stage of the pipeline.
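The staggered occupancy described for cycles 1 through 5 may be reproduced with a short sketch. The names STAGES, ENTRY_CYCLE, and stage_in_cycle are hypothetical, and only the five stages named in the text (A, B, C, DD, EX) are modeled; the remaining three boxes of FIG. 2 are not named in the description and are therefore omitted.

```python
# The five pipeline stages named in the description, in order.
STAGES = ["A", "B", "C", "DD", "EX"]

# Cycle in which each instruction enters the first stage (one per cycle).
ENTRY_CYCLE = {
    "branch": 1,
    "instruction #1": 2,
    "instruction #2": 3,
    "instruction #3": 4,
    "instruction #4": 5,
}


def stage_in_cycle(name: str, cycle: int):
    """Return the stage occupied by an instruction in a given cycle.

    Returns None if the instruction has not yet entered the pipeline or
    has moved past the five stages modeled here.
    """
    index = cycle - ENTRY_CYCLE[name]
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


# Snapshot of the pipeline at cycle 5, when the branch reaches EX.
snapshot = {name: stage_in_cycle(name, 5) for name in ENTRY_CYCLE}
```

At cycle 5 the snapshot places the branch in EX, instruction #1 in DD, instruction #2 in C, instruction #3 in B, and instruction #4 in A, which is the state described above.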
The execution of the branch instruction in the fifth cycle, however, modifies the program counter of the microprocessor (assuming that the condition is true, see action 10, FIG. 1) such that it points to an address in memory corresponding to where instruction #3 is stored. This is illustrated in FIG. 2 by the dashed line extending from the fourth stage of the pipeline for the branch instruction to the first stage in the pipeline for the second occurrence of instruction #3. In keeping with the process flow illustrated in FIG. 1, instruction #1 and instruction #2 should be skipped and instruction #3 should be executed. To achieve this, each of the execution operations EX for instructions #1-#4 in cycles 6, 7, 8, and 9 is nullified (cancelled) as illustrated by the “X” in those boxes. Further, instruction #3 re-enters the first stage of the pipeline in cycle 6, and instruction #4 re-enters the first stage of the pipeline in cycle 7. Thus, after the sequential operations A, B, C, and DD are again carried out, instruction #3 and instruction #4 are executed in cycles 10 and 11, respectively.
Although the pipeline process illustrated in FIG. 2 achieves the process flow illustrated in FIG. 1, certain disadvantages become evident. When the branch instruction is executed, the value of the program counter is overwritten, which can degrade the performance of the pipeline because, among other reasons, it nullifies all cycles (or stages, slots, etc.) in the pipeline instead of only those that should be skipped. For example, the execution of instruction #3 and instruction #4 in cycles 8 and 9, respectively, was nullified even though such execution would be desirable. Indeed, because instruction #3 and instruction #4 were skipped (nullified) they must re-enter the first stage of the pipeline in cycles 6 and 7, respectively, and all of the operations leading up to the execution operations (e.g., A, B, C, DD) must be repeated. The resultant nullified stages in the pipeline, i.e., the fifth stage of the pipeline in cycles 8 and 9, degrade the overall performance of the pipeline and reduce throughput. Indeed, where the execution of instruction #3 and instruction #4 would have occurred in cycles 8 and 9, respectively, such execution is delayed by two cycles, i.e., to cycles 10 and 11, respectively.
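The cycle counts cited above can be checked with a brief sketch of the conventional behavior. The names EX_DEPTH, ex_cycle, and conventional_taken_branch are hypothetical. The sketch assumes, as in FIG. 2, that EX is the fifth stage, that one instruction enters the pipeline per cycle, and that the branch target re-enters the first stage in the cycle immediately following the execution of the branch.

```python
EX_DEPTH = 5  # EX is the fifth pipeline stage (A, B, C, DD, EX)


def ex_cycle(entry_cycle: int) -> int:
    """Cycle in which an instruction entering at entry_cycle reaches EX."""
    return entry_cycle + EX_DEPTH - 1


def conventional_taken_branch(branch_entry: int = 1, offset: int = 3,
                              in_flight: int = 4) -> dict:
    """Model a taken branch that nullifies every in-flight instruction.

    Instruction k (k = 1..in_flight) enters the pipeline k cycles after
    the branch. All of their EX slots are nullified; instructions at or
    beyond the branch target are re-fetched and execute later.
    """
    branch_ex = ex_cycle(branch_entry)
    report = {}
    for k in range(1, in_flight + 1):
        nullified_ex = ex_cycle(branch_entry + k)
        if k < offset:
            # Skipped instruction: nullified and never re-introduced.
            re_executed_ex = None
        else:
            # Target and later instructions re-enter after the branch executes.
            reentry_cycle = branch_ex + 1 + (k - offset)
            re_executed_ex = ex_cycle(reentry_cycle)
        report[k] = {"nullified_ex": nullified_ex,
                     "re_executed_ex": re_executed_ex}
    return report


report = conventional_taken_branch()
delay = report[3]["re_executed_ex"] - report[3]["nullified_ex"]
```

For the default values the nullified EX slots fall in cycles 6, 7, 8, and 9, instruction #3 and instruction #4 execute in cycles 10 and 11, and the delay for instruction #3 is two cycles, in agreement with the discussion of FIG. 2.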
It is noted that the undesirable nullification of instructions in the pipeline occurs most often when relatively small forward branches are made. This is so because the protocol followed by the microprocessor dictates that all execution operations in the pipeline are to be nullified irrespective of whether certain of the instructions already in the pipeline should still be executed, as would occur when the offset is relatively small.
Conventional methods to avoid the above disadvantages employ so-called conditional instructions (e.g., conditional MOV), which are executed only if a specified condition within the instruction is true. Turning again to FIG. 1, if conditional instructions were employed, the branch instruction, instruction #1 and instruction #2 would be replaced with two conditional instructions, which would be executed only if the specified conditions were true. Unfortunately, conditional instructions require the assigning of unique opcodes or unique operand bits within the instruction and, consequently, very few microprocessor instruction sets support conditional instructions.
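The replacement of a short forward branch with conditional instructions may be sketched as follows. The sketch is illustrative only: the register names, the function names with_branch, cmov, and with_conditional_instructions, and the choice of MOV operations for instruction #1 and instruction #2 are all assumptions, since the text does not specify what those instructions do. Note that each conditional instruction is predicated on the branch condition being false, because instruction #1 and instruction #2 are carried out only when the branch is not taken.

```python
def with_branch(regs: dict, cond: bool) -> dict:
    """Branch form: skip instruction #1 and instruction #2 when cond is true."""
    regs = dict(regs)
    if not cond:
        regs["r1"] = regs["r2"]  # instruction #1 (hypothetical MOV r1, r2)
        regs["r3"] = regs["r4"]  # instruction #2 (hypothetical MOV r3, r4)
    return regs


def cmov(regs: dict, predicate: bool, dst: str, src: str) -> None:
    """Conditional MOV: copy src to dst only if the predicate is true."""
    if predicate:
        regs[dst] = regs[src]


def with_conditional_instructions(regs: dict, cond: bool) -> dict:
    """Branch-free form: two conditional MOVs replace the branch and its body."""
    regs = dict(regs)
    cmov(regs, not cond, "r1", "r2")
    cmov(regs, not cond, "r3", "r4")
    return regs


initial = {"r1": 0, "r2": 7, "r3": 0, "r4": 9}
same_when_true = with_branch(initial, True) == with_conditional_instructions(initial, True)
same_when_false = with_branch(initial, False) == with_conditional_instructions(initial, False)
```

Both forms leave the registers in the same state for either value of the condition, but the second form contains no branch and therefore never overwrites the program counter.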
Accordingly, there are needs in the art for new methods and apparatus for processing pipeline instructions where forward branch instructions may be effected in a processing pipeline without modifying the program counter, and without nullifying instructions already within the pipeline that will have to be re-introduced into the pipeline. Indeed, such new methods and apparatus would desirably achieve an increase in pipeline performance including higher throughputs and improved data processing.