1. Field of the Invention
The present invention relates to a microprocessor having delayed instructions, the microprocessor being capable of executing the delayed instructions after a variable delay time has elapsed.
2. Description of Related Art
FIG. 1 is a diagram showing the processing sequence of a conventional microprocessor operating under pipeline control. In FIG. 1, reference number 300 designates a branch instruction, 301 denotes an instruction fetch stage in the pipeline, 302 indicates a decode stage in the pipeline, 303 designates an instruction execution stage in the pipeline, 304 denotes a write back stage in the pipeline, 305 indicates an instruction in the first delay slot, 306 designates an instruction in the second delay slot, and 307 designates an instruction to be executed at the branch target.
Hereinafter, when the term “branch instruction” is used alone, it covers the following two cases:
a) An instruction that branches to a target address obtained by adding an offset value, stored in an operand of the instruction, to a program counter (PC) value; and
b) An instruction that branches to a target address indicated directly or indirectly by an operand.
When a branch instruction and a jump instruction are both described in a program, the branch instruction is one that branches to an address obtained by adding an offset value specified by an operand to the program counter value, and the jump instruction is one that branches directly or indirectly to an address indicated by an operand. In addition, in this specification, both branch instructions and jump instructions include subroutine-call instructions.
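The two ways of forming a target address described above can be illustrated with a minimal sketch. The function names and the use of a dictionary to stand in for a register or memory lookup are assumptions made for illustration only; they are not taken from any particular instruction set.

```python
from typing import Dict, Optional


def pc_relative_target(pc: int, offset: int) -> int:
    """Case a): target = program counter value + offset held in an operand."""
    return pc + offset


def operand_target(operand: int, storage: Optional[Dict[int, int]] = None) -> int:
    """Case b): target indicated by an operand.

    Direct:   the operand itself is the target address.
    Indirect: the operand selects a location (hypothetical `storage`
              mapping) whose content is the target address.
    """
    if storage is None:
        return operand          # direct
    return storage[operand]     # indirect


# Hypothetical values, chosen only to show the arithmetic.
branch_target = pc_relative_target(pc=0x100, offset=0x20)   # 0x120
jump_direct = operand_target(0x400)                         # 0x400
jump_indirect = operand_target(3, storage={3: 0x800})       # 0x800
```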
Next, the operation of the conventional microprocessor will be explained.
When a branch instruction is executed, the pipelined microprocessor shown in FIG. 1 obtains the branch target address in the instruction execution stage, which is the third stage of the pipeline. At this time, instruction 305 in the first delay slot and instruction 306 in the second delay slot are in the decode stage 302 and the instruction fetch stage 301, respectively. The conventional microprocessor must treat those instructions as invalid, which wastes pipeline cycles. Many methods for eliminating this waste are known, examples of which are disclosed in the following literature: “Computer Architecture: A Quantitative Approach”, John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers Inc., 1990, pp. 272-278. These methods use instruction scheduling, or a combination of instruction scheduling and delayed branch instructions, to eliminate the waste in the pipeline. Techniques related to delayed branch instructions are also shown in Japanese Laid-open Publication Numbers JP-A-6/274352 and JP-A-6/131180.
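The stage occupancy described above can be checked with a small timing sketch. It assumes a four-stage pipeline (fetch, decode, execute, write back) in which one instruction enters the fetch stage per cycle and no stalls occur; the stage labels and function names are illustrative only.

```python
# Four pipeline stages in order: instruction fetch, decode, execute, write back.
STAGES = ["IF", "D", "E", "W"]


def stage_at(issue_cycle, cycle):
    """Return the stage occupied at `cycle` by an instruction fetched at `issue_cycle`."""
    index = cycle - issue_cycle
    return STAGES[index] if 0 <= index < len(STAGES) else None


def branch_snapshot():
    """Stage of the branch and of the two following instructions when the branch resolves."""
    branch_issue = 0
    resolve_cycle = branch_issue + STAGES.index("E")  # target known in the execute stage
    return {
        "branch": stage_at(branch_issue, resolve_cycle),
        "slot1": stage_at(branch_issue + 1, resolve_cycle),
        "slot2": stage_at(branch_issue + 2, resolve_cycle),
    }


def wasted_slots():
    """Instructions already in the pipeline behind the branch when the target is known."""
    return STAGES.index("E")


snapshot = branch_snapshot()
```

Under these assumptions the snapshot places the branch in the execute stage, the first delay slot in decode, and the second in fetch, so two instructions are already in flight before the target can be fetched.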
In general, the delay value of a delayed branch instruction is fixed by the architecture of the microprocessor. As one specific case, instructions that designate a variable number of delay slots are disclosed in Japanese Laid-open Publication Number JP-A-6/131180.
The designated number of delay slots is stored in a decrement counter. The value stored in the decrement counter is decremented in response to the appropriate clock signals. When this value reaches 1, a fetch operation for the branch target instruction is initiated.
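The decrement-counter mechanism can be sketched as follows. This is a simplified model, not the circuit of the cited publication: the class and method names are hypothetical, and it is assumed that the counter is decremented once per clock and that the target fetch is initiated in the clock in which the value is 1.

```python
class DelayCounter:
    """Hypothetical model of a decrement counter holding a designated delay-slot count."""

    def __init__(self, delay_slots):
        if delay_slots < 1:
            raise ValueError("delay_slots must be at least 1")
        self.value = delay_slots

    def fetch_target_now(self):
        """True when the counter value is 1, i.e. the target fetch is initiated."""
        return self.value == 1

    def clock(self):
        """One clock signal: decrement, but never below 1."""
        if self.value > 1:
            self.value -= 1


def clocks_until_target_fetch(delay_slots):
    """Count the clocks that elapse before the target fetch is initiated."""
    counter = DelayCounter(delay_slots)
    clocks = 0
    while not counter.fetch_target_now():
        counter.clock()
        clocks += 1
    return clocks
```

In this model a designated value of 3 initiates the target fetch after two clocks, and a designated value of 1 initiates it immediately.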
FIG. 2 is a block diagram showing a common configuration of an instruction decoder and an instruction execution section forming part of the conventional microprocessor that can execute two operations simultaneously. In FIG. 2, reference number 341 designates an arithmetic logic unit (ALU) for executing arithmetic logic operations, 342 denotes a multiplier for executing multiplication operations, 343 indicates a program counter (PC) controller for calculating a PC value, 344 designates a memory controller for performing address calculation, 345 denotes a shifter for performing shift operations, 346 indicates a bus group consisting of buses through which two instructions are transferred during one cycle, 348 indicates a general purpose register file, and 347 designates a decoder for decoding instructions and for transferring control signals 11 and 12, as decoded results, to the instruction execution section comprising the ALU 341, the multiplier 342, the PC controller 343, the memory controller 344, the general purpose register file 348, and the shifter 345.
FIG. 3 is an explanatory diagram showing an example of a program to be executed in the conventional microprocessor. In FIG. 3, reference characters ADD, SRA, SUB, MUL, and JMP designate an add instruction, a shift instruction, a subtraction instruction, a multiply instruction, and a jump instruction, respectively.
These instructions, ADD, SRA, SUB, MUL, and JMP, are executed by the ALU 341, the shifter 345, the PC controller 343, and the multiplier 342 in the instruction execution section. The general purpose register file 348 holds the registers used for these operations. For example, the notation (r3, r0, 6) indicates that the result of an operation between the value of register r0 in the general purpose register file 348 and the immediate value “6” is stored into register r3 in the general purpose register file 348.
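The operand notation (r3, r0, 6) can be illustrated with a minimal sketch. The dictionary standing in for the register file, the function name, and the operand order for SUB and SRA (source register first, immediate second) are assumptions made for illustration.

```python
def execute(op, operands, regs):
    """Apply `op` to (dest, src, imm): regs[dest] = regs[src] <op> imm.

    `regs` is a hypothetical stand-in for the general purpose register file.
    """
    dest, src, imm = operands
    value = regs[src]
    if op == "ADD":
        result = value + imm
    elif op == "SUB":
        result = value - imm      # assumed operand order: source minus immediate
    elif op == "SRA":
        result = value >> imm     # Python's >> on int is an arithmetic shift
    elif op == "MUL":
        result = value * imm
    else:
        raise ValueError(f"unsupported operation: {op}")
    regs[dest] = result
    return result


# Hypothetical register contents.
regs = {"r0": 10, "r3": 0}
execute("ADD", ("r3", "r0", 6), regs)   # r3 now holds r0 + 6
```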
The conventional microprocessor based on the pipeline shown in FIG. 1 can execute two instructions at the same time. Therefore, the program including the instructions shown in FIG. 3 can be converted into instructions that each execute two operations at the same time, taking care to avoid resource conflicts that would occur in the pipeline. For example, the conventional microprocessor can execute the converted instructions shown in FIG. 4.
In FIG. 4, each line corresponds to one two-operation instruction. That is, each line shows two operations that are executed at the same time. There is no resource conflict between the SRA and SUB instructions; however, they have a register dependence. Therefore, the SRA and SUB instructions cannot be executed at the same time, and for this reason a no-operation (NOP) is written on the second line of the program shown in FIG. 4. This instruction scheduling is performed by a compiler or a programmer.
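The pairing rule described above (no resource conflict and no register dependence, otherwise a NOP) can be sketched with a simple in-order packer. The program used here is hypothetical and is not the program of FIG. 3 or FIG. 4; the mapping of mnemonics to execution units and the greedy pairing strategy are likewise assumptions, and a real compiler would schedule more aggressively.

```python
from typing import List, NamedTuple, Optional, Tuple


class Op(NamedTuple):
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


# Assumed mapping of mnemonics to execution units.
UNIT = {"ADD": "ALU", "SUB": "ALU", "SRA": "shifter",
        "MUL": "multiplier", "JMP": "PC controller"}
NOP = Op("NOP", None, ())


def can_pair(first: Op, second: Op) -> bool:
    """True if the two operations may share one two-operation instruction."""
    if UNIT[first.name] == UNIT[second.name]:
        return False                      # resource conflict: same unit
    if first.dest is not None and (first.dest in second.srcs
                                   or first.dest == second.dest):
        return False                      # register dependence on first's result
    return True


def pack(program: List[Op]) -> List[Tuple[Op, Op]]:
    """Greedily pair adjacent operations in order, padding with NOP when needed."""
    lines = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            lines.append((program[i], program[i + 1]))
            i += 2
        else:
            lines.append((program[i], NOP))
            i += 1
    return lines


# Hypothetical operands: SUB reads r4, which SRA writes, so they cannot be paired.
dependent = pack([Op("SRA", "r4", ("r1",)), Op("SUB", "r5", ("r4", "r2"))])
independent = pack([Op("ADD", "r3", ("r0",)), Op("SRA", "r4", ("r1",))])
```

Here the dependent pair is split across two lines, each padded with a NOP, while the independent pair shares a single line.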
The conventional microprocessor having the configuration described above has the following problems (1) to (3):
(1) In the conventional microprocessor having branch instructions, it is difficult to schedule instructions effectively because the delay value specified by each delayed instruction is fixed. For example, in the program shown in FIG. 4, it is possible to delete the JMP instruction on the fifth line and to place a delayed jump instruction with a delay value of “2” on the second line in place of the NOP. The fetch operation for the instruction at the jump target address TGT can then be executed immediately after the instruction fetch for the fourth line is completed, so that no pipeline cycles are wasted.
A conventional microprocessor that supports only a delay value of “2” can execute such a delayed branch instruction. However, with a microprocessor that supports only a delay value of “3”, for example, a programmer or a compiler would not be able to schedule the instructions shown in FIG. 4 so as to avoid a cycle loss.
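The arithmetic behind this example can be sketched as follows. It is assumed that a delayed jump placed on line p with delay value d allows lines p+1 through p+d to be fetched before the target, which agrees with the case above (line 2, delay 2, target fetched after line 4). It is further assumed that the only free slot is on line 2 and that line 4 is the last line holding useful operations; the function names are hypothetical.

```python
def last_line_before_target(jump_line, delay):
    """Last sequential line fetched before the jump target is fetched."""
    return jump_line + delay


def padding_lines_needed(jump_line, delay, last_useful_line):
    """Extra lines (lost cycles) needed when the delay overshoots the useful code."""
    return max(0, last_line_before_target(jump_line, delay) - last_useful_line)


# Delayed jump in the free slot on line 2; useful operations end on line 4.
loss_with_delay_2 = padding_lines_needed(jump_line=2, delay=2, last_useful_line=4)
loss_with_delay_3 = padding_lines_needed(jump_line=2, delay=3, last_useful_line=4)
```

Under these assumptions a delay value of 2 loses no cycle, whereas a fixed delay value of 3 forces one padding line.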
(2) The configuration of a conventional microprocessor in which the designated delay value is written into a decrement counter becomes less practical when interrupts or new branch operations occur while a delayed instruction is pending. For example, because the value in the decrement counter is decremented according to the operation clocks, unless the decrement counter value is specifically managed, the actual delay will deviate from the designated delay value by the number of operation clocks spent on interrupt processing and the like.
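The mismatch described in problem (2) can be illustrated with a simplified simulation. It assumes a counter that is decremented on every operation clock regardless of what the processor is doing and that expires after the designated number of clocks; the trace labels "prog" and "irq" and the function name are hypothetical.

```python
def slots_executed_before_branch(designated_delay, clock_trace):
    """Count delay-slot instructions actually executed before the counter expires.

    `clock_trace` lists what each operation clock was spent on:
    "prog" for a program (delay-slot) instruction, "irq" for interrupt processing.
    The counter is decremented on every clock, whatever the clock was used for.
    """
    counter = designated_delay
    executed = 0
    for kind in clock_trace:
        if counter == 0:
            break                 # counter expired: the branch takes effect
        counter -= 1
        if kind == "prog":
            executed += 1
    return executed


# Designated delay of 3 with no interrupt, then with a two-clock interrupt.
without_interrupt = slots_executed_before_branch(3, ["prog"] * 5)
with_interrupt = slots_executed_before_branch(3, ["prog", "irq", "irq", "prog", "prog"])
```

In this model the uninterrupted run executes all three delay-slot instructions, while the interrupted run executes only one before the counter expires.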
(3) In the conventional microprocessors, branch instructions are the only delayed instructions; that is, instructions other than branch instructions have no delay capability. It is therefore difficult to schedule instructions effectively.