The present invention relates to control of a branch operation in a processor. More particularly, it relates to control of a delayed branch operation in a processor employing a delayed branch method and to control of the rewriting of a conditional flag to be referenced by a conditional branch instruction.
In so-called pipeline processing, in which instructions are executed in a pipelined manner, a method termed delayed branch has conventionally been used. As an example, consider the case shown in FIG. 10, in which instructions are executed by pipeline processing consisting of three stages: instruction fetch (F), instruction decode (D), and instruction execute (E). In the pipeline processing shown in FIG. 10, the instruction at the branch destination is fetched only after the branch instruction has been decoded. As a result, a blank slot (delay slot) of at least one stage arises. The number of such blank slots equals the number of pipeline stages other than the fetch and execute stages.
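The counting rule just stated can be written down directly. The following is a minimal Python sketch that applies the rule exactly as stated; the stage labels and the function name `delay_slots` are illustrative assumptions, and the sketch assumes that the execute stage is the last stage, as in FIG. 10.

```python
def delay_slots(stages):
    """Number of blank slots after a branch, per the rule in the text:
    every stage other than fetch ("F") and execute ("E") adds one slot.
    Assumes (hypothetically) that execute is the final stage."""
    if "F" not in stages or "E" not in stages:
        raise ValueError("pipeline must contain fetch (F) and execute (E) stages")
    return sum(1 for stage in stages if stage not in ("F", "E"))


# Three-stage pipeline of FIG. 10: fetch, decode, execute.
print(delay_slots(["F", "D", "E"]))  # 1
```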
Delayed branch is a method of eliminating this useless blank slot by placing, in the delay slot, the instruction that resides at the address following the branch instruction. The method is expected to improve processor performance (see Japanese Laid-Open Patent Publication HEI 4-127237 and Japanese Laid-Open Patent Publication HEI 3-122718).
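The performance benefit can be illustrated with an idealized cycle count. The sketch below is a hypothetical model, not taken from the cited publications: it assumes a linear pipeline, that only taken branches leave blank slots, and that with delayed branch every delay slot is filled with a useful instruction already counted in the program length.

```python
def total_cycles(n_instructions, n_stages, n_taken_branches, slots_per_branch,
                 delayed_branch):
    """Idealized cycle count for a linear pipeline (hypothetical model).

    Filling the pipeline costs n_stages - 1 cycles. Without delayed branch,
    each taken branch leaves slots_per_branch blank cycles. With delayed
    branch, those slots hold useful instructions that are already counted
    in n_instructions, so no cycles are lost.
    """
    fill = n_stages - 1
    bubbles = 0 if delayed_branch else n_taken_branches * slots_per_branch
    return n_instructions + fill + bubbles


print(total_cycles(10, 3, 2, 1, delayed_branch=False))  # 14
print(total_cycles(10, 3, 2, 1, delayed_branch=True))   # 12
```

Under these assumptions, ten instructions with two taken branches on the three-stage pipeline of FIG. 10 save two cycles.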
A conditional branch instruction determines whether or not a branch is taken based on a conditional flag that reflects the result of executing an operate instruction, a transfer instruction, or the like. A conventional method increases the flexibility with which the instruction execution sequence can be determined by controlling the rewriting of the conditional flag. In the SPARC RISC processor, for example, the code of an operate instruction contains one control bit that determines whether or not the conditional flag is rewritten. When the control bit is "1", the result of the operation is reflected in the conditional flag; when it is "0", the conditional flag is not rewritten (see "SPARC Architecture Manual," Sun Microsystems Inc., 1991). Because the method allows the rewriting of the conditional flag to be selected instruction by instruction, it increases the flexibility with which a compiler or the like can determine the instruction execution sequence.
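The effect of the flag-control bit can be sketched as follows. This is a hypothetical model in the spirit of the SPARC behavior described above, not SPARC's actual encoding: `op_add`, `Flags`, and `branch_if_zero` are illustrative names, and only the zero and negative flags of a 32-bit addition are modeled.

```python
from dataclasses import dataclass

WIDTH = 32
MASK = (1 << WIDTH) - 1


@dataclass
class Flags:
    zero: bool = False
    negative: bool = False


def op_add(a, b, set_flags, flags):
    """Hypothetical ADD with a one-bit flag-control field.

    The result is always produced; the conditional flags are rewritten
    only when set_flags == 1.
    """
    result = (a + b) & MASK
    if set_flags == 1:
        flags.zero = result == 0
        flags.negative = bool(result >> (WIDTH - 1))
    return result


def branch_if_zero(flags):
    """A conditional branch reads only the flags, not the last result."""
    return flags.zero


flags = Flags()
op_add(3, (-3) & MASK, 1, flags)   # result 0: zero flag set
op_add(7, 8, 0, flags)             # result 15: flags left untouched
print(branch_if_zero(flags))       # True
```

Had the second addition been issued with its control bit set to 1, the zero flag would have been cleared and the branch would not be taken; choosing this per instruction is the flexibility referred to above.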
However, the conventional method has the following problems.
First, in the delayed branch method described above, the instruction execution sequence becomes complicated when delayed branch instructions are given consecutively.
FIG. 11 shows an example of a program at the assembler level in which delayed branch instructions are given consecutively. A first delayed branch instruction br200 (specifying a branch to an address 200 when a condition is satisfied) resides at an address 100. A second delayed branch instruction br400 (specifying a branch to an address 400 when a condition is satisfied) resides at an address 101. FIG. 12 shows the relationship between the occurrence or nonoccurrence of the branch specified by each delayed branch instruction and the instruction execution sequence when the program shown in FIG. 11 is executed. The operation of the program is divided into the two cases where a branch is or is not caused by the first delayed branch instruction br200, and each of these is further divided into the two cases where a branch is or is not caused by the second delayed branch instruction br400. As a result, a total of four cases must be considered, as shown in FIG. 12.
In the case where the branch conditions for both the first and second delayed branch instructions br200 and br400 are satisfied, the instruction at the address 200 (designated (200)) is fetched because the branch condition for br200 is satisfied, and then the instruction at the address 400 (designated (400)) is fetched because the branch condition for br400 is satisfied, as shown in FIG. 13. Consequently, as shown in FIG. 12, the processor performs the complicated operation of jumping to the destination (address 200) of the branch caused by br200, executing only one instruction there, and then jumping again to the destination (address 400) of the branch caused by br400 to continue execution.
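The four cases of the FIG. 11 program can be reproduced with a small simulator. The following is a minimal sketch, assuming one delay slot, that only the two branches br200 and br400 are modeled, and that every other address holds an ordinary instruction; `trace` and `FIG11_BRANCHES` are hypothetical names. The case in which both branches are taken yields the sequence 100, 101, 200, 400, in which only the single instruction at address 200 runs before the second jump.

```python
from itertools import product

# Hypothetical encoding of the FIG. 11 program: branch address -> destination.
FIG11_BRANCHES = {100: 200, 101: 400}


def trace(program, taken, start=100, delay=1, steps=5):
    """Addresses executed under delayed-branch semantics.

    taken maps a branch address to whether its condition is satisfied.
    A taken branch redirects fetch only after `delay` further instructions.
    """
    pc = start
    in_flight = []        # [instructions_left, destination] per taken branch
    executed = []
    for _ in range(steps):
        executed.append(pc)
        next_pc = pc + 1
        # Age earlier branches; one whose delay slots are used up redirects.
        for entry in in_flight:
            entry[0] -= 1
            if entry[0] == 0:
                next_pc = entry[1]
        in_flight = [entry for entry in in_flight if entry[0] > 0]
        # If the current instruction is a taken branch, start its delay.
        if pc in program and taken.get(pc, False):
            in_flight.append([delay, program[pc]])
        pc = next_pc
    return executed


for c1, c2 in product([True, False], repeat=2):
    print("br200 taken:", c1, "br400 taken:", c2,
          trace(FIG11_BRANCHES, {100: c1, 101: c2}))
```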
This seriously impairs the readability of the program at the assembler level and makes bugs likely.
In particular, an unskilled programmer without a full understanding of delayed branch may write a program that does not anticipate such an operation, and the program then inevitably contains a bug. Moreover, because the bug manifests only when the branch conditions of the consecutive delayed branch instructions are all satisfied and the branches are actually taken, it is extremely difficult to find by debugging in advance. Often the bug is not discovered until the apparatus is put into operation.
A skilled programmer with a better knowledge of delayed branch would avoid the problem by inserting a no-operation (NOP) instruction between the consecutive delayed branch instructions. However, because this modification is troublesome, it is often forgotten. Moreover, if the processor has more delay slots, as many NOP instructions as there are delay slots must be inserted, so that the program becomes redundant and the memory capacity needed to store it increases accordingly.
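The manual workaround can be expressed as a padding pass. This is a minimal sketch under assumed conventions: the program is a list of mnemonics, any mnemonic beginning with "br" is treated as a delayed branch, and `pad_with_nops` is a hypothetical name. It shows that consecutive branches need as many NOPs as there are delay slots.

```python
def pad_with_nops(program, delay):
    """Insert NOPs so that no delayed branch sits in another's delay slots.

    program is a list of mnemonics; any mnemonic starting with "br" is
    treated as a delayed branch (a hypothetical convention).
    """
    padded = []
    since_branch = None   # instructions emitted since the last branch
    for instr in program:
        is_branch = instr.startswith("br")
        if is_branch and since_branch is not None and since_branch < delay:
            # Fill the rest of the previous branch's delay slots with NOPs.
            padded.extend(["nop"] * (delay - since_branch))
        padded.append(instr)
        if is_branch:
            since_branch = 0
        elif since_branch is not None:
            since_branch += 1
    return padded


print(pad_with_nops(["br200", "br400", "add"], delay=1))
# ['br200', 'nop', 'br400', 'add']
print(pad_with_nops(["br200", "br400", "add"], delay=3))
# ['br200', 'nop', 'nop', 'nop', 'br400', 'add']
```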
The problem may arise even when the delayed branch instructions are not consecutive but are close to each other in the program. Although FIGS. 12 and 13 assume a processor with one delay slot, a similar problem occurs when the processor has more delay slots and the spacing between nonconsecutive delayed branch instructions is small relative to the number of delay slots: whenever the individual branch conditions are satisfied, the sequence in which the instructions are executed becomes complicated.
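Whether nonconsecutive branches interfere depends only on their spacing relative to the number of delay slots. The sketch below flags such pairs; the mnemonic convention (a leading "br" marks a delayed branch) and the name `branches_in_delay_slots` are assumptions for illustration.

```python
def branches_in_delay_slots(program, delay):
    """Return (i, j) index pairs where delayed branch j lies within the
    delay slots of delayed branch i, i.e. 0 < j - i <= delay."""
    branch_idx = [i for i, instr in enumerate(program) if instr.startswith("br")]
    pairs = []
    for k, i in enumerate(branch_idx):
        for j in branch_idx[k + 1:]:
            if j - i > delay:
                break          # later branches are even farther away
            pairs.append((i, j))
    return pairs


prog = ["br200", "add", "br400", "sub"]
print(branches_in_delay_slots(prog, delay=1))  # []
print(branches_in_delay_slots(prog, delay=2))  # [(0, 2)]
```

The same two branches are harmless with one delay slot but interfere with two, which is the situation described above.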
There is another type of processor in which the code of a delayed branch instruction contains a control bit that determines whether or not a delayed branch is performed (see Japanese Laid-Open Patent Publication HEI 4-127237). If the control bit is 0, the processor does not execute the instruction placed in the delay slot when the branch is taken. FIG. 14 shows the relationship between the occurrence or nonoccurrence of the branch specified by each delayed branch instruction and the instruction execution sequence when the program shown in FIG. 11 is executed by a processor of this type. In FIG. 14, "* * *" indicates that the instruction placed in the delay slot is not executed. In this case, if the control bit of the first delayed branch instruction br200 is set to 0, br200 alone determines whether or not a branch to the address 200 occurs, and the second delayed branch instruction br400 is executed only when no branch occurs, so that the readability of the program at the assembler level is not impaired.
However, providing a control bit in the instruction code to determine whether or not a delayed branch occurs widens the instruction code by 1 bit, which disadvantageously increases the memory capacity needed to store the program. An increased memory capacity is a particularly critical drawback for a portable telecommunication device in terms of device size, power consumption, manufacturing cost, and the like. In addition, the instruction decoder of the processor must be provided with an additional circuit that controls the delayed branch based on the control bit in the delayed branch instruction code.
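The memory cost of one extra bit can be estimated with simple arithmetic. The sketch below uses a hypothetical program of 10,000 instructions and compares two assumed storage schemes: bit-contiguous packing, and padding each instruction to whole bytes. The program length and both schemes are illustrative assumptions.

```python
import math


def packed_bytes(n_instructions, width_bits):
    """Program size if instructions are stored back to back with no padding."""
    return math.ceil(n_instructions * width_bits / 8)


def byte_aligned_bytes(n_instructions, width_bits):
    """Program size if each instruction is padded to a whole number of bytes."""
    return n_instructions * math.ceil(width_bits / 8)


n = 10_000  # hypothetical program length
print(packed_bytes(n, 24), packed_bytes(n, 25))              # 30000 31250
print(byte_aligned_bytes(n, 24), byte_aligned_bytes(n, 25))  # 30000 40000
```

Even with ideal packing, the extra bit costs about 4 percent for a 24-bit code; if instructions must be byte-aligned, the same bit pushes each instruction to 4 bytes.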
Furthermore, when the bit width of the instruction code is fixed in advance by the specifications, even a single control bit constrains the device design. If the instruction code is 24 bits wide, for example, at most a few bits remain available to the designer after those required to represent the instruction type, the specified address, and the like. As a result, even one control bit may significantly reduce design flexibility.
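The bit budget of a fixed-width code can be checked the same way. The text gives only the 24-bit total, so the field layout below is entirely hypothetical; the sketch shows how a single control bit consumes half of a two-bit margin.

```python
# Hypothetical field layout for a 24-bit instruction; only the total width
# comes from the text, the individual field widths are assumptions.
FIELDS = {"opcode": 6, "dst_reg": 4, "src_reg": 4, "immediate": 8}


def spare_bits(word_width, fields):
    """Bits left for the designer after all mandatory fields are placed."""
    used = sum(fields.values())
    if used > word_width:
        raise ValueError(f"fields need {used} bits but the word has {word_width}")
    return word_width - used


print(spare_bits(24, FIELDS))                        # 2
print(spare_bits(24, {**FIELDS, "control_bit": 1}))  # 1
```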
A similar problem occurs in the conventional method of controlling the rewriting of a conditional flag. Providing a control bit for the rewriting of the conditional flag increases the memory capacity needed to store a program, requires the instruction decoder to have an additional internal circuit that controls the rewriting of the conditional flag based on the control bit in the instruction code, and reduces the flexibility of device design.