(1) Field of the Invention
The present invention relates to a program translating apparatus which translates source programs into machine instruction sequences and links the machine instruction sequences to each other, and to a processor which executes the machine instruction sequences produced by the program translating apparatus. The present invention particularly relates to a program translating apparatus and a processor which are free from pipeline stalls during the execution of a branch instruction which causes a branch to a subroutine.
(2) Description of the Related Art
Pipeline processing is one of the fundamental techniques for speeding up the processing of a CPU, which is hereinafter referred to as a processor.
In pipeline processing, the processing of one instruction is divided into a plurality of stages, and the stages of successive instructions are executed in parallel, so that a different instruction occupies each stage in the same clock cycle.
However, the execution of a branch instruction is accompanied by a pipeline stall, which prevents the performance of pipeline processing from reaching its theoretical level. This phenomenon is referred to as a branch hazard.
FIG. 1 shows an instruction sequence which involves the branch hazard.
FIG. 2 shows the pipeline flow of the instruction sequence in clock cycles 1 through 6. The pipeline is composed of three stages: instruction fetch (hereinafter IF), instruction decode (hereinafter DEC), and instruction execution and effective address generation (hereinafter EX).
It is assumed that an instruction 1 is a branch instruction which causes a branch to a subroutine which starts at address A. The instruction 1 is fetched at the IF stage in clock cycle 1, and executed at the EX stage in clock cycle 3. Consequently, an instruction A at address A is fetched in clock cycle 4, and executed in clock cycle 6. The instruction A is thus executed three clock cycles after the instruction 1, instead of in the following clock cycle, because instructions 2 and 3 are already in the pipeline and must be nullified. The pipeline stalls are indicated in gray in FIG. 2.
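The timing described above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function name branch_timing and its parameters are hypothetical and do not appear in the figures. It assumes that a branch is resolved at the EX stage, that EX is the third stage, and that one instruction is fetched per clock cycle.

```python
def branch_timing(fetch_cycle, ex_stage=3):
    """Timing of a taken branch in a pipeline that resolves branches at EX.

    fetch_cycle: clock cycle in which the branch instruction enters IF.
    ex_stage:    position of the EX stage in the pipeline (3 for IF, DEC, EX).
    All names here are hypothetical and used for illustration only.
    """
    # The branch reaches EX (ex_stage - 1) cycles after it is fetched.
    resolve_cycle = fetch_cycle + ex_stage - 1
    # The branch target can be fetched only in the cycle after resolution.
    target_fetch_cycle = resolve_cycle + 1
    # The target instruction then needs the same number of cycles to reach EX.
    target_ex_cycle = target_fetch_cycle + ex_stage - 1
    # Every instruction fetched between the branch and its target is nullified.
    nullified = ex_stage - 1
    return resolve_cycle, target_fetch_cycle, target_ex_cycle, nullified


# Instruction 1 is fetched in clock cycle 1.
print(branch_timing(1))  # (3, 4, 6, 2)
```

Under these assumptions the result matches the description: the instruction 1 is executed in clock cycle 3, the instruction A is fetched in clock cycle 4 and executed in clock cycle 6, and two instructions (instructions 2 and 3) are nullified.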
A known method of resolving such a branch hazard is the delayed branch method, which is described in David A. Patterson and John L. Hennessy, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, 1990, pp. 265-270.
In the delayed branch method, a compiler schedules instructions such that a branch target instruction is placed beforehand in a location (branch-delay slot) which immediately follows the branch instruction.
FIG. 3 shows an instruction sequence where branch target instructions A and B are placed in branch-delay slots, namely addresses 2 and 3, respectively.
FIG. 4 shows the pipeline flow of the instruction sequence shown in FIG. 3. As is apparent from the pipeline flow, the delayed branch method causes no pipeline stall because no useless instructions are fetched. Consequently, the branch hazard does not arise.
However, the delayed branch method still has a drawback in that a branch target instruction cannot always be moved to a branch-delay slot, and as a result, a branch hazard is not always resolved.
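The difference between the two pipeline flows can be illustrated by a small simulation. The following Python sketch is a minimal model, not a description of any actual processor: the function simulate, the memory layouts, the numeric addresses 10 to 12 standing in for address A, and the instruction C which follows the instruction B are all assumptions introduced for illustration. The model fetches one instruction per clock cycle and resolves a branch at the EX stage.

```python
def simulate(memory, start, cycles, delayed):
    """Return (cycle, name) pairs for the instructions that reach EX.

    memory:  dict mapping an address to (name, branch_target or None).
    delayed: True models a delayed branch (instructions already fetched
             behind the branch are executed), False nullifies them.
    All names and addresses are hypothetical.
    """
    pc = start
    if_slot = dec_slot = None
    executed = []
    for cycle in range(1, cycles + 1):
        # Advance the three stages IF -> DEC -> EX by one step.
        ex_slot = dec_slot
        dec_slot = if_slot
        if_slot = memory.get(pc)
        next_pc = pc + 1
        if ex_slot is not None:
            name, target = ex_slot
            executed.append((cycle, name))
            if target is not None:
                next_pc = target  # the branch is resolved at EX
                if not delayed:
                    # Instructions fetched behind the branch are nullified.
                    if_slot = dec_slot = None
        pc = next_pc
    return executed


# Layout in the manner of FIG. 1: instructions 2 and 3 follow the branch.
stalled = {1: ("1", 10), 2: ("2", None), 3: ("3", None),
           10: ("A", None), 11: ("B", None)}
# Layout in the manner of FIG. 3: A and B occupy the branch-delay slots,
# and the branch continues at an assumed instruction C.
filled = {1: ("1", 12), 2: ("A", None), 3: ("B", None), 12: ("C", None)}

print(simulate(stalled, 1, 6, delayed=False))  # [(3, '1'), (6, 'A')]
print(simulate(filled, 1, 6, delayed=True))
# [(3, '1'), (4, 'A'), (5, 'B'), (6, 'C')]
```

In this model the first flow leaves the EX stage idle in clock cycles 4 and 5, whereas the second flow executes one instruction in every clock cycle from clock cycle 3 onward.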
To be more specific, the subroutine which starts at address A in FIG. 1 may not be recognized by the compiler. Generally, a compiler compiles a file of programs as a unit. If a program which includes the instruction 1 and a subroutine which includes the instructions A and B are stored in different files, and if only the file with the program is given, the compiler cannot place the instructions A and B in branch-delay slots. Consequently, in the case where a branch is taken to an instruction that is stored in another file, the branch hazard is still unavoidable.
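The limitation can be sketched as follows. The Python code below is a simplified, hypothetical model of branch-delay slot filling, not the behavior of any particular compiler: the function fill_delay_slots, the textual instruction format "call LABEL", the notation "LABEL+2" for a branch to the instruction after the copied ones, and the use of "nop" for an unfilled slot are all assumptions made for illustration.

```python
def fill_delay_slots(unit, visible_subroutines, slots=2):
    """Schedule one compilation unit for a delayed-branch processor.

    unit:                list of instruction strings in one file.
    visible_subroutines: dict mapping a label to the instruction list of a
                         subroutine defined in the SAME file.
    Hypothetical model: unfilled slots are padded with "nop".
    """
    scheduled = []
    for instr in unit:
        if not instr.startswith("call "):
            scheduled.append(instr)
            continue
        label = instr.split()[1]
        body = visible_subroutines.get(label)
        if body is not None and len(body) >= slots:
            # The target is visible: copy its first instructions into the
            # slots and branch past them.
            scheduled.append(f"call {label}+{slots}")
            scheduled.extend(body[:slots])
        else:
            # The target is in another file: nothing useful can be placed
            # in the slots, so the cycles are wasted.
            scheduled.append(instr)
            scheduled.extend(["nop"] * slots)
    return scheduled


program = ["call A", "2", "3"]
# Subroutine A is stored in the same file as the program.
print(fill_delay_slots(program, {"A": ["A", "B", "C"]}))
# ['call A+2', 'A', 'B', '2', '3']
# Subroutine A is stored in a different file.
print(fill_delay_slots(program, {}))
# ['call A', 'nop', 'nop', '2', '3']
```

In the second case the slots carry no useful work, which corresponds to the unavoidable branch hazard described above.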
In order to solve such a problem, a program may be written so as not to cause a branch to a subroutine in another file, or all related subroutines may be stored in the same file by checking the branch points of a program in advance.
However, in either case, the arrangement of branch target subroutines must always be taken into account when a program is written, which lowers the efficiency of designing and developing programs. Moreover, the files become larger, so that a large amount of time is required for program translation.