1. Field of the Invention
The present invention relates to a pipelined data processor. More specifically, the present invention relates to a data processor capable of pre-branch processing to a return address in the initial stage of pipeline processing of a subroutine return instruction.
2. Description of the Prior Art
FIG. 1 is a schematic diagram of a typical pipelined data processor.
In FIG. 1, numeral 1 designates an instruction fetch (IF) stage, numeral 2 designates an instruction decoding (D) stage, numeral 3 designates an address calculation (A) stage, numeral 4 designates an operand fetch (F) stage, numeral 5 designates an execution (E) stage and numeral 8 designates an operand writing (W) stage.
The operation of the pipeline will now be described. The data processor of FIG. 1 has six pipeline stages: instruction fetch stage 1 which fetches instructions, instruction decoding stage 2 which decodes the fetched instructions, address calculation stage 3 which calculates operand addresses and the like, operand fetch stage 4 which fetches operand data, execution stage 5 which processes the fetched operands according to the decoded instruction, and operand writing stage 8, which writes the processed operands. Because they are pipelined, these six stages can operate on different instructions at the same time. However, where a conflict occurs for an operand or memory access, a lower-priority stage suspends processing until the conflict is eliminated.
As described above, in the pipelined data processor, processing is divided into a plurality of stages according to the flow of data processing, and the stages operate simultaneously. The average processing time required for one instruction is thereby shortened, and the performance as a whole is improved.
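The overlap of the six stages can be illustrated with a minimal sketch. It assumes an idealized pipeline in which each stage takes exactly one cycle and no operand or memory-access conflict occurs; the names `STAGES`, `stage_of`, and `total_cycles` are hypothetical and are introduced only for this illustration.

```python
# Idealized model of the six-stage pipeline of FIG. 1.
# Assumptions (not from the source): one cycle per stage, no conflicts.
STAGES = ["IF", "D", "A", "F", "E", "W"]


def stage_of(instruction, cycle):
    """Return the stage that instruction number `instruction` (0-based)
    occupies in `cycle` (0-based), or None if it is not in the pipeline."""
    index = cycle - instruction
    return STAGES[index] if 0 <= index < len(STAGES) else None


def total_cycles(n_instructions, pipelined=True):
    """Cycles needed to complete n_instructions under the assumptions above."""
    if n_instructions <= 0:
        return 0
    if pipelined:
        # The first instruction fills the pipeline; each later one
        # completes one cycle after its predecessor.
        return len(STAGES) + n_instructions - 1
    # Without overlap, every instruction occupies all six stages alone.
    return len(STAGES) * n_instructions
```

Under these assumptions, ten instructions complete in 15 cycles when the stages overlap, as against 60 cycles when each instruction passes through all six stages before the next one starts.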
However, in the data processor pipelined in such a manner, where an instruction disturbing the flow of instructions, such as a branch instruction, has been executed in the execution stage 5, all processing in the preceding stages is canceled, and the instruction to be executed next is fetched anew.
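The cost of such a cancellation can be estimated with a short sketch. It assumes one cycle per stage, that the branch is resolved in the execution stage, and that the four earlier stages (IF, D, A, F) each lose one cycle of work per taken branch; the names `FLUSH_PENALTY` and `cycles_with_branches` are hypothetical.

```python
# Rough estimate of branch overhead in the six-stage pipeline.
# Assumptions (not from the source): one cycle per stage, every taken
# branch is resolved in the E stage, and no prediction is performed.
STAGES = ["IF", "D", "A", "F", "E", "W"]

# Stages ahead of E whose work is discarded when a branch is taken.
FLUSH_PENALTY = STAGES.index("E")  # 4 stages: IF, D, A, F


def cycles_with_branches(n_instructions, taken_branches):
    """Total cycles for n_instructions, of which taken_branches are
    taken branches that each cancel the preceding stages."""
    if n_instructions <= 0:
        return 0
    ideal = len(STAGES) + n_instructions - 1
    return ideal + FLUSH_PENALTY * taken_branches
```

With these assumptions, 100 instructions containing 20 taken branches need 185 cycles instead of the ideal 105, which indicates the scale of the overhead that pre-branch processing is intended to reduce.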
Thus, when an instruction disturbing the pipeline processing is executed, the overhead of pipeline processing increases and the processing speed of the data processor does not improve as expected. To improve the performance of the data processor, various techniques have been employed to reduce the overhead of executing an instruction such as an unconditional branch instruction or a conditional branch instruction.
For example, using a so-called branch target buffer, which stores the address of a branch instruction in combination with its branch target address, the flow of instructions is predicted in the instruction fetch stage. See, for example, J. K. F. Lee and A. J. Smith, "Branch Prediction Strategies and Branch Target Buffer Design," IEEE COMPUTER, Vol. 17, No. 1, January 1984, pp. 6-22.
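The idea of a branch target buffer can be sketched as a table keyed by the address of the branch instruction. The following is a simplified illustration only: it assumes an unbounded table with no replacement policy and no taken/not-taken history, and the class and method names are hypothetical rather than taken from the cited article.

```python
class BranchTargetBuffer:
    """Simplified branch target buffer: maps the address of a branch
    instruction to the target address observed when it last executed.
    Assumption: unbounded capacity, no replacement or history bits."""

    def __init__(self):
        self._entries = {}

    def predict(self, branch_address):
        """Consulted in the instruction fetch stage. Returns the stored
        target address, or None if the branch has not been seen."""
        return self._entries.get(branch_address)

    def update(self, branch_address, target_address):
        """Called once the branch has been executed, recording the
        actual target for use by later fetches."""
        self._entries[branch_address] = target_address
```

For example, after `update(0x100, 0x200)` a later `predict(0x100)` returns `0x200`, so the instruction at the target address can be fetched without waiting for the branch to reach the execution stage.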
As described above, the curtailment of the overhead at branch instruction execution is made by predicting the flow of processing in the initial stage of the pipeline processing and passing an instruction predicted to be executed next through the pipeline (hereinafter referred to as pre-branch processing). However, the prediction of the processing flow of the return instruction from the subroutine has been difficult because of the dependence of a return address from a subroutine upon an address of the corresponding subroutine call instruction.
In the conventional data processor, as described above, the return address from a subroutine depends upon the address of the corresponding subroutine call instruction. Therefore, no effective means has been available for predicting the flow of processing when a subroutine return instruction is executed.
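The difficulty can be illustrated with a small sketch in which a single return instruction is predicted from the target it used last time, in the manner of a branch target buffer. The addresses, the 4-byte call instruction length, and the function name are assumptions chosen only for illustration.

```python
# Why a last-target table predicts subroutine returns poorly.
# Assumptions (not from the source): a call instruction is 4 bytes long,
# the subroutine has a single return instruction at RETURN_PC, and the
# table stores only the most recent target for that instruction.
CALL_LENGTH = 4
RETURN_PC = 0x5010


def count_correct_return_predictions(call_sites):
    """Given the addresses of successive call instructions that invoke the
    same subroutine, count how often the stored target matches the
    actual return address."""
    last_target = {}  # return instruction address -> last return address
    correct = 0
    for call_address in call_sites:
        actual_return = call_address + CALL_LENGTH
        if last_target.get(RETURN_PC) == actual_return:
            correct += 1
        # Record the actual return address after the return executes.
        last_target[RETURN_PC] = actual_return
    return correct
```

When the subroutine is called four times from the same address, three of the four returns are predicted correctly under these assumptions. When the calls alternate between two addresses, none of the four is predicted correctly, because the stored target always reflects the previous caller.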