The present invention relates to a data processor having a function to execute a branch instruction and a function to prefetch an instruction, and in particular, to a data processor in which an instruction prefetch function is linked with branch history information to execute a branch instruction at a high speed.
Conventionally, to increase the processing speed of a branch instruction, there has been described in JP-A-1-240931 (laid-open on Sep. 26, 1989) a data processor in which an address of a branch instruction and an address of a branch target instruction are stored as branch history information in a buffer. When an instruction prefetch is achieved, the history information is checked with a prefetch address as a key so that control branches accordingly.
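The lookup described above can be pictured with a minimal sketch. It is an illustration only, not the circuit of the cited publication: the class name `BranchHistoryBuffer`, the dictionary storage, and the fixed 4-byte instruction length are assumptions made for this example.

```python
class BranchHistoryBuffer:
    """Toy model: maps a branch instruction address to its last target."""

    def __init__(self):
        self._targets = {}  # branch address -> branch target address

    def record(self, branch_addr, target_addr):
        # Store history after a branch has actually been executed.
        self._targets[branch_addr] = target_addr

    def next_prefetch(self, prefetch_addr, insn_size=4):
        # Use the prefetch address as the key. On a hit, prefetching
        # continues at the stored target; on a miss, it continues
        # sequentially (insn_size is an assumed fixed length).
        return self._targets.get(prefetch_addr, prefetch_addr + insn_size)


bhb = BranchHistoryBuffer()
bhb.record(0x100, 0x200)
hit = bhb.next_prefetch(0x100)   # branch seen before: redirect
miss = bhb.next_prefetch(0x104)  # no history: fall through
```

In this sketch, a hit redirects the prefetch stream before the branch instruction itself is decoded, which is the source of the speed-up.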
Heretofore, moreover, for a high-speed execution of a branch instruction, there has been described in JP-A-2-166520 (laid-open on Jun. 27, 1990) a data processor in which an address of an instruction preceding a branch instruction and an address of a branch target instruction are stored as branch history information in a buffer. When the preceding instruction is decoded, the history information is checked with the instruction address thereof as a key to skip execution of an unconditional branch instruction, thereby achieving a branch at a high speed.
The conventional technologies have been devised primarily to increase the branch processing speed of an unconditional branch instruction. However, it has been clarified through study by the inventors of the present invention that these technologies cannot cope with the branch processing of a return (rts) instruction, which returns control from a subroutine and is itself a kind of unconditional branch instruction, for the following reasons.
Each of the prior technologies assumes that a branch instruction always has the same branch target address, so that the branch history information remains useful for subsequent branch processing. However, this assumption does not hold for the return instruction from a subroutine. Since the return instruction is used to return control from the subroutine to a return address on the call side, the return address varies depending on the address of the call side.
The following shows the subroutine call and the return processing mechanism.
First, the routine on the call side executes a subroutine call (bsr) instruction. During the execution, a return address is calculated and stored in a last-in first-out (LIFO) structure called a stack, which is created by software in the memory. The return address is calculated by using the address of the subroutine call instruction. Control is then transferred by the instruction to the subroutine. The subroutine is then executed. In the final step of the subroutine, a return instruction is executed. The return address is read from the stack and control is passed to the return address, thereby transferring the processing back to the routine on the call side.
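The call and return sequence described above can be sketched as follows. This is a simplified model under stated assumptions: the `Machine` class, its method names, and the fixed 4-byte instruction length (so that the return address is the call address plus 4) are hypothetical and chosen only for illustration.

```python
INSN_SIZE = 4  # assumed fixed instruction length in bytes


class Machine:
    """Toy processor state: a program counter and a software stack."""

    def __init__(self, pc=0):
        self.pc = pc
        self.stack = []  # LIFO stack held in memory

    def bsr(self, target):
        # Subroutine call: compute the return address from the address
        # of the call instruction, push it, then transfer control.
        self.stack.append(self.pc + INSN_SIZE)
        self.pc = target

    def rts(self):
        # Return: pop the return address and transfer control to it.
        self.pc = self.stack.pop()


m = Machine(pc=0x1000)
m.bsr(0x5000)        # call the subroutine at 0x5000
in_sub = m.pc        # control is now inside the subroutine
m.rts()              # last step of the subroutine
back = m.pc          # control is back on the call side
```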
As set forth above, the return address is decided by the address of the subroutine call instruction. Consequently, when a plurality of subroutine call instructions are included in the program, it is impossible to uniquely determine a return address for the return (rts) instruction written in the associated subroutine.
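The difficulty can be made concrete with a short sketch. It models a history buffer that remembers only the last target observed for the rts instruction and compares that prediction with the address actually popped from the stack. The function name `run_calls`, the example addresses, and the 4-byte instruction length are assumptions for illustration.

```python
INSN_SIZE = 4  # assumed fixed instruction length in bytes


def run_calls(call_sites, rts_addr):
    """Return (predicted, actual) return addresses for each rts execution."""
    history = {}  # branch address -> last observed branch target
    stack = []
    results = []
    for call_addr in call_sites:
        stack.append(call_addr + INSN_SIZE)  # bsr pushes the return address
        predicted = history.get(rts_addr)    # history lookup for the rts
        actual = stack.pop()                 # rts pops the real return address
        results.append((predicted, actual))
        history[rts_addr] = actual           # history keeps the latest target
    return results


# The same subroutine is called from 0x1000, then 0x2000, then 0x1000 again.
results = run_calls([0x1000, 0x2000, 0x1000], rts_addr=0x5008)
mispredicted = [p != a for p, a in results]
```

In this sketch the stored target is always the return address of the previous caller, so whenever the call site changes between two executions of the subroutine, the history information points to the wrong address.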