1. Field of the Invention
The present invention relates to a processor that executes machine-language instruction sequences using pipeline processing, and in particular to a processor that executes branch processing at high speed.
2. Description of the Prior Art
Pipeline processing is known as one of the fundamental techniques for achieving high-speed processing by a Central Processing Unit (CPU; hereinafter "processor"). In pipeline processing, the processing of one instruction is divided into smaller stages (pipeline stages), and the stages of successive instructions are processed in parallel to speed up execution. However, this technique loses effectiveness when executing branch instructions, such as those used in loops, because the pipeline can stall. This phenomenon is called a branch hazard. Because of branch hazards, the performance of pipeline processing falls short of its optimal level.
A specific example of a program in which a branch hazard occurs is given below. The comments after the semicolons describe what each instruction does.
(Instruction 1) mov 0,i ; Transfer 0 into i.
L: ; Label showing a branch target.
(Instruction 2) add a,b,c ; Transfer a+b into c.
(Instruction 3) mul a,b,d ; Transfer a×b into d.
(Instruction 4) add i,1,i ; Add 1 to i.
(Instruction 5) cmp i,3 ; Compare i with 3.
(Instruction 6) bcc L ; Branch to L if i<3.
When the above program is executed, the processing in Instructions 2-5 is repeated three times. After Instruction 6 is executed, Instruction 2 has to be fetched, decoded, and executed in the next three cycles, one stage per cycle. As a result, a branch hazard of two cycles arises between the execution of Instruction 6 and the execution of Instruction 2.
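The timing described above can be illustrated with a small model. The following Python sketch is not part of the disclosed processor; it assumes a three-stage pipeline (fetch, decode, execute) with one cycle per stage, a branch that is resolved in the execute stage, and a branch target that is fetched only after the branch has executed. The names `simulate`, `PIPELINE_DEPTH`, and `loop_limit` are hypothetical.

```python
PIPELINE_DEPTH = 3  # assumed stages: fetch, decode, execute (one cycle each)


def simulate(loop_limit=3):
    """Run the sample program; return (last_execute_cycle, stall_cycles, trace)."""
    trace = []               # (instruction number, cycle in which it executes)
    cycle = PIPELINE_DEPTH   # Instruction 1 is fetched in cycle 1, executed in cycle 3
    stalls = 0
    i = 0
    pc = 1                   # instruction number, 1..6
    while pc <= 6:
        trace.append((pc, cycle))
        taken = False
        if pc == 1:          # mov 0,i
            i = 0
        elif pc == 4:        # add i,1,i
            i += 1
        elif pc == 6:        # bcc L (the compare of Instruction 5 is folded in here)
            taken = i < loop_limit
        if taken:
            # The target is fetched only after the branch executes, so it
            # executes PIPELINE_DEPTH cycles later instead of 1 cycle later.
            pc = 2
            cycle += PIPELINE_DEPTH
            stalls += PIPELINE_DEPTH - 1
        else:
            pc += 1
            cycle += 1
    return trace[-1][1], stalls, trace


total, stalls, trace = simulate()
print(total, stalls)  # 22 4
```

Under these assumptions the branch is taken twice, each time costing two stall cycles, so the sixteen executed instructions finish in cycle 22 rather than cycle 18.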
A processor that uses a technique for avoiding branch hazards is disclosed in Japanese Laid-Open Patent Application No. 8-314719.
In this technique, code that includes the first instruction of a loop is stored into a buffer just before the loop is started. When the program branches from the last instruction of the loop to the first instruction, the code is retrieved from the buffer and the first instruction is decoded and executed. With such an arrangement, the first instruction does not need to be fetched from an external memory each time the loop is executed, so that the branch hazards can be avoided.
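The buffering idea can be sketched in Python as follows. This is an illustrative model only, not the circuit of the cited application: the class `LoopBufferFetcher`, its methods, the helper `run_loop`, and the one-instruction-per-address memory layout are all assumptions made for the example. It counts how many times external memory is accessed with and without the buffer.

```python
class LoopBufferFetcher:
    """Hypothetical fetch unit with a one-entry loop buffer."""

    def __init__(self, memory):
        self.memory = memory       # address -> code (stands in for external memory)
        self.buffered = None       # (address, code) holding the loop's first instruction
        self.external_fetches = 0

    def _fetch_external(self, addr):
        self.external_fetches += 1
        return self.memory[addr]

    def prepare_loop(self, first_addr):
        # Just before the loop starts: store the code that includes
        # the first instruction of the loop into the buffer.
        self.buffered = (first_addr, self._fetch_external(first_addr))

    def fetch(self, addr):
        if self.buffered is not None and self.buffered[0] == addr:
            return self.buffered[1]  # served from the buffer, no external access
        return self._fetch_external(addr)


def run_loop(fetcher, first_addr, last_addr, iterations):
    """Fetch every instruction of the loop body for each iteration."""
    for _ in range(iterations):
        for addr in range(first_addr, last_addr + 1):
            fetcher.fetch(addr)
    return fetcher.external_fetches


# Assumed layout: one instruction per address, loop body at addresses 1..5.
memory = {0: "mov 0,i", 1: "add a,b,c", 2: "mul a,b,d",
          3: "add i,1,i", 4: "cmp i,3", 5: "bcc L"}

plain = LoopBufferFetcher(memory)
without_buffer = run_loop(plain, 1, 5, 3)

buffered = LoopBufferFetcher(memory)
buffered.prepare_loop(1)
with_buffer = run_loop(buffered, 1, 5, 3)

print(without_buffer, with_buffer)  # 15 13
```

In this model the first instruction is read from external memory once, when the buffer is filled, and every later branch back to the top of the loop is served from the buffer.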
However, the conventional processor described above has the drawback of a large circuit scale, since it needs dedicated circuitry for avoiding branch hazards.
First, the processor has to be equipped with a dedicated adder. While the code including the first instruction of the loop is being stored into the buffer just before the loop starts, this adder calculates the fetch address of the code that follows that code. The calculated fetch address is then stored into an address buffer.
The processor also has to be equipped with a dedicated subtractor. When the processing branches from the last instruction to the first instruction, this subtractor calculates the address of the first instruction to be decoded from the fetch address retrieved from the address buffer.
The inclusion of the adder and the subtractor increases the hardware scale of the conventional processor described above.
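The two address calculations can be pictured with the following Python sketch. The description above does not give the fetch width or the exact arithmetic, so everything here is an assumption made for illustration: `FETCH_BYTES` is a hypothetical size for one fetched unit of code, the first instruction is assumed to sit at the start of the buffered code, and the function names are invented.

```python
FETCH_BYTES = 8  # hypothetical width of one fetched unit of code


def store_follow_address(first_code_addr, address_buffer):
    """Adder step, performed while the loop's first code is being buffered."""
    # Fetch address of the code that follows the buffered code.
    address_buffer["follow_addr"] = first_code_addr + FETCH_BYTES
    return address_buffer["follow_addr"]


def addresses_on_branch(address_buffer):
    """Subtractor step, performed when branching back to the first instruction."""
    follow_addr = address_buffer["follow_addr"]
    # Address of the first instruction to be decoded, recovered from the
    # stored fetch address (assumes it starts the buffered code).
    decode_addr = follow_addr - FETCH_BYTES
    return decode_addr, follow_addr


address_buffer = {}
follow = store_follow_address(0x1000, address_buffer)
decode_addr, next_fetch = addresses_on_branch(address_buffer)
print(hex(follow), hex(decode_addr), hex(next_fetch))  # 0x1008 0x1000 0x1008
```

The sketch shows why two separate arithmetic units are involved: one addition when the buffer is filled, and one subtraction each time the loop branches back.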