1. Field of the Invention
The present invention relates to a branch prediction circuit of a microprocessor for executing, for example, pipeline processing.
2. Description of the Related Art
In a microprocessor that performs pipeline processing, a branching instruction, which changes the flow of control, disturbs the pipeline processing. Ineffective cycles are therefore generated, and performance is degraded. In particular, a microprocessor whose operating frequency exceeds 1 GHz has a multistage pipeline. In such a microprocessor, ten or more cycles are required from when an instruction is fetched and recognized as a branching instruction until the branched address is calculated.
FIG. 5 shows the operation of an early microprocessor having a multistage pipeline when a branching instruction occurs. This processor requires, for example, eleven cycles from when an instruction is fetched until the instruction is recognized as a branching instruction. As a result, a plurality of subsequent instructions which have entered the pipeline in and after a delay slot are invalidated, and processing efficiency is markedly degraded. Here, the delay slot is an instruction which is executed immediately after the branching instruction and which is unrelated to the branching instruction. The delay slot is set by a compiler.
In this manner, an early processor having a multistage pipeline has many instructions invalidated following the branching instruction. In recent years, microprocessors have been provided with a branch prediction circuit in order to reduce the number of invalidated instructions. The branch prediction circuit is constituted to store a history of previously executed branching instructions and, after a branching instruction is fetched, to switch the program counter to the previously branched-to address based on that history.
In branch prediction, the more previous history is stored, the more accurate the prediction becomes. For this reason, the branch table for storing the history is constituted of a memory having a large capacity comparable to that of an instruction cache. In this branch table, an address at which a branching instruction is present (hereinafter referred to as the address of the branching instruction) is stored in association with an address previously branched to from that branching instruction (hereinafter referred to as the branched address). However, when the branch table is constituted of a large-capacity memory, it takes several cycles to access the branch table. Therefore, even when the address in the program counter is the address of a previously executed branching instruction, several cycles are required to change to the branched address of that branching instruction.
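The association held in the branch table can be illustrated with a minimal sketch. Everything here is an assumption made for illustration only: the class name `BranchTable`, the direct-mapped organization, the entry count, the 4-byte instruction size, and the three-cycle access time are hypothetical and are not taken from any particular circuit.

```python
from typing import List, Optional, Tuple


class BranchTable:
    """Hypothetical direct-mapped branch table (illustrative only)."""

    def __init__(self, num_entries: int = 1024, access_cycles: int = 3) -> None:
        self.num_entries = num_entries
        # Assumed cost of one lookup in a large-capacity memory.
        self.access_cycles = access_cycles
        # Each entry holds (address of the branching instruction, branched address).
        self.entries: List[Optional[Tuple[int, int]]] = [None] * num_entries

    def _index(self, branch_addr: int) -> int:
        # Assumes 4-byte instructions, so the low two address bits are dropped.
        return (branch_addr >> 2) % self.num_entries

    def record(self, branch_addr: int, branched_addr: int) -> None:
        """Store the history of an executed branching instruction."""
        self.entries[self._index(branch_addr)] = (branch_addr, branched_addr)

    def predict(self, branch_addr: int) -> Optional[int]:
        """Return the previously branched-to address, or None if no history exists."""
        entry = self.entries[self._index(branch_addr)]
        if entry is not None and entry[0] == branch_addr:
            return entry[1]
        return None


table = BranchTable()
table.record(0x1000, 0x2000)
predicted = table.predict(0x1000)   # history exists for this address
unknown = table.predict(0x1004)     # no history for this address
```

In this sketch the stored address of the branching instruction also serves as a tag, so an address that merely maps to the same entry is not mistaken for a previously executed branching instruction.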
FIG. 6 shows an operation during branch prediction. In this example, the operation includes accessing the branch table with the address of the branching instruction, and executing a delay cycle while the branch table is accessed. When branch prediction is performed in this manner, the address can be changed to the branched address in fewer cycles than when branch prediction is not performed. However, even when branch prediction is performed, several ineffective cycles are generated in and after the delay slot.
Since early processors had a low operating frequency, the branched address could be obtained during execution of the delay slot, and control could be switched to the branched address after the delay slot. It was therefore possible to remove the ineffective cycles.
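The three situations described so far can be compared with a deliberately simplified model. The function `ineffective_cycles` is hypothetical, and it assumes that each delay slot hides exactly one cycle of the time needed to obtain the branched address. The eleven-cycle figure comes from the example above; the three-cycle and one-cycle branch table access times are assumed values chosen only for illustration.

```python
def ineffective_cycles(cycles_until_target_known: int, delay_slots: int = 1) -> int:
    """Cycles wasted after a branching instruction in a simplified model.

    Assumes each delay slot does useful work for one cycle while the
    branched address is still being obtained.
    """
    return max(0, cycles_until_target_known - delay_slots)


# No branch prediction: eleven cycles until the branching instruction is recognized.
no_prediction = ineffective_cycles(11)

# Branch prediction with a large branch table (assumed three-cycle access).
large_table = ineffective_cycles(3)

# Low-frequency processor (assumed one-cycle access): the delay slot covers it.
low_frequency = ineffective_cycles(1)
```

Under these assumptions the model yields ten, two, and zero ineffective cycles respectively, which matches the qualitative ordering described in the text.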
However, in recent years, the operating frequency of microprocessors has been raised, and the width of the bus over which instructions are read from the memory has increased. It is therefore possible to read a plurality of instructions simultaneously in one cycle. For example, when one instruction is 32 bits and the bus width of the memory is 64 bits, two instructions are read in parallel. Consequently, the delay slot can be read together with the branching instruction, and execution of the delay slot no longer covers the time needed to access the branch table. Therefore, even when branch prediction is executed, ineffective cycles are generated.
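The arithmetic in the 64-bit example can be sketched as follows. The helper names `instructions_per_fetch` and `fetch_group` are hypothetical, and the sketch assumes that each fetch reads one bus-width-aligned block of memory, which the text does not state.

```python
from typing import List


def instructions_per_fetch(bus_width_bits: int, instruction_bits: int) -> int:
    """Number of whole instructions read from memory in one cycle."""
    return bus_width_bits // instruction_bits


def fetch_group(addr: int, bus_width_bits: int = 64, instruction_bits: int = 32) -> List[int]:
    """Addresses of the instructions read in the same cycle as `addr`.

    Assumes fetches are aligned to the bus width (an illustrative assumption).
    """
    bus_bytes = bus_width_bits // 8
    instr_bytes = instruction_bits // 8
    base = addr - (addr % bus_bytes)
    return [base + i * instr_bytes for i in range(bus_bytes // instr_bytes)]


per_cycle = instructions_per_fetch(64, 32)

# A branching instruction at 0x1000 and its delay slot at 0x1004
# fall into the same fetch group under the alignment assumption.
group = fetch_group(0x1000)
```

With a 32-bit bus the same call would return only the branching instruction itself, so the delay slot would be read one cycle later.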
On the other hand, as shown in FIG. 7, for example, when the delay slot causes an instruction cache miss, the cache needs to be refilled. After the branching instruction is fetched as described above, the branch table is accessed with the address of the branching instruction, and the branched address is read out. When the branched address has been read out and the delay slot causes an instruction cache miss, it is necessary to hold the branched address read from the branch table while the cache is refilled. Then, after the cache is refilled and the delay slot is re-executed, an operation of changing the address of the program counter to the held branched address is necessary. This control is complicated, and increases the circuit scale.
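The additional control required in this case can be made concrete with a small event trace. This is only a sketch: the function name `run_branch_with_delay_slot`, the event wording, and the four-cycle refill time are assumptions, and the `held_target` variable stands in for the extra holding register that the text says is needed.

```python
from typing import List, Optional


def run_branch_with_delay_slot(branched_addr: int,
                               delay_slot_misses: bool,
                               refill_cycles: int = 4) -> List[str]:
    """Return an illustrative sequence of events for one predicted branch."""
    events = [
        "fetch branching instruction; access branch table",
        f"branch table returns {branched_addr:#x}",
    ]
    held_target: Optional[int] = None
    if delay_slot_misses:
        # The branched address must be kept until the delay slot can run.
        held_target = branched_addr
        events.append("delay slot misses in instruction cache; hold branched address")
        events.extend("refill cache" for _ in range(refill_cycles))
        events.append("re-execute delay slot")
        events.append(f"set program counter to held {held_target:#x}")
    else:
        events.append("execute delay slot")
        events.append(f"set program counter to {branched_addr:#x}")
    return events


hit_trace = run_branch_with_delay_slot(0x2000, delay_slot_misses=False)
miss_trace = run_branch_with_delay_slot(0x2000, delay_slot_misses=True)
```

The second trace is longer and needs state that the first does not, which reflects the added control complexity and circuit scale described above.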
As described above, for a microprocessor which has a wide memory bus and a high operating frequency, it is difficult to reduce the ineffective cycles caused by the branching instruction. There has been a demand for a branch prediction circuit which can suppress the ineffective cycles caused by the branching instruction as much as possible even in a microprocessor having a wide memory bus and running at a high operating frequency.