1. Field of the Invention
The present invention relates to a data processor, such as a microprocessor, and more particularly to a data processor provided with a pipeline system including a branch prediction system.
2. Description of the Prior Art
A data processor, such as a microprocessor, often utilizes a pipeline system for processing instructions or data at high speed.
Pipelining of instruction processing, for example, executes in parallel a series of processes including fetch, decode, and execute operations. The time required to process one instruction in the pipeline system is about the same as in a non-pipelined system, but overall throughput is improved, enabling high-speed processing.
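The latency-versus-throughput point above can be checked with a small cycle count. The sketch below is illustrative only: it assumes an idealized three-stage pipeline (fetch, decode, execute) in which every stage takes exactly one cycle and nothing stalls. The function name `total_cycles` is hypothetical and is not part of the described processor.

```python
def total_cycles(num_instructions: int, num_stages: int, pipelined: bool) -> int:
    """Cycles needed to finish all instructions (assumes one cycle per stage, no stalls)."""
    if num_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs num_stages cycles to pass through the pipeline;
        # each later instruction completes one cycle after its predecessor.
        return num_stages + (num_instructions - 1)
    # Without pipelining, each instruction occupies the processor for every stage in turn.
    return num_stages * num_instructions


# One instruction takes the same time either way (latency is unchanged).
print(total_cycles(1, 3, pipelined=True), total_cycles(1, 3, pipelined=False))      # 3 3
# Many instructions finish much sooner when pipelined (throughput improves).
print(total_cycles(100, 3, pipelined=True), total_cycles(100, 3, pipelined=False))  # 102 300
```

Under these assumptions the per-instruction latency stays at three cycles, while the time for 100 instructions drops from 300 cycles to 102.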
In instruction pipelining, when the target of a branch instruction is required, the prefetched instructions following the branch instruction are cancelled and the branch target instruction at the destination address of the branch instruction is newly fetched. In this case, the throughput of the pipeline is decreased. A branch prediction table is used to reduce the effects of branch instructions on pipeline throughput.
FIG. 1 is a block diagram of a conventional pipeline including a branch prediction unit. In FIG. 1, an instruction prefetch queue 1 prefetches an instruction from a data bus 6 connected to a main memory or an instruction memory (not shown), thereby forming an instruction queue.
The prefetched instruction is transferred to an instruction decoder 2 that reads the instruction prefetched by the instruction prefetch queue 1 from the instruction memory and decodes the instruction. The instruction decoder transfers the address S2 of the next instruction to be decoded to a branch prediction unit 4.
Next, the instruction is transferred to an instruction execution unit 3 that executes the contents of instruction decoded by the instruction decoder 2.
A branch prediction unit 4 predicts the occurrence of a branch condition in accordance with the stored contents of a branch prediction table, as described below.
A destination address generation circuit 5 generates a destination address when a branch instruction is decoded at the instruction decoder 2 and transfers it to the instruction prefetch queue 1.
The data bus 6 is connected to the main memory (not shown), the instruction prefetch queue 1, and the instruction execution unit 3.
This conventional data processor operates as follows:
The instruction decoder 2, while the instruction execution unit 3 is executing a first instruction, decodes a second instruction to be executed next. Accordingly, at the point of time when the instruction execution unit 3 completes execution of the first instruction, the instruction decoder 2 has already completed decoding of the second instruction. Thus, the instruction execution unit 3 can immediately execute the second instruction.
The instruction prefetch queue 1 utilizes the time when the memory (not shown) is not being accessed and prefetches the following instructions, thereby reducing the time required to fetch the following instruction.
Thus, during pipeline processing, the instruction prefetch queue 1, instruction decoder 2, and instruction execution unit 3 operate in parallel to improve the throughput of the processor.
However, if a branch instruction is executed by the instruction execution unit 3 and the branch condition is established, then the target instruction at the destination address will be executed next. In this case, the instruction prefetched in the prefetch queue 1 and the decoding result of the instruction decoder 2 are canceled. At this point in time, the destination address generation circuit 5 generates the destination address of the target instruction and transfers it to the instruction prefetch queue 1. Next, the instruction prefetch queue 1 fetches the target instruction at the destination address through the data bus 6 and forms a new instruction queue.
Because the target instruction must be fetched from main memory and decoded prior to execution, a delay is introduced each time the branch condition of a branch instruction is established.
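The cost of this delay can be illustrated with a rough cycle model. The sketch below is not taken from the described processor: it assumes a three-stage pipeline with one cycle per stage, and it assumes that each taken branch discards the work in the stages ahead of the execute stage (fetch and decode), costing one refill cycle per discarded stage. The name `cycles_without_prediction` is hypothetical.

```python
def cycles_without_prediction(num_instructions: int, taken_branches: int,
                              num_stages: int = 3) -> int:
    """Total cycles when every taken branch flushes the stages ahead of execute."""
    if num_instructions == 0:
        return 0
    # Ideal pipelined time: fill the pipeline once, then one instruction per cycle.
    ideal = num_stages + (num_instructions - 1)
    # Assumption: a taken branch cancels the prefetched and decoded instructions,
    # so (num_stages - 1) cycles are lost refetching and decoding the target.
    flush_penalty = num_stages - 1
    return ideal + taken_branches * flush_penalty


print(cycles_without_prediction(100, taken_branches=0))   # 102
print(cycles_without_prediction(100, taken_branches=10))  # 122
```

In this model, ten taken branches in a run of 100 instructions add 20 cycles, roughly a 20% slowdown relative to the ideal case.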
The branch prediction unit 4 reduces the delays caused by branch instructions. The unit is utilized to predict, at the decode stage, whether the branch condition of the branch instruction being decoded will be established. The branch prediction unit includes therein a branch prediction table as shown in FIG. 2 which stores a set of branch instruction addresses and associated branch prediction bits. For a given branch instruction address, if the branch prediction bit is "1" then the branch condition was established the last time the branch instruction was executed. If the branch prediction bit is "0" then the branch condition was not established.
When the address of the instruction to be decoded next by the instruction decoder 2 is transferred, as an address signal S2, to the branch prediction unit 4, the branch prediction unit 4 reads out the branch prediction bit corresponding to the transferred address from the branch prediction table and transfers it to the instruction decoder 2 as a branch prediction signal S1.
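The table lookup described above can be sketched as a simple mapping from branch instruction address to prediction bit. This is a minimal illustration, not the actual circuit: the addresses and the name `read_prediction_bit` are hypothetical, and the behavior for an address that is not registered (returning 0, that is, no branch predicted) is an assumption the text does not state.

```python
# Hypothetical table contents: branch instruction address -> branch prediction bit.
# A bit of 1 means the branch condition was established at the last execution.
branch_prediction_table = {0x1000: 1, 0x1040: 0}


def read_prediction_bit(table: dict, address: int) -> int:
    """Return the prediction bit stored for `address`."""
    # Assumption: an address not registered in the table is treated as
    # "no branch predicted" (bit 0).
    return table.get(address, 0)


print(read_prediction_bit(branch_prediction_table, 0x1000))  # 1
print(read_prediction_bit(branch_prediction_table, 0x1040))  # 0
print(read_prediction_bit(branch_prediction_table, 0x2000))  # 0
```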
Meanwhile, when the next instruction is transferred from the instruction prefetch queue 1 to the instruction decoder 2, the instruction decoder 2 starts decoding it. If the decoded instruction is a branch instruction and the branch prediction signal S1 given from the branch prediction unit 4 predicts the occurrence of a branch, the instruction fetched at that time by the instruction prefetch queue 1 is canceled. Further, the destination address generation circuit 5 generates the destination address on the basis of the decoding result of the instruction decoder 2 and transfers it to the instruction prefetch queue 1. Hence, the instruction prefetch queue 1 fetches the branch target instruction from the main memory and gives it to the instruction decoder 2.
Accordingly, if no branch prediction unit 4 is provided, both the decoding and fetch operations previously carried out by the instruction decoder 2 and instruction prefetch queue 1 are cancelled when a branch instruction is executed. However, if the branch prediction unit 4 is provided, only the decoding operation of the instruction decoder 2 is cancelled.
If the branch prediction comes true, then the instruction to be executed next by the instruction execution unit 3 has been fetched early from the main memory, and the pipeline latency until execution of the next instruction is reduced. Thereafter, new registration in, or updating of, the branch prediction table is carried out.
Conversely, when the branch prediction fails, the address of the instruction now under execution by the instruction execution unit 3 is given as an address signal S3 to the branch prediction unit 4, thereby carrying out new registration in, or updating of, the branch prediction table shown in FIG. 2.
In addition, the branch prediction table is updated in such a manner that, when the branch actually occurs although no branch was predicted, the branch prediction bit corresponding to the address of the branch instruction is rewritten to a logical "1". If the branch instruction does not actually branch although a branch was predicted, the branch prediction bit corresponding to the address of that branch instruction is rewritten to a logical "0". Also, when a branch instruction not registered in the branch prediction table is newly executed, its address and branch prediction bit are registered in the branch prediction table.
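The update rules above can be sketched as follows. This is an illustrative model only: the function names `predict` and `update_table` and the address are hypothetical. Two points are assumptions rather than statements from the text: an unregistered branch is predicted as not taken, and a newly registered branch stores a bit reflecting its actual outcome.

```python
def predict(table: dict, address: int) -> int:
    """Return the stored prediction bit for a branch instruction address."""
    # Assumption: an unregistered branch is predicted as not taken (bit 0).
    return table.get(address, 0)


def update_table(table: dict, address: int, predicted_bit: int, branch_taken: bool) -> bool:
    """Apply the update rules after execution; return True if the prediction was correct."""
    actual_bit = 1 if branch_taken else 0
    if address not in table:
        # New registration. Assumption: the registered bit reflects the actual outcome.
        table[address] = actual_bit
    elif predicted_bit != actual_bit:
        # Misprediction: rewrite the bit to 1 (branch occurred) or 0 (it did not).
        table[address] = actual_bit
    return predicted_bit == actual_bit


# A hypothetical loop-closing branch: taken three times, then falls through.
table = {}
loop_branch = 0x1000
mispredictions = 0
for taken in [True, True, True, False]:
    bit = predict(table, loop_branch)
    if not update_table(table, loop_branch, bit, taken):
        mispredictions += 1

print(mispredictions)  # 2
```

With a single bit per branch, this loop is mispredicted twice: once on first entry, before the branch is registered, and once on exit, when the stored "1" no longer matches the outcome.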
The above operation of the branch prediction unit 4 restrains disturbance in the pipeline flow and improves the throughput of the apparatus.
The data processor provided with the pipeline system carrying out the above-described branch prediction predicts the branch occurrence in accordance with the branch prediction signal S1 given from the branch prediction unit 4 when the branch instruction is decoded at the instruction decoder 2. Hence, the branch prediction signal S1 must be transferred to the instruction decoder 2 before the branch instruction is decoded there.
However, prior to fetching the branch prediction bit from the branch prediction unit 4, the address of the next instruction must be calculated at the decode stage. For processors having variable length instructions, this calculation cannot be started until the decoding of the current instruction is completed. Once the address of the next instruction is calculated, the branch prediction bit for the next instruction may be fetched from the branch prediction unit 4. FIG. 3 is a timing diagram illustrating the operation of the conventional circuit.
Referring to FIG. 3, as described above, a given instruction cannot be decoded unless the branch prediction bit, BPB, for that instruction has been provided to the decode stage. Accordingly, as shown in the figure, BPB(1) is provided to the decoder and I1 is decoded starting at T1. The time interval required to complete decoding I1 is t_D. Upon completion of the decode operation, the address, A2, of the next instruction is calculated. The time interval required to complete this address computation is t_A. Next, A2 is utilized to fetch the branch prediction bit, BPB(2), for the next instruction, I2. The time interval required to complete the fetch of BPB(2) is t_F. Now that BPB(2) has been provided to the decode stage, the next instruction, I2, may be transferred from the prefetch stage to the decode stage. Thus, the minimum time interval between the transfer of sequential instructions, e.g. I1 and I2, from the prefetch stage to the decode stage is the sum of t_D, t_A, and t_F.
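The serial dependency described above (decode, then next-address calculation, then BPB fetch) can be put into numbers. The sketch below uses purely hypothetical interval values in arbitrary time units; they are not taken from FIG. 3, and the function name `min_transfer_interval` is likewise an illustration.

```python
def min_transfer_interval(t_decode: float, t_address: float, t_bpb_fetch: float) -> float:
    """Minimum time between transfers of sequential instructions to the decode stage."""
    # The three steps cannot overlap: the address calculation waits for decoding,
    # and the BPB fetch waits for the address.
    return t_decode + t_address + t_bpb_fetch


# Hypothetical values in arbitrary time units.
t_d, t_a, t_f = 2, 1, 1
interval = min_transfer_interval(t_d, t_a, t_f)

print(interval)        # 4
print(interval / t_d)  # 2.0
```

With these assumed values, the decode stage accepts a new instruction only every 4 units instead of every 2, so the address calculation and BPB fetch double the effective decode time.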
From the above, it is apparent that the necessity of sequentially calculating the next address and fetching the next BPB during the decode operation extends the time required to complete the decode function and slows down the rate of throughput of the pipeline.
In FIG. 4, reference numeral 1 designates an instruction prefetch queue, which prefetches an instruction from a data bus 6 connected to a main memory or instruction memory (not shown), thereby forming a queue.
Reference numeral 2 designates an instruction decoder, which reads out the instruction prefetched by the instruction prefetch queue 1 from the instruction memory and decodes it. The instruction decoder 2 gives the address of the instruction now being decoded to a branch prediction unit 4 as an address signal S4.
Reference numeral 3 designates an instruction execution unit, which executes the contents of the instruction decoded by the instruction decoder 2. The instruction execution unit 3 gives the address of the instruction executed immediately before the instruction now under execution to the branch prediction unit 4, as an address signal S5, for registration.