1. Field of the Invention
The present invention relates to a signal processor having a pipeline circuit and a method thereof.
2. Description of the Related Art
A reduced instruction set computer (RISC) processor mounted in a digital signal processor (DSP) or the like generally performs signal processing in accordance with a program as follows. For each instruction in the program, the processor successively carries out the following instruction stages (steps): an instruction fetch stage (IF stage) for fetching the instruction from an instruction memory, an instruction decoding stage (ID stage) for decoding the fetched instruction, an execution stage (EX stage) for executing the decoded instruction, a memory access stage (MEM stage) for accessing a memory, and a write stage (WB stage) for writing results obtained by the access in the memory.
In this case, if the fetch of each instruction is timed to start only after the WB stage for the previous instruction has ended, the time from the start of fetching the previous instruction to the end of the WB stage for the next instruction is double the total time spent in the IF stage, the ID stage, the EX stage, the MEM stage, and the WB stage.
FIG. 1 is a block diagram of a computer processor 1 of the related art.
As shown in FIG. 1, the processor 1 comprises an IF module 2, a register 3, an ID module 4, a register 5, an EX module 6, a register 7, an MEM module 8, a register 9, a WB module 10, and a controller 11.
The IF module 2, the ID module 4, the EX module 6, the MEM module 8, and the WB module 10 respectively execute the IF stage, the ID stage, the EX stage, the MEM stage, and the WB stage.
Here, in order to increase the amount of processing per unit time, the processor 1 conventionally adopts pipeline processing, which performs the above-mentioned processing for the different stages in parallel.
In pipeline processing, as shown in FIG. 2, the processing of each stage is finished within one cycle, instructions are successively input to the processor every cycle, and the IF stage, the ID stage, the EX stage, the MEM stage, and the WB stage of different instructions are executed in parallel.
Specifically, in the processor 1 shown in FIG. 1, instructions "n" to "n+4" are input to the processor 1 at one-cycle intervals. In the cycle 20, the WB stage for the instruction "n", the MEM stage for the instruction n+1, the EX stage for the instruction n+2, the ID stage for the instruction n+3, and the IF stage for the instruction n+4 are performed in parallel.
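The overlap described above can be illustrated with a minimal sketch. It assumes an idealized pipeline with no stalls, in which instruction n enters the IF stage in cycle 0 and each later instruction enters one cycle after its predecessor. The 0-based cycle numbering and the names `stage_of` and `occupancy` are illustrative assumptions and do not appear in the figures.

```python
# Stage order of the five-stage pipeline described in the text.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr_index, cycle):
    """Return the stage an instruction occupies in a given cycle, or None.

    instr_index: 0 for instruction n, 1 for n+1, and so on (assumed numbering).
    cycle: 0-based cycle counted from the fetch of instruction n (assumed).
    """
    position = cycle - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None  # not yet fetched, or already finished


def occupancy(cycle, num_instructions):
    """Map each in-flight instruction index to its stage in the given cycle."""
    result = {}
    for i in range(num_instructions):
        stage = stage_of(i, cycle)
        if stage is not None:
            result[i] = stage
    return result


if __name__ == "__main__":
    # Fifth cycle (index 4): all five stages are busy with different instructions.
    print(occupancy(4, 5))
```

In the fifth cycle of this sketch, instruction n is in the WB stage while instruction n+4 is in the IF stage, which corresponds to the situation described for the cycle 20.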
In this way, when using five-stage pipeline processing, the amount of processing per cycle can be increased by up to five times compared with the case without pipeline processing.
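The throughput comparison can be checked with a short calculation. The sketch below assumes that every stage takes exactly one cycle and that the pipeline never stalls; the function names are illustrative.

```python
NUM_STAGES = 5  # IF, ID, EX, MEM, WB


def cycles_without_pipeline(num_instructions, num_stages=NUM_STAGES):
    """Each instruction starts only after the previous one has left WB."""
    return num_instructions * num_stages


def cycles_with_pipeline(num_instructions, num_stages=NUM_STAGES):
    """The first instruction needs num_stages cycles; each later one adds one."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions, num_stages=NUM_STAGES):
    """Ratio of non-pipelined to pipelined cycle counts."""
    return (cycles_without_pipeline(num_instructions, num_stages)
            / cycles_with_pipeline(num_instructions, num_stages))


if __name__ == "__main__":
    for count in (2, 5, 1000):
        print(count, cycles_without_pipeline(count),
              cycles_with_pipeline(count), round(speedup(count), 2))
```

Under these assumptions, two instructions take ten cycles without pipelining, that is, double the five cycles of a single instruction, and the speedup approaches five only for long instruction sequences with no stalls.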
While the above-mentioned processor 1 was explained using five-stage pipeline processing as an example, it is also possible to divide the processing of instructions further, simplifying the processing in each stage so as to raise the clock frequency and increase the amount of processing per unit time.
As explained above, in the processor 1, as shown in FIG. 2, when the EX stage for the instruction "n" starts, the ID stage for the instruction n+1 and the IF stage for the instruction n+2 also start.
When the instruction "n" is a branch instruction, the fact that it is a branch instruction is recognized in the ID stage. Whether or not to branch, that is, whether the branching condition is met, is decided only when the instruction "n" is processed in the EX stage. Accordingly, by the time the branch decision for the instruction "n" is made, the instructions n+1 and n+2 which follow the instruction "n" have already been fetched.
At this time, if the instructions n+1 and n+2 continue flowing through the pipeline even though the branch is taken, instructions for the non-branch destination (instructions placed immediately after the branch instruction) end up being executed, and correct execution is not possible.
To avoid this, for example, as shown in FIG. 3, when the instruction "n" is determined in the EX stage to branch, the following instructions n+1 and n+2, which have already been fetched, are aborted, and the instructions "m" and m+1 at the branch destination are successively fetched from the next cycle.
However, aborting already fetched instructions has the disadvantage of reducing the processing efficiency. For instance, in the case shown in FIG. 3, the branching results in a two-cycle delay.
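The two-cycle delay can be reproduced with a small fetch-trace sketch. It assumes that the branch instruction n is fetched in cycle 0, that the branch decision is made in the EX stage (cycle 2), and that fetching from the branch destination starts in the following cycle. The names `fetch_trace` and `branch_penalty` are illustrative assumptions.

```python
def fetch_trace(branch_taken, decision_stage=2):
    """Return a list of (cycle, instruction label, aborted) fetch records.

    decision_stage: 0-based index of the stage where the branch is decided
    (2 corresponds to EX in an IF, ID, EX, MEM, WB pipeline).
    """
    trace = [(0, "n", False)]  # the branch instruction itself
    # Instructions fetched sequentially while n moves from IF to the decision stage.
    for k in range(1, decision_stage + 1):
        trace.append((k, f"n+{k}", branch_taken))  # aborted only if branch is taken
    first_free_cycle = decision_stage + 1
    if branch_taken:
        labels = ["m", "m+1"]  # branch destination
    else:
        labels = [f"n+{decision_stage + 1}", f"n+{decision_stage + 2}"]
    for offset, label in enumerate(labels):
        trace.append((first_free_cycle + offset, label, False))
    return trace


def branch_penalty(trace):
    """Number of fetch cycles wasted on aborted instructions."""
    return sum(1 for _, _, aborted in trace if aborted)


if __name__ == "__main__":
    for record in fetch_trace(branch_taken=True):
        print(record)
    print("penalty:", branch_penalty(fetch_trace(branch_taken=True)))
```

With the branch taken, the instructions n+1 and n+2 are marked as aborted and the instruction m is fetched in cycle 3, giving a penalty of two cycles under these assumptions.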
In order to overcome this, use is made of the "delayed branch" technique, in which the instructions following a branch instruction are arranged so that instructions which are always executed regardless of whether the branch is taken are positioned immediately after the branch instruction, and the execution of instructions which depend on the branch decision is delayed. Here, the group of instructions which are executed regardless of the branch, among the instructions which follow a branch instruction, is called a "delay slot".
When using the above-explained delayed branch technique, if the number of instructions in the delay slot is at least as large as the number of instructions which would otherwise be aborted after being fetched because of a branch, it is possible to place the delay slot immediately after the branch instruction with no wasted cycles. If this is not the case, it is necessary to place a "nop" (no operation) instruction, which instructs the system to do nothing, immediately after the branch instruction to fill the remaining positions. Accordingly, there is the disadvantage that the processing efficiency declines.
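The padding with "nop" instructions can be sketched as follows. The sketch assumes that two instructions would otherwise be aborted, as in the case of FIG. 3, and that a list of instructions which may execute regardless of the branch has already been identified. The function name and the instruction strings are hypothetical.

```python
def fill_delay_slot(independent_instrs, slots=2):
    """Fill the positions after a branch, padding with 'nop' when needed.

    independent_instrs: instructions that execute regardless of the branch
    (assumed to have been identified already, for example by a compiler).
    slots: number of instructions that would otherwise be aborted (assumed 2).
    """
    chosen = list(independent_instrs[:slots])
    padding = ["nop"] * (slots - len(chosen))
    return chosen + padding


def wasted_slots(filled):
    """Count the positions occupied by 'nop', that is, cycles doing no work."""
    return sum(1 for instr in filled if instr == "nop")


if __name__ == "__main__":
    # Hypothetical instruction strings, for illustration only.
    print(fill_delay_slot(["add r1, r2, r3", "sub r4, r5, r6"]))
    print(fill_delay_slot(["add r1, r2, r3"]))
    print(fill_delay_slot([]))
```

When fewer branch-independent instructions are available than positions to fill, each missing instruction becomes a "nop" and therefore a cycle in which no useful work is done.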
There are also other methods, such as stopping the pipeline when a branch instruction is recognized in the ID stage, fetching an instruction of the branch destination or the non-branch destination only after the branch decision, and then restarting the pipeline.
Whichever method is used, however, the instruction to fetch next cannot be specified before the branch instruction is executed (that is, before the branch decision). The pipeline is therefore stopped until the instruction to fetch is specified, and the processing efficiency declines.
Accordingly, a processor 1 using pipeline processing has a "branch penalty" caused by the branch instructions. It is important to reduce this penalty for better efficiency.
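The effect of the branch penalty on efficiency can be estimated with a standard back-of-the-envelope calculation of the average number of cycles per instruction. The sketch below assumes an ideal pipeline that otherwise completes one instruction per cycle; the function name and the numerical inputs are hypothetical and are not taken from the related art.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when taken branches cost extra cycles.

    branch_fraction: fraction of executed instructions that are branches.
    taken_fraction: fraction of those branches that incur the penalty.
    penalty_cycles: cycles lost per penalized branch (2 in the FIG. 3 case).
    """
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, half of them taken, 2-cycle penalty.
    print(effective_cpi(0.2, 0.5, 2))
```

Reducing either the number of penalized branches or the number of cycles lost per branch brings the average back toward one cycle per instruction.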
In order to reduce this branch penalty as much as possible, there is the method of predicting a branch beforehand. However, this can result in a large penalty if the prediction proves wrong. Also, mounting a prediction circuit has the disadvantage of increasing the size of the processor.
Another method is to make the branch decision in the ID stage and perform the branching immediately. However, if the data on which the decision depends is still being processed by an instruction preceding the branch instruction (in the EX stage), a critical path occurs and high-speed operation becomes difficult to achieve.