The present invention relates to a processor that may be a micro processing unit (MPU) with an internal or external program memory, a digital signal processor (DSP) with an internal or external program memory or the like.
These types of processors perform pipeline processing in order to speed up processing. In pipeline processing in the prior art, an instruction queue comprising, for example, 6 stages of registers is connected to the front stage of a decoder, and a queue with the same number of stages is connected to the rear stage of this decoder. Once the pipeline has settled into the steady state, one normal instruction can be executed per cycle, so high-speed processing is possible.
However, with instructions that require processing different from that in normal instructions, such as branch instructions, immediate data transfer instructions or variable length instructions, the processing speed is reduced as described below.
(1) In the case of a branch instruction, since it changes the execution sequence of the instructions, instructions that have been partially processed must be discarded and processing must start anew from the instruction fetch, cancelling out the benefits of the pipeline processing.
Therefore, branch prediction may be performed for the branch instruction by connecting the instruction at the branch destination in front of the branch instruction and reading it into the pipeline. However, this makes the structure of the compiler, which performs the branch prediction, complicated. Also, under certain conditions the branch will not be taken, and since the instruction at the branch destination will still be executed even though it is not necessary, the processing speed is reduced.
Another approach eliminates dead time by inserting the instruction that would be executed before a conditional branch instruction after the conditional branch instruction as a delay slot, and by executing this delay slot while the branch destination is being determined. However, this method too makes the compiler that inserts the delay slot more complicated and, if a delay slot cannot be inserted, the processing speed is reduced.
(2) In the case of an immediate data transfer instruction, the background is that an ordinary data transfer instruction requires time for the calculation of the execution address and for memory access. This problem can be overcome and processing can be sped up by using an immediate data transfer instruction, which places the data inside an instruction word. However, since an immediate data transfer instruction must wait for the intake of the immediate data, the execution needs a plurality of cycles, thus reducing the processing speed.
(3) In the case of multiple length (variable length) instructions, it is necessary to perform decoding again after the multiple instruction words are combined; the execution thus needs a plurality of cycles, reducing the processing speed.
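The effect of cases (1) to (3) on throughput can be illustrated with a rough cycle-count sketch. This is a simplified model and not a description of any particular prior-art processor: the function name `pipeline_cycles`, the instruction kind labels, and every penalty value are assumptions chosen for illustration. Only the depth of 6 stages and the one-instruction-per-cycle steady state are taken from the description above.

```python
def pipeline_cycles(instructions, depth=6, extra_cycles=None):
    """Count cycles for an idealized in-order pipeline (illustrative model only).

    instructions: list of kind labels, e.g. "normal" or "branch_taken".
    depth: number of pipeline stages (6 is the example depth from the text).
    extra_cycles: cycles lost per instruction kind beyond the normal one cycle.
    """
    if extra_cycles is None:
        extra_cycles = {
            "normal": 0,
            # Assumption: a taken branch discards partially processed
            # instructions and refills the pipeline from instruction fetch.
            "branch_taken": depth - 1,
            # Assumption: one extra cycle to take in the immediate data.
            "immediate": 1,
            # Assumption: one extra cycle to decode again after the
            # instruction words are combined.
            "multiword": 1,
        }
    if not instructions:
        return 0
    fill = depth - 1  # cycles needed before the first instruction completes
    return fill + sum(1 + extra_cycles[kind] for kind in instructions)


if __name__ == "__main__":
    straight = ["normal"] * 10
    with_branch = ["normal"] * 9 + ["branch_taken"]
    print(pipeline_cycles(straight))     # 15
    print(pipeline_cycles(with_branch))  # 20
```

Under these assumed penalties, ten normal instructions complete in 15 cycles, whereas replacing a single one of them with a taken branch raises the total to 20 cycles, which is the kind of slowdown described in case (1).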