Processing a single instruction usually requires a number of different steps, and execution must pass through these steps in sequence. In a pipeline processor comprising a number of stages, different hardware is responsible for the task assigned to each stage. When instructions are executed in succession, the pipeline is filled from clock cycle to clock cycle. If the instructions follow one another sequentially, the pipeline technique increases performance considerably because it permits a much higher clock frequency. However, if a jump instruction occurs, the flow through the pipeline is interrupted, because the address of the next instruction to be processed can only be ascertained during the decoding phase. In many known systems and applications, jump instructions are quite common and represent a relatively high percentage of the instructions, in some cases about 20%. The performance increase sought through the pipeline technique can then be severely reduced.
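The effect of the jump share on throughput can be illustrated with a rough back-of-the-envelope model. The following is a minimal sketch, not a description of any particular processor: it assumes an ideal pipeline that completes one instruction per cycle and loses a fixed number of cycles on every jump. The function names and the two-cycle penalty are hypothetical; only the 20% jump share comes from the text above.

```python
def effective_cpi(jump_fraction: float, penalty_cycles: float) -> float:
    """Average cycles per instruction, assuming one cycle per instruction
    in the ideal case plus `penalty_cycles` lost on every jump."""
    if not 0.0 <= jump_fraction <= 1.0:
        raise ValueError("jump_fraction must be between 0 and 1")
    if penalty_cycles < 0:
        raise ValueError("penalty_cycles must not be negative")
    return 1.0 + jump_fraction * penalty_cycles


def relative_throughput(jump_fraction: float, penalty_cycles: float) -> float:
    """Throughput as a fraction of the ideal, jump-free pipeline."""
    return 1.0 / effective_cpi(jump_fraction, penalty_cycles)


if __name__ == "__main__":
    # 20% jumps (figure from the text); 2-cycle penalty is an assumed value.
    print(effective_cpi(0.20, 2))
    print(relative_throughput(0.20, 2))
```

Under these assumed numbers the pipeline needs 1.4 cycles per instruction on average, i.e. it reaches roughly 71% of its ideal throughput.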
U.S. Pat. No. 4,974,155 illustrates a pipeline computer. To avoid the loss of time that occurs when the process is blocked by jump instructions, variable delay branch operations are applied, which accommodate programming of conditional as well as unconditional branch operations. When program instructions are provided for execution, steps are implemented to accommodate a branch command followed by a split command that indicates the time for the jump. A split command may, for example, comprise a bit in an instruction command during the execution of a branch. This means that a jump is not executed directly after the branch command but is delayed by a number of program instructions pending a split command in the form of a split bit. The number is variable and can be controlled by the programmer, and the delay must be at least one cycle. When a conditional branch is specified by a control instruction, the results of subsequent processing are tested until the condition codes are ready and a split bit occurs. If the condition is not met, no jump is taken and execution resumes with the current sequence in the program counter. Thus, in the case of a branch command, the system provides a variable delay during which branch or target instructions are fetched, and when a split bit occurs the jump can be done promptly.
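The variable delay branch described above can be sketched as a toy interpreter. This is an illustrative model only, not the mechanism of the cited patent: the `Instr` class, the `run` function and the single `condition_met` flag are hypothetical simplifications, and the rule that a branch command may not carry the split bit itself is an assumed way of expressing the "at least one cycle" delay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    branch_target: Optional[int] = None  # set only on a branch command
    split: bool = False                  # split bit: the jump happens after this instruction

    def __post_init__(self):
        # Assumption: the delay of at least one cycle is modeled by
        # forbidding the split bit on the branch command itself.
        if self.branch_target is not None and self.split:
            raise ValueError("split bit must follow the branch command")


def run(program: List[Instr], condition_met: bool = True, max_steps: int = 100) -> List[str]:
    """Return the names of the instructions in the order they execute."""
    trace = []
    pc = 0
    pending_target = None  # target remembered from the last branch command
    while pc < len(program) and len(trace) < max_steps:
        instr = program[pc]
        trace.append(instr.name)
        if instr.branch_target is not None:
            pending_target = instr.branch_target
        if instr.split and pending_target is not None:
            target, pending_target = pending_target, None
            if condition_met:
                pc = target  # jump only now, at the split bit
                continue
        pc += 1  # no jump: continue with the current sequence
    return trace


program = [
    Instr("branch", branch_target=5),
    Instr("add"),
    Instr("sub", split=True),
    Instr("skipped1"),
    Instr("skipped2"),
    Instr("target"),
    Instr("end"),
]
```

In this sketch, `run(program, condition_met=True)` executes `add` and `sub` after the branch command and only then continues at `target`, while `run(program, condition_met=False)` simply falls through the whole sequence.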
However, conditional jumps are not handled satisfactorily: the time lost through the delay and related effects is considerable, there is still a risk of a pipeline break when several substantially consecutive jump instructions occur, and the arrangement is based on branching and preconditions.
Moreover, the arrangement disclosed in the above-mentioned document is complicated, in addition to not saving time efficiently enough.
U.S. Pat. No. 5,287,467 describes an arrangement that uses branch instruction detection from the execution pipelines to enhance the parallelism of multi-pipelined computers. Up to two of the detected instructions are processed concurrently, in parallel with the operation of the execution pipelines. The branch condition processing consists of detecting a branch instruction in the current instruction stream, predicting whether or not the branch is taken, prefetching the instruction text for a branch that is predicted as taken, performing the branch test inherent in the branch instruction and, if necessary, issuing a corrected instruction refetch. The branch instruction location information is then used to generate the address of the branch instruction in a cache memory. Next, a branch target address is generated, and the branch instruction text, its address and a prefetch indicator are entered into a branch queue, which is operated as a FIFO queue.
This arrangement mainly concerns a Scalable Compound Instruction-Set Machine (SCISM) using a branch prediction mechanism. If the outcome of a branch prediction is that the branch is not taken, instruction fetching proceeds in the normal fashion. Moreover, the entire branch instruction, including the branch address, is stored in the FIFO, which requires larger memory units. In case of a misprediction, the correct instruction must be fetched again into the decode stage of the execution pipeline. To obtain an optimum branch prediction and thereby an optimum arrangement, a complicated logic unit is required, which complicates the arrangement.
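The branch queue described above can be pictured with a small FIFO model. This is only a sketch under stated assumptions, not the structure used in the cited patent: the class names, field names and the example instruction texts are hypothetical, and the prefetch indicator is assumed to be set exactly when the branch was predicted as taken.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BranchEntry:
    text: str         # full branch instruction text (this is what costs memory)
    address: int      # address of the branch instruction
    prefetched: bool  # prefetch indicator; assumed True when predicted taken


class BranchQueue:
    """Branch queue operated first in, first out."""

    def __init__(self) -> None:
        self._fifo = deque()

    def __len__(self) -> int:
        return len(self._fifo)

    def enqueue(self, text: str, address: int, predicted_taken: bool) -> None:
        self._fifo.append(BranchEntry(text, address, predicted_taken))

    def resolve_oldest(self, actually_taken: bool) -> Tuple[BranchEntry, bool]:
        """Pop the oldest branch and report whether a refetch is needed,
        i.e. whether the prediction disagreed with the branch test."""
        entry = self._fifo.popleft()
        needs_refetch = entry.prefetched != actually_taken
        return entry, needs_refetch


queue = BranchQueue()
queue.enqueue("BC 8,LOOP", 0x100, predicted_taken=True)   # hypothetical instruction text
queue.enqueue("BC 4,EXIT", 0x120, predicted_taken=False)  # hypothetical instruction text
```

Resolving the first entry with `actually_taken=True` needs no refetch, whereas resolving the second one as taken reports a misprediction, which corresponds to the corrected instruction refetch mentioned above.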
A known processor with a three-stage pipeline comprises the stages of fetching instructions, decoding instructions and executing instructions. Once the pipeline is filled, it holds a number of instructions in different steps, and a finished instruction is output every cycle, provided that each instruction would take, e.g., one micro-cycle in a non-pipelined system. If the speed of the processor were, for example, doubled, the output would also be doubled. In practice, however, the pipeline is broken as soon as the instruction flow is non-sequential, i.e. if there is a jump in the instruction code. Therefore, in order to hide the effects of jumps in the instruction code, there are two pipelines at the end of the pipe, leading to a micro-control unit or a micro-processor unit. One pipeline follows the program flow sequentially, and the other is started as soon as a conditional jump is found in the instruction queue. The second pipeline, i.e. the non-sequential pipeline, assumes that the detected jump will be taken and therefore fetches and decodes the instructions from the jump address onwards.
If, however, there is a further conditional jump in the sequential queue, the prefetching of instructions has to be stopped, and even if the jumps are not taken, it is impossible to keep the sequential pipeline filled. Thus there is a stop, or pipeline break, when the second conditional jump is found, resulting in a loss of time.
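The limitation of such a two-pipeline arrangement can be illustrated with a short sketch. It is a simplified model under assumptions that are not taken from the text: the instruction queue is a list of hypothetical `(name, jump_target)` pairs, the single non-sequential pipeline stays occupied once started, and the second conditional jump is counted as fetched before prefetching stops.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]  # (name, jump target or None)


def prefetch_until_stall(queue: List[Instruction]) -> Tuple[List[str], Optional[int]]:
    """Walk the sequential queue with one non-sequential pipeline available.

    Returns the names that could be prefetched and the index at which
    prefetching stops, or None if no stall occurs.
    """
    non_sequential_busy = False
    prefetched = []
    for index, (name, jump_target) in enumerate(queue):
        prefetched.append(name)
        if jump_target is None:
            continue
        if non_sequential_busy:
            # Second conditional jump: no pipeline is free to follow it.
            return prefetched, index
        # First conditional jump: the non-sequential pipeline starts
        # fetching and decoding from the jump address.
        non_sequential_busy = True
    return prefetched, None


queue = [("i0", None), ("jc1", 40), ("i2", None), ("jc2", 80), ("i4", None)]
print(prefetch_until_stall(queue))
```

In this example prefetching stops at index 3, the second conditional jump, and `i4` is never reached, which mirrors the pipeline break described above.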