Traditionally, the instructions provided within an instruction set were non-conditional, and hence if issued to an execution unit of a data processing apparatus those instructions would always be executed. To provide for different flows of execution of instructions, conditional branch instructions were provided, such that the execution flow could branch to some predetermined point if the condition specified in association with that branch instruction was met.
One known type of data processing apparatus includes a pipelined processor incorporating a plurality of pipeline stages. A prefetch unit is typically provided in such a data processing apparatus to prefetch instructions for execution by the pipelined processor, in order to provide the pipelined processor with a steady stream of instructions to execute. Such prefetch units often include branch prediction logic to predict for conditional branch instructions whether the branch will be taken or not, and to prefetch instructions accordingly. However, in the event that the branch prediction proves wrong, this often requires a significant number of instructions to be flushed from the pipeline, and for new instructions to then be refetched and executed by the pipelined processor, which can have a significantly adverse effect on performance.
Another type of instruction set which has been developed is the predicated instruction set, where typically a majority of the instructions in the instruction set are conditional instructions. This enables a significant reduction in the number of branch instructions used and accordingly reduces the chance that an incorrect sequence of instructions is prefetched into the pipelined processor. For example, consider a sequence of operations in which a comparison takes place, and an add operation is then performed if the values compared are equal. With an instruction set that only supported conditional branch instructions, this functionality would need to be implemented by a compare instruction, followed by a branch instruction to cause a branch to another portion of the code to take place if the values compared were not equal, followed by an add instruction. However, with a predicated instruction set, the same functionality could be achieved through the use of a compare instruction followed by an add instruction which is specified as being conditional on the result of the compare instruction indicating equality. In such an example, it can be seen that the use of a predicated instruction set improves code density, and also avoids the possibility that an incorrect sequence of instructions is issued to the execution pipeline based on an incorrect prediction of the outcome of a branch instruction.
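By way of illustration only, the two instruction sequences described above can be modelled with a toy interpreter. The following is a minimal sketch, not a description of any real instruction set: the tuple encoding, the BNE mnemonic, the single "equal" flag and the register names R0 to R3 are all assumptions introduced for the example.

```python
def run(program, regs):
    """Run a list of (opcode, operand, ...) tuples on a copy of regs.

    Hypothetical toy model: a single 'equal' flag is set by CMP and
    tested by BNE and ADDEQ.
    """
    regs = dict(regs)
    equal = False
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "CMP":
            equal = regs[args[0]] == regs[args[1]]
        elif op == "BNE":
            # Branch to the given instruction index if the compare was not equal.
            if not equal:
                pc = args[0]
        elif op in ("ADD", "ADDEQ"):
            # ADD always executes; ADDEQ executes only when the flag is set.
            if op == "ADD" or equal:
                dst, a, b = args
                regs[dst] = regs[a] + regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Compare, branch over the add if not equal, then add (three instructions).
BRANCHING = [("CMP", "R0", "R1"), ("BNE", 3), ("ADD", "R2", "R1", "R3")]

# Compare, then a predicated add (two instructions).
PREDICATED = [("CMP", "R0", "R1"), ("ADDEQ", "R2", "R1", "R3")]
```

Under these assumptions both sequences leave the registers in the same state for any input, while the predicated form is one instruction shorter and contains no branch whose outcome would need to be predicted.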
Whilst the use of such predicated instruction sets can be beneficial, particularly in highly pipelined implementations, it can result in an increase in complexity of the design of the pipelined processing unit in order to allow correct execution of an instruction which has already been issued to the execution pipeline, and which specifies as one of its source registers a destination register of such a predicated instruction. Such an instruction will be referred to herein as a dependent instruction. By way of example, consider the following sequence of two instructions:
ADDEQ R2, R1, R3
SUB R3, R2, R4.
The ADDEQ instruction is a predicated instruction which, assuming the result of some previous comparison was equality, will execute in order to store in register R2 the sum produced by adding the contents of registers R1 and R3. The following SUB instruction is non-conditional, and is arranged to subtract the contents of register R4 from the contents of register R2, and to place the result in register R3. Since the SUB instruction requires the contents of register R2 as one of its operands, it is clearly dependent on the ADDEQ instruction that precedes it. If the ADDEQ instruction executes, then the value of R2 is given by the sum produced by the execution of the add instruction, whereas if the ADDEQ instruction does not execute (because the result of the earlier comparison was not equality), then the value of R2 required by the SUB instruction is not produced by the preceding ADDEQ instruction, but is instead the value already stored within register R2.
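The two possible outcomes for the dependent SUB instruction can be traced with a short architectural model. This is a sketch only: the helper names and the example register values are assumptions chosen for illustration, and the model ignores pipelining entirely.

```python
def addeq(regs, equal, dst, a, b):
    """ADDEQ dst, a, b: write a + b to dst only if the earlier compare was equal."""
    if equal:
        regs[dst] = regs[a] + regs[b]


def sub(regs, dst, a, b):
    """SUB dst, a, b: unconditionally write a - b to dst."""
    regs[dst] = regs[a] - regs[b]


def run_pair(regs, equal):
    """Execute ADDEQ R2, R1, R3 followed by SUB R3, R2, R4 on a copy of regs."""
    regs = dict(regs)
    addeq(regs, equal, "R2", "R1", "R3")
    sub(regs, "R3", "R2", "R4")
    return regs


# Illustrative starting values (hypothetical).
START = {"R1": 5, "R2": 100, "R3": 7, "R4": 2}
```

With these values, run_pair(START, True) gives R2 = 12 and R3 = 10, because the SUB instruction consumes the new sum, whereas run_pair(START, False) leaves R2 = 100 and gives R3 = 98, because the SUB instruction consumes the value already held in R2.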
In order to support execution of such dependent instructions within the pipelined processing unit, it is typically necessary to provide complex forwarding paths with multiplexing logic therein that can select different source operands for the dependent instruction depending upon whether the preceding predicated instruction was executed or not. This can clearly adversely impact the potential benefits to be realised from using a predicated instruction set.
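The selection that such multiplexing logic has to perform can be sketched as follows. The function and parameter names are hypothetical, and the sketch models only the choice of operand, not the timing or structure of any real forwarding path.

```python
def select_source_operand(src, producer_dst, producer_result,
                          producer_executed, regfile):
    """Choose the value a dependent instruction should use for register src.

    Assumed model: the preceding predicated instruction targets producer_dst
    and has computed producer_result, but that result is only valid if its
    condition passed (producer_executed is True).
    """
    if src == producer_dst and producer_executed:
        # Predicated instruction executed: forward its result.
        return producer_result
    # Otherwise fall back to the value already held in the register file.
    return regfile[src]


# Illustrative register file contents (hypothetical values).
REGFILE = {"R1": 5, "R2": 100, "R3": 7, "R4": 2}
```

For the SUB instruction reading R2 after ADDEQ R2, R1, R3, the call select_source_operand("R2", "R2", 12, True, REGFILE) yields the forwarded sum 12, while the same call with producer_executed set to False yields the stored value 100. The extra input to the selection, namely whether the predicated instruction executed, is the source of the added complexity described above.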
One possible way of alleviating the above problem would be to arrange the operations performed within the pipelined processing unit in order to execute the predicated instruction such that a result is always output. Hence, considering the earlier ADDEQ example, if the earlier compare operation produced equality, the add instruction would be executed to generate as the value for register R2 the sum of the data values in registers R1 and R3, whereas if the earlier compare operation produced a result other than equality, execution of the add instruction would merely output the existing value of R2 as the result.
However, in order to support such functionality, not only do the source registers of the predicated instruction need to be read, but also the destination register needs to be read, into the pipelined processing unit, so that the pipelined processing unit is able to produce either result as required.
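The "always output a result" arrangement described above can be sketched as a single selection between the computed sum and the old destination value. The function names and example values are assumptions made for illustration; the point of the sketch is that the old value of the destination register appears as a third input operand.

```python
def execute_addeq_always_output(condition_passed, src_a, src_b, old_dst):
    """Always produce a result for the destination register.

    Returns the sum when the condition passed, otherwise the existing
    destination value, which therefore has to be read as an extra operand.
    """
    return src_a + src_b if condition_passed else old_dst


def run_pair_always_output(regs, equal):
    """ADDEQ R2, R1, R3 then SUB R3, R2, R4, forwarding unconditionally."""
    # Three register reads for the predicated instruction: R1, R3 and R2.
    r2_result = execute_addeq_always_output(
        equal, regs["R1"], regs["R3"], regs["R2"])
    # The dependent SUB can take r2_result without checking the condition.
    r3_result = r2_result - regs["R4"]
    return r2_result, r3_result
```

With R1 = 5, R2 = 100, R3 = 7 and R4 = 2, the pair of results is (12, 10) when the condition passes and (100, 98) when it does not. The dependent instruction no longer needs to know whether the predicated instruction executed, at the cost of reading R2 in addition to R1 and R3.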
The registers specified by instructions executed within the pipelined processing unit will normally reside within a register file that has a predetermined number of read ports. The provision of each read port increases the size of the data processing apparatus, and accordingly increases the cost of producing that data processing apparatus. Further, the more read ports supported, the more complex the design of the pipelined processing unit, which also increases cost. Accordingly, it is desirable to keep the number of read ports to a minimum, and hence the potential approach of also reading the destination register in addition to the source registers is likely to be considered impractical in some implementations.
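The additional read requirement can be tallied with a simple helper. This is an illustrative sketch under the assumption that each distinct register read in a cycle occupies one read port; the function name and its parameters are hypothetical, and real register file designs may differ.

```python
def register_reads(sources, dest, predicated, always_output):
    """List the registers an instruction must read from the register file.

    Assumed model: a predicated instruction that always outputs a result
    must also read its destination register, unless the destination is
    already one of its sources.
    """
    reads = list(sources)
    if predicated and always_output and dest not in reads:
        reads.append(dest)
    return reads
```

For ADDEQ R2, R1, R3 this gives two reads (R1 and R3) in the conventional case and three reads (R1, R3 and R2) when a result is always output, while the non-conditional SUB R3, R2, R4 needs two reads in either case.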
Accordingly, it would be desirable to provide an improved technique for handling conditional instructions within the pipelined processing unit of a data processing apparatus.