1. Field of the Invention
The field of the invention relates to data processing apparatus and, in particular, to the processing of predicated instructions, that is, instructions whose execution is dependent upon data conditions.
2. Description of the Prior Art
Conditional instructions whose execution is dependent upon particular data conditions are known. For example, the instruction sequence CMP x y, ADDGE, SUBLT compares two values stored in locations x and y, adds them together if x is greater than or equal to y, and subtracts one from the other if x is less than y.
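The behaviour of such a conditional sequence can be sketched in Python as follows. This is an illustrative model only: the function names, the flag representation, and the operand order of the subtraction (x minus y) are assumptions and are not taken from any particular instruction set.

```python
def compare(x, y):
    """Model CMP: return condition flags (hypothetical representation)."""
    return {"GE": x >= y, "LT": x < y}


def run_conditional_sequence(x, y):
    """Model CMP x y, ADDGE, SUBLT.

    Both conditional instructions are issued, but each one only
    takes effect when its condition flag is set.
    """
    flags = compare(x, y)
    result = None
    if flags["GE"]:      # ADDGE: add only if x >= y
        result = x + y
    if flags["LT"]:      # SUBLT: subtract only if x < y (x - y assumed)
        result = x - y
    return result
```

Exactly one of the two conditional instructions takes effect for any pair of inputs, since the GE and LT conditions are mutually exclusive.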
Vector instructions that perform operations on multiple data elements are becoming more common. They often use masks to control which elements are processed. For example, executing an 8-element vector store instruction using the mask 10000001 would store only two elements (the first and the last). A common optimisation when writing vectorised code that uses masks in this way is to recognise that a sequence of instructions is controlled by a single mask and to insert a branch around all of these instructions if the mask is zero, as in this case none of the instructions would do anything. Thus, the code would become:
VCMP D0, D1, D2    ; compare D1 and D2, put result mask in D0
VTEST D0           ; test if all bits in D0 are zero
BEQ L1             ; if mask is zero skip the next 10 operations
D0 → VOP1          ; perform vector operation 1 under control of mask D0
D0 → VOP2          ; perform vector operation 2 under control of mask D0
...
D0 → VOP10         ; perform vector operation 10 under control of mask D0
L1
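The masked store and the branch-around optimisation described above can be modelled roughly as shown below. This is a sketch under stated assumptions: masks are represented as Python lists of 0/1 values, the comparison performed by VCMP is assumed to be "greater than" (the text does not specify it), and all function names are hypothetical.

```python
def masked_store(memory, base, values, mask):
    """Store values[i] at memory[base + i] only where mask[i] is 1.

    Returns the number of elements actually stored.
    """
    stored = 0
    for i, (value, bit) in enumerate(zip(values, mask)):
        if bit:
            memory[base + i] = value
            stored += 1
    return stored


def run_masked_block(d1, d2, ops):
    """Model the VCMP / VTEST / BEQ sequence followed by masked operations.

    ops is a list of callables, each taking the mask. Returns the mask
    and the number of vector operations that were executed.
    """
    # VCMP: build the mask (greater-than is an assumed comparison)
    mask = [1 if a > b else 0 for a, b in zip(d1, d2)]
    # VTEST + BEQ: skip every masked operation when the mask is all zero
    if not any(mask):
        return mask, 0
    executed = 0
    for op in ops:
        op(mask)  # each operation runs under control of the mask
        executed += 1
    return mask, executed
```

With the mask 10000001 written as `[1, 0, 0, 0, 0, 0, 0, 1]`, `masked_store` writes only the first and last elements. When every comparison fails, `run_masked_block` returns without calling any of the operations, which corresponds to the branch to L1 being taken.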
This is very effective if the mask is often zero, as the test and branch then cost only two instructions and avoid the 10 instructions that would otherwise be performed. However, a problem with this prior art approach is that the branch is data dependent: whether it is taken depends upon the two data values D1 and D2. Data dependent branches are very hard to predict. Thus, if branch prediction is used to speed up operation, the branch may often be mispredicted, and if it is predicted to be taken when it should not have been, the state of the machine will need to be rolled back to the state it was in before the branch was taken. When performing vector processing, saving the state of the machine at a certain point is expensive in area due to the length of the data words.
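The trade-off can be illustrated with a simple expected-cost model. This is only a sketch: the probability that the mask is zero, the misprediction rate, and the rollback penalty (expressed in instruction-equivalents) are hypothetical parameters, and real costs depend on the microarchitecture. The figures of two instructions for the test and branch and 10 masked operations are taken from the example above.

```python
def expected_cost(p_zero, mispredict_rate, rollback_penalty,
                  n_ops=10, overhead=2):
    """Expected instruction cost of the branch-around version.

    p_zero           -- probability that the mask is all zero (assumed)
    mispredict_rate  -- fraction of branches mispredicted (assumed)
    rollback_penalty -- cost of one rollback, in instructions (assumed)
    n_ops            -- masked operations skipped when the mask is zero
    overhead         -- test and branch instructions always executed
    """
    return overhead + (1.0 - p_zero) * n_ops + mispredict_rate * rollback_penalty


def branch_is_worthwhile(p_zero, mispredict_rate, rollback_penalty,
                         n_ops=10, overhead=2):
    """True if the branch version is cheaper than always running n_ops."""
    cost = expected_cost(p_zero, mispredict_rate, rollback_penalty,
                         n_ops, overhead)
    return cost < n_ops
```

Under this model the branch pays off when the mask is usually zero and mispredictions are rare, and it becomes a net loss as the misprediction rate or the rollback penalty grows.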
It would be desirable to be able to improve the performance of conditional instruction execution.