The present invention relates to the field of computers and instruction code for execution on computer systems. In particular, the invention relates to computer architectures that allow the execution of instructions within the code to be predicated by a bit value.
The performance of a computer system is often a function of how well the processor manipulates and controls the flow of data within the system. Over the past several decades engineers and researchers have been striving to find ways to increase the speed and throughput of instructions executed by the processor of a computer system. In more recent years, machines such as the Pentium® Pro™ processor have achieved increased performance by executing instructions out of their original program order. By scheduling instructions according to the availability of machine resources, the processor was allowed to take advantage of parallelism inherent in the code.
Another technique employed by computer architects to increase processor speed and throughput is known as predication. Predication refers to the conditional execution of instructions, depending on the value of a predicate. For example, modern processor architectures allow most instructions in the code to be predicated by a one-bit value, which is typically stored in an associated predicate register. The predicated instruction is executed only if the predicate register has a “true” value. Conversely, if the predicate register value is “false”, the instruction is ignored, i.e., it is treated essentially as being equivalent to a no-operation instruction (NOP). Another possibility is to enable predication by either (predicate) or (NOT predicate), but this method is not widely used.
Consider the sequence of instructions listed below.
cmp r1=r2→p2
(p2) add r3+r4→r5
The first instruction determines a value for the predicate (p2) based on a comparison of the operands r1 and r2. If the value of register r1 is equal to the value of register r2, then the value of predicate p2 is true. On the other hand, if the values of r1 and r2 are not equal, then p2 is false. (True and false are typically represented in the processor as single bit values “1” and “0”, respectively.)
The second instruction includes two parts. The first part (p2) predicates or conditions the second part (the addition of the contents of two registers) on the value of predicate p2. If p2 is true (e.g., “1”) then the value of r5 is set equal to the value of r3+r4. But if p2 is false (e.g., “0”) then the second part of the instruction is skipped (i.e., treated as a NOP). The processor then continues to execute the next instruction in the programmed code sequence.
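The behavior of this two-instruction sequence can be illustrated with a minimal Python sketch. It is a simulation for illustration only, not processor code; the function name `execute_predicated_add` and the dictionary-based register file are assumptions introduced here.

```python
def execute_predicated_add(regs):
    """Simulate: cmp r1=r2 -> p2, followed by (p2) add r3+r4 -> r5.

    `regs` maps register names to integer values (a hypothetical register file).
    Returns the predicate value and the updated register file.
    """
    regs = dict(regs)  # work on a copy so the caller's registers are untouched

    # cmp: the predicate is 1 (true) when r1 equals r2, otherwise 0 (false)
    p2 = 1 if regs["r1"] == regs["r2"] else 0

    # (p2) add: performed only when p2 is true; otherwise it behaves like a NOP
    if p2:
        regs["r5"] = regs["r3"] + regs["r4"]

    return p2, regs


if __name__ == "__main__":
    start = {"r1": 7, "r2": 7, "r3": 2, "r4": 3, "r5": 0}
    print(execute_predicated_add(start))               # predicate true, r5 updated
    print(execute_predicated_add({**start, "r2": 9}))  # predicate false, r5 unchanged
```

In both cases execution falls through to whatever follows; no branch is taken, which is the property that makes predication attractive.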
Predicated instructions are very useful in that they allow the merging of two or more flows of control while avoiding conditional branches, which are expensive in terms of time consumption and resource usage if mispredicted. To realize the gains offered by predication, however, the code generator must schedule predicated instructions properly, so that instructions predicated by mutually exclusive predicates are marked independent, even if they reference the same registers. That is, if two instructions use the same register, or relate to memory, it is helpful to know whether it is possible to swap the two instructions for scheduling purposes. If, for instance, the same register is only read by both instructions, the two instructions can be swapped from the original program order.
In the case where a first instruction writes to a register and a second instruction reads from the same register, it means that the two instructions cannot be swapped. But if each instruction has a predicate, and the predicates are different and can never be true together, then swapping is permitted. Two instructions that are predicated by two mutually exclusive predicate registers are completely independent, regardless of any potential dependencies implied by their operands. In other words, because they are mutually exclusive, no dependencies are produced between the two instructions despite the fact that they use the same registers. It should be understood that if the two instructions were not predicated, dependencies would be produced and the instructions could not be swapped out of their original program order.
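The swapping rule described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the `Instr` record, the helper names, and the set of mutually exclusive predicate pairs passed in by the caller are all hypothetical, and memory dependencies are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    reads: frozenset             # registers read by the instruction
    writes: frozenset            # registers written by the instruction
    pred: Optional[str] = None   # guarding predicate, or None if unpredicated


def has_register_hazard(a, b):
    """True if the operands imply a dependency (read/write or write/write overlap)."""
    return bool((a.writes & b.reads) or (a.reads & b.writes) or (a.writes & b.writes))


def can_swap(a, b, exclusive_pairs):
    """Decide whether a and b may be reordered.

    `exclusive_pairs` is a set of frozensets, each holding two predicates that
    can never be true together (assumed to be known by the caller).
    """
    if not has_register_hazard(a, b):
        return True   # e.g. both instructions only read the shared register
    if a.pred is None or b.pred is None:
        return False  # an unpredicated instruction always executes
    return frozenset((a.pred, b.pred)) in exclusive_pairs


if __name__ == "__main__":
    exclusive = {frozenset(("p1", "p2"))}
    writer = Instr(reads=frozenset({"r15"}), writes=frozenset({"r14"}), pred="p1")
    reader = Instr(reads=frozenset({"r14"}), writes=frozenset({"r16"}), pred="p2")
    print(can_swap(writer, reader, exclusive))  # hazard cancelled by exclusive predicates
```

The hard part in practice is obtaining `exclusive_pairs` cheaply, since any instruction that rewrites a predicate can change which pairs are exclusive.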
Consider the code listing shown in FIG. 1, which includes two ‘add’ instructions in sequence along with two ‘mov’ instructions. The first ‘add’ instruction, for example, performs an addition operation in which the number 8 is added to the contents of register r15. The result is placed in destination register r14. The second ‘add’ instruction also affects the contents of r14. Similarly, each of the ‘mov’ instructions references the contents of registers r16 and r17. Note that in this case, however, the two ‘add’ and two ‘mov’ instructions may be safely placed in the same issue group, although they affect the same registers and would form a hazard if they did not have predicates.
When emitting, optimizing and scheduling a code hyper-block (having one entry and one or more exits) containing predicated instructions, it is very important to recognize such pairs of independent instructions. To ensure that two instructions are truly independent, not only must the two instructions be considered but also the producers of their predicates as well. Determining dependencies between instructions is a time consuming aspect of scheduling in prior art machines, and deciding whether predicates invalidate dependence slows things down even more.
Known methods for verifying independence of such pairs of instructions generally try to keep track of all predicate pairs together with their producers. One known method maintains a database of mutually exclusive predicate pairs. A drawback of this approach, however, is that the database needs to be updated for every instruction that affects at least one predicate. Such prior methods, which require repeated updating or checking of data, tend to be very time consuming and therefore expensive in terms of performance.
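To make the cost of the database approach concrete, the following Python sketch shows one way such a database might behave. It is a hypothetical illustration, not a description of any particular prior-art implementation; the class name, its methods, and the invalidation policy are assumptions.

```python
class ExclusivePairDatabase:
    """Hypothetical store of predicate pairs known to be mutually exclusive."""

    def __init__(self):
        self.pairs = set()     # each entry is a frozenset of two predicate names
        self.update_count = 0  # number of instructions that forced an update

    def record(self, written_preds, mutually_exclusive=False):
        """Call once per instruction with the predicates that instruction writes."""
        written = frozenset(written_preds)
        if not written:
            return  # the instruction affects no predicate, so nothing changes
        self.update_count += 1
        # Any stored pair that mentions an overwritten predicate is now stale.
        self.pairs = {pair for pair in self.pairs if not (pair & written)}
        # A compare that produces a complementary pair adds a fresh entry.
        if mutually_exclusive and len(written) == 2:
            self.pairs.add(written)

    def are_exclusive(self, a, b):
        return frozenset((a, b)) in self.pairs


if __name__ == "__main__":
    db = ExclusivePairDatabase()
    db.record(["p1", "p2"], mutually_exclusive=True)
    print(db.are_exclusive("p1", "p2"))
    db.record(["p2"])  # p2 is redefined, so the stored pair is invalidated
    print(db.are_exclusive("p1", "p2"))
```

Every predicate-writing instruction triggers a scan of the stored pairs, which is the kind of repeated update work the passage above identifies as the drawback.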
Thus, there exists an unsatisfied need for an apparatus and method that is capable of detecting independent predicated instructions in a fast, efficient manner.
The present invention provides a method for detecting independent predicated instructions. In one embodiment, the method comprises associating all instructions within a block of code with true and false bit vectors. The true and false bit vectors have bit locations that correspond to instructions that produce pairs of mutually exclusive predicates. The following formula is then computed.
(I1VecTrue ^ I2VecTrue) and (I1VecFalse ^ I2VecFalse) != 0
In this formula, I1VecTrue and I2VecTrue represent true bit vectors respectively associated with first and second instructions in the block of code. I1VecFalse and I2VecFalse represent false bit vectors associated with the first and second instructions, respectively. In the case where the above formula produces a non-zero result, the first and second instructions are independent, regardless of the two instructions' operands.
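The test can be sketched in Python with plain integers serving as bit vectors. Several points here are assumptions rather than statements from the text: that “^” denotes bitwise exclusive-OR and “and” denotes bitwise AND, that the comparison with zero applies to the whole expression, and that an instruction guarded by the true (or false) predicate of producer k has bit k set in its true (or false) vector. The function name `independent` is likewise illustrative.

```python
def independent(i1_vec_true, i1_vec_false, i2_vec_true, i2_vec_false):
    """Evaluate (I1VecTrue ^ I2VecTrue) and (I1VecFalse ^ I2VecFalse) != 0.

    Each argument is an int used as a bit vector; bit k stands for the k-th
    instruction that produces a pair of mutually exclusive predicates.
    """
    return ((i1_vec_true ^ i2_vec_true) & (i1_vec_false ^ i2_vec_false)) != 0


if __name__ == "__main__":
    # Bit 0 stands for a hypothetical compare producing the pair (p1, p2).
    on_true = (0b1, 0b0)       # (true vector, false vector) of an instruction guarded by p1
    on_false = (0b0, 0b1)      # vectors of an instruction guarded by p2
    unpredicated = (0b0, 0b0)  # no guarding predicate

    print(independent(*on_true, *on_false))      # opposite sides of the same compare
    print(independent(*on_true, *on_true))       # same predicate
    print(independent(*on_true, *unpredicated))  # one instruction always executes
```

Under these assumptions the check costs two exclusive-ORs, one AND, and a comparison per instruction pair, with no lookup in a separately maintained table of predicate pairs.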