To fully utilize the wide machine resources in modern high-performance microprocessors, it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However, predicated execution also presents a significant challenge to an optimizing compiler.
As will be discussed, we assume the architectural support of the general predicated execution model provided in the HPL Playdoh architecture, as described in V. Kathail, M. Schlansker, B. Rau, HPL Playdoh Architecture Specification: Version 1.0, Hewlett-Packard Laboratories Technical Report, HPL-93-80, February 1993, in which the execution of an instruction can be guarded by a qualifying predicate. The following form of compare instruction is provided to set predicates:
p1,p2 = cmpp.<d1><d2> (a rel b) if qp
Predicates p1 and p2 are two destination predicates. Each of <d1> and <d2> is a two-letter descriptor that specifies a type and mode for the compare instruction. There are four comparison types: unconditional (u), conditional (c), parallel-or (o), and parallel-and (a), specified by the first letter of the descriptor. Each type has both a normal mode (n) and a complement mode (c), specified by the second letter of the descriptor. Descriptors <d1> and <d2> control destination predicates p1 and p2, respectively. The condition (a rel b) is in the form of a logical comparison of two variables a and b, where rel can be eq, ne, lt, etc. Predicate qp is the qualifying predicate.
To simplify the discussion of the unconditional and conditional types, we assume they are always generated in the forms cmpp.un.uc and cmpp.cn.cc, respectively. Therefore, for a comparison of the unconditional type, the values of p1 and p2 are complementary whenever qp is true. If the comparison type is parallel-and (parallel-or) and qp is true, the an (on) target predicate (say p1) is equal to the result of (a rel b) "and"ed ("or"ed) with the old value of p1.
The following table summarizes how the destination predicates are set by these compare instructions. In the table, T and F stand for true and false values, respectively. X means "don't care" and nc means that the result is unchanged. We also assume that there is a special predicate register p0 that always reads as true and discards any write.
______________________________________
qp   (a rel b)   un   uc   cn   cc   on   oc   an   ac
______________________________________
F    X           F    F    nc   nc   nc   nc   nc   nc
T    T           T    F    T    F    T    nc   nc   F
T    F           F    T    F    T    nc   T    F    nc
______________________________________
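The table can also be read as a small update rule. The following Python sketch is only an illustration of the table above, not part of the Playdoh specification; the function names cmpp_update and cmpp, and the use of a dictionary to hold predicate values, are assumptions made for this example.

```python
def cmpp_update(desc, old, qp, cond):
    """Return the new value of one destination predicate.

    desc: two-letter descriptor such as "un" or "oc" (type, then mode)
    old:  current value of the destination predicate
    qp:   value of the qualifying predicate
    cond: result of the comparison (a rel b)
    """
    ctype, mode = desc[0], desc[1]
    # Complement mode acts on the negated comparison result.
    c = cond if mode == "n" else not cond
    if ctype == "u":
        # Unconditional: always written, and false when qp is false.
        return qp and c
    if not qp:
        return old                    # c, o and a types: unchanged (nc)
    if ctype == "c":
        return c
    if ctype == "o":
        return True if c else old     # can only set the predicate
    if ctype == "a":
        return old if c else False    # can only clear the predicate
    raise ValueError(f"unknown descriptor: {desc}")


def cmpp(preds, p1, p2, d1, d2, cond, qp="p0"):
    """Apply one compare instruction to a dict of predicate values."""
    def read(name):
        return True if name == "p0" else preds[name]   # p0 reads as true

    q = read(qp)
    for name, desc in ((p1, d1), (p2, d2)):
        if name != "p0":                               # writes to p0 are discarded
            preds[name] = cmpp_update(desc, read(name), q, cond)


preds = {"p1": True, "p2": False}
cmpp(preds, "p1", "p2", "an", "on", cond=False)
# "an" clears p1 because the comparison is false; "on" leaves p2 unchanged.
```

In this sketch the parallel-or and parallel-and types only ever move a predicate in one direction, which is what allows several such compares to target the same predicate.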
The code fragment shown as (1) below shows the importance of recognizing predicate relations in a global scope. Both x and y are defined and used in different basic blocks. To check for interferences, one would propagate the use of x under p to the else-clause towards the definition of x under r. While crossing the definition of y under s during this process, an interference will be assumed between x and y unless one can assert that p and s are disjoint. By examining the code, since p and s are defined in the then- and else-clauses, respectively, p and s can never both be true at the same time. However, asserting the disjointness of p and s requires a global analysis that systematically takes control flow into account. This relation cannot be captured by a hyperblock or basic block based analysis. Further, with a global predicate analysis, one can also assert that p and r are disjoint. Therefore, the definition of x under r can never reach the use of x under p, and this definition is dead.
______________________________________
(1)
p,q,r,s = false
if ( .. ) then {
    p,q = cmpp.un.uc if true
    x = .. if p
}
else {
    r,s = cmpp.un.uc (. . .) if true
    x = .. if r
    y = .. if s
}
.. = x if p
.. = y if s
______________________________________
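The disjointness claims made for fragment (1) can be checked by exhaustive enumeration. The Python sketch below is only a sanity check on this one small fragment and is not the global predicate analysis called for above; the function run_fragment and its three boolean parameters (the branch outcome and the two elided comparison results) are names assumed for illustration.

```python
from itertools import product


def run_fragment(branch, c_then, c_else):
    """Return (p, q, r, s) after one execution of fragment (1)."""
    p = q = r = s = False                 # p,q,r,s = false
    if branch:
        p, q = c_then, not c_then         # p,q = cmpp.un.uc ... if true
    else:
        r, s = c_else, not c_else         # r,s = cmpp.un.uc ... if true
    return p, q, r, s


# All 8 combinations of branch outcome and comparison results.
outcomes = [run_fragment(*bits) for bits in product([False, True], repeat=3)]

p_s_disjoint = all(not (p and s) for p, q, r, s in outcomes)
p_r_disjoint = all(not (p and r) for p, q, r, s in outcomes)
```

Both flags evaluate to true because p can only be set in the then-clause while r and s can only be set in the else-clause. Enumeration of this kind does not scale, which is why the relation must be derived from the control flow itself.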
Prior compilers used a predicate hierarchy graph (PHG) to track the boolean equations for all of the predicates in a hyperblock. This is shown in S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, "Effective Compiler Support for Predicated Execution Using the Hyperblock," In Proc. of the 25th Annual Int'l Symp. on Microarchitecture, pp. 45-54, December 1992. In the Mahlke, et al. work, predicate analysis occurs over the scope of a hyperblock and does not extend globally. In addition, when a companion data structure such as the PHG is maintained alongside the code, it may require updating whenever the program is transformed.
N. J. Warter, S. A. Mahlke, W.-M. W. Hwu, and B. R. Rau, in "Reverse If-conversion," In Proc. of the SIGPLAN'93 Conf. on Programming Language Design and Implementation, pp. 290-299, June 1993, propose a reverse if-conversion scheme to map predicates from the data flow domain back to the more familiar control flow domain. A major drawback of this approach is that during the remapping process some originally non-existent control paths may be created. This typically occurs when code is if-converted and then scheduled, thus permuting the order of the code. When this code is reverse if-converted, the non-existent paths may cause conservative treatment in many analyses and transformations.
Alexandre E. Eichenberger and Edward S. Davidson, in their paper "Register Allocation for Predicated Code," In Proc. of the 28th Annual Int'l Symp. on Microarchitecture, November 1995, solve a register allocation problem in the presence of predication by representing predicates as P-facts, which are logically invariant expressions. The mechanism concludes that two live ranges do not interfere if the intersection of the two sets of P-facts can be simplified to false using a symbolic package. This approach is also restricted to the scope of a hyperblock, and may have limited practicality given its potentially exponential compilation time with respect to the number of predicates.
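The cost concern can be illustrated with a truth-table stand-in for such a check. The Python sketch below is not the Eichenberger and Davidson mechanism and does not use a symbolic package; it simply tests whether two guard expressions can hold together under some known facts by trying every assignment, so its running time grows as 2^n in the number of predicates n. The function may_interfere and the fact named complementary are assumptions made for this example.

```python
from itertools import product


def may_interfere(names, guard_a, guard_b, facts=()):
    """True if some predicate assignment satisfies both guards and all facts."""
    for values in product([False, True], repeat=len(names)):   # 2**n cases
        env = dict(zip(names, values))
        if all(f(env) for f in facts) and guard_a(env) and guard_b(env):
            return True
    return False


names = ["p", "q"]
# Assumed known fact: p and q were set by one cmpp.un.uc with a true qp.
complementary = lambda env: env["p"] != env["q"]

with_fact = may_interfere(names, lambda e: e["p"], lambda e: e["q"], [complementary])
without_fact = may_interfere(names, lambda e: e["p"], lambda e: e["q"])
```

With the complementarity fact the two guards can never hold together, so live ranges guarded by p and q would not interfere; without it, interference must be assumed.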
A need therefore exists in the art for a predicated execution system to take into account the semantics of the predicates during compilation analyses, such as data flow analysis, without adding any artificial or unwanted steps in the process.