To fully utilize the wide machine resources in modern high-performance microprocessors, it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However, predicated execution also presents significant challenges to an optimizing compiler.
The use of data flow techniques is well known in situations not involving predicated code. For example, Section 101b of the book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, which is hereby incorporated by reference herein, beginning on page 624, describes such control techniques. The solutions presented in the above-described reference, however, do not solve the problems inherent in predicated code environments.
VLIW (Very Long Instruction Word) and superscalar architectures can exploit significant amounts of instruction level parallelism (ILP) to achieve improved performance in application programs. Parallelism within individual basic blocks is generally insufficient to fully utilize wide machine resources. Predicated execution, as shown in P. Y. Hsu and E. S. Davidson, "Highly Concurrent Scalar Processing," In Proc. of the 13th Annual Int'l Symp. on Computer Architecture, pp. 386-395, June 1986; and V. Kathail, M. Schlansker, B. Rau, HPL Playdoh Architecture Specification: Version 1.0, Hewlett-Packard Laboratories Technical Report, HPL-93-80, February 1993, both of which are hereby incorporated by reference herein, is an architectural model designed to exploit parallelism across basic blocks. In such a model, instructions from different basic blocks are converted to straight-line code guarded by boolean predicates. Architectural support for predicated execution varies from general support for predication in the conditional skip instructions, to conditional nullify instructions, and conditional move instructions.
In a general predicated execution model, the execution of an instruction is guarded by a boolean qualifying predicate. Each qualifying predicate can be regarded as a 1-bit predicate register. An example of a predicated instruction is: x=y+z if p
where p is the qualifying predicate which controls whether the instruction executes and updates the architectural state. To exploit predication, a compiler generally incorporates a technique called if-conversion, which eliminates branch instructions and converts the affected instructions to appropriate predicated forms. If-conversion effectively converts control flow into data flow, as discussed by J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren, "Conversion of Control Dependence to Data Dependence," In Conf. Record of the 10th Annual ACM Symp. on Principles of Programming Languages, pp. 177-189, January 1983.
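The effect of if-conversion on a simple if/else diamond can be sketched as follows. This is only an illustration under stated assumptions: the `Instr` class, the `if_convert` helper, and the textual `cmpp` compare-to-predicate form are hypothetical names chosen for the example and do not describe any particular compiler's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    """One instruction with an optional qualifying predicate (hypothetical model)."""
    text: str
    pred: Optional[str] = None  # None means the instruction always executes

    def __str__(self) -> str:
        return self.text if self.pred is None else f"{self.text} if {self.pred}"


def if_convert(cond: str, then_block: List[str], else_block: List[str],
               p: str = "p", q: str = "q") -> List[Instr]:
    """Turn a branch on `cond` into straight-line predicated code.

    The branch is replaced by a compare that sets p (condition true)
    and q (condition false); each arm is then guarded by its predicate.
    """
    out = [Instr(f"{p},{q} = cmpp({cond})")]
    out += [Instr(t, p) for t in then_block]   # former 'then' arm
    out += [Instr(t, q) for t in else_block]   # former 'else' arm
    return out


code = if_convert("a<b", ["x = y + z"], ["x = y - z"])
lines = [str(i) for i in code]
for line in lines:
    print(line)
```

The control dependence on the branch has become a data dependence on p and q, which is the sense in which if-conversion converts control flow into data flow.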
Achieving these benefits, however, is difficult because predication challenges many conventional analysis and transformation phases performed in an optimizing compiler. For example, consider the code shown in (1) below: ##EQU1##
The cmpp instruction sets p true and q false if the condition (a<b) is true and reverses these values if the condition is false. (The details of the architectural model will be described in the Detailed Description.) Variable x is defined and used under p and variable y is defined and used under q. A conventional live range analysis would find the live range of x from S2 to S4 and the live range of y from S3 to S5. Based on this analysis, a traditional register allocator would conclude that x and y interfere with each other and subsequently assign different physical registers to them. However, since p and q are complementary, x and y will never hold valid values at the same time and thus they can share the same physical register.
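The difference between the conventional conclusion and the predicate-aware one can be sketched with the live ranges stated above (x from S2 to S4 under p, y from S3 to S5 under q). The function names and the representation of a live range as a (start, end, predicate) tuple are assumptions made for this illustration only.

```python
def ranges_overlap(a, b):
    """True if the closed statement intervals a=(start, end) and b overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def interferes(r1, r2, disjoint_pairs):
    """Decide interference of two live ranges (start, end, predicate).

    `disjoint_pairs` holds frozensets of predicate names known never to be
    true at the same time. An empty set models a conventional analysis
    that knows nothing about predicate relations.
    """
    if not ranges_overlap(r1[:2], r2[:2]):
        return False
    if frozenset((r1[2], r2[2])) in disjoint_pairs:
        return False  # values are never valid simultaneously
    return True


x = (2, 4, "p")  # x live from S2 to S4, guarded by p
y = (3, 5, "q")  # y live from S3 to S5, guarded by q

conventional = interferes(x, y, set())
predicate_aware = interferes(x, y, {frozenset(("p", "q"))})
print(conventional, predicate_aware)
```

With no predicate knowledge the two ranges are reported as interfering; once p and q are recorded as complementary, the same ranges are free to share a register.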
One solution for working with predicated code is discussed in a paper entitled "Register Allocation for Predicated Code," presented at the 28th Annual Int'l Symp. on Microarchitecture in November of 1995, by Alexandre Eichenberger and Edward Davidson. The Eichenberger/Davidson article describes a technique for tracking a live variable within a single hyperblock. This technique is local to a single hyperblock and is not global in nature. In addition, only the live variable problem is solved, and the complexity of implementing the technique in a system makes the proposed solution impractical for commercial use.
Prior compilers used a predicate hierarchy graph (PHG) to track the boolean equations for all of the predicates in a hyperblock. This is shown in S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, "Effective Compiler Support for Predicated Execution Using the Hyperblock," In Proc. of the 25th Annual Int'l Symp. on Microarchitecture, pp. 45-54, December 1992. In Mahlke, et al., predicate analysis occurs over the scope of a hyperblock and does not extend globally. When a companion data structure, such as the PHG, is maintained alongside the code, it may require updating whenever the program is transformed.
N. J. Warter, S. A. Mahlke, W.-M. W. Hwu, and B. R. Rau, "Reverse If-conversion," In Proc. of the SIGPLAN'93 Conf. on Programming Language Design and Implementation, pp. 290-299, June 1993, propose a reverse if-conversion scheme to map predicates from the data flow domain back to the more familiar control flow domain. A major drawback of this approach is that some originally non-existent control paths may be created during the remapping process. This typically occurs when code is if-converted and then scheduled, which permutes the order of the code. When this code is reverse if-converted, the non-existent paths may cause conservative treatment in many analyses and transformations.
Alexandre E. Eichenberger and Edward S. Davidson, as discussed in their paper, "Register Allocation for Predicated Code," In Proc. of the 28th Annual Int'l Symp. on Microarchitecture, November 1995, solve a register allocation problem in the presence of predication by representing predicates as P-facts, which are logically invariant expressions. The mechanism concludes that two live ranges do not interfere if the intersection of the two sets of P-facts can be simplified to false using a symbolic package. This approach is also restricted to the scope of a hyperblock, and may have limited practicality given its potentially exponential compilation time with respect to the number of predicates.
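To see why deciding that a combination of predicate expressions is always false can become expensive, consider the brute-force check below. It is not the symbolic package used by Eichenberger and Davidson; it is a deliberately naive stand-in, with hypothetical names, that enumerates every truth assignment of the underlying conditions, so its cost doubles with each additional condition.

```python
from itertools import product


def always_false(expr, names):
    """True if expr(env) is False under every assignment of `names`.

    Enumerates all 2**len(names) assignments, which is where the
    exponential behaviour comes from.
    """
    return not any(
        expr(dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


# Hypothetical predicates built from branch conditions c and d.
p = lambda env: env["c"]               # p is true when c holds
q = lambda env: not env["c"]           # q is the complement of p
r = lambda env: env["c"] and env["d"]  # r is nested under p

p_and_q_disjoint = always_false(lambda env: p(env) and q(env), ["c"])
p_and_r_disjoint = always_false(lambda env: p(env) and r(env), ["c", "d"])
print(p_and_q_disjoint, p_and_r_disjoint)
```

Live ranges guarded by p and q could share a register because their conjunction is never true, whereas ranges guarded by p and r cannot be separated this way.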
The problem of global register allocation is well known. See, for example, Preston Briggs, Register Allocation via Graph Coloring, Ph.D. Thesis, TR92-183, Rice University, 1992; Gregory J. Chaitin, Register Allocation and Spilling via Graph Coloring, Proc. of the ACM SIGPLAN'82 Symp. on Compiler Construction, SIGPLAN Notices 17(6):98-105, June 1982; and Fred C. Chow and John L. Hennessy, The Priority-Based Coloring Approach to Register Allocation, ACM Trans. on Programming Languages and Systems, 12(4):501-536, October 1990. Using these approaches, good heuristic approximations have been developed to solve the problem.
However, predication creates new challenges for the established techniques. In the quest for greater instruction level parallelism through predication, many register live ranges are created which, with current analysis techniques, will appear to overlap. Without applying knowledge of the relationships among predicates, a large number of false interferences will arise. Our experiments show that the register allocator will then conservatively allocate far more registers than are necessary.
For example, the code in (2) illustrates the need for predicate-aware register allocation. This code sequence arises from if-conversion and simple list scheduling. ##EQU2## As was pointed out in Section 1, without any knowledge of the relationship between p and q, it will be conservatively inferred that x and y interfere. Additionally, it is difficult to conclude that a predicated definition ends a live range.
Accordingly, one object of our invention is to provide a practical and efficient mechanism to improve data flow analysis and optimizations in the presence of predicated code.
It is a further object of our invention to provide a register allocation which benefits from understanding and utilizing the predicate relations.
It is a still further object of our invention to provide a means to construct an interference graph in the presence of predicated code.
It is a still further object of our invention to provide a means to efficiently solve the register allocation problem by limiting the scope of analysis to an arbitrary sub-region of the procedure.
It is a still further object of our invention to provide a means to infer the end of a live-range via a specialized register allocation sequence.