1. Field of the Invention
The present invention pertains to programming language compilers for computer systems. More particularly, this invention relates to a compiler for compiling a predicated code with direct analysis of the predicated code.
2. Description of the Related Art
As is known, a computer system typically includes one or more processors that are also known as central processing units (CPUs) or microprocessors. The processor typically executes instructions of software programs to perform a variety of tasks in the computer system. The instructions of the software programs are in machine language form (i.e., binary form) because the processor can only understand and interpret machine language. The machine language instructions are referred to as machine code or object code below.
Because machine language is very difficult to write and understand, high level source programming languages (such as C and Fortran) have been developed to code or define the instructions of a software program in a human-readable fashion. A software program written in such a source programming language is referred to as source code. The source code must be converted or translated into machine code by a compiler program before it can be executed by the processor.
Earlier prior art processors are typically single instruction single data (SISD) processors. A SISD processor typically receives a single instruction stream and a single data stream. The SISD processor executes each instruction sequentially, acting on data in a single storage area. This SISD processor architecture, however, presents an obstacle to achieving high processing throughput.
To increase the processing throughput of a processor, many parallel processing architectures have been developed. One such parallel processing model is the instruction-level parallel (ILP) processor. In an ILP processor, the basic unit of computation (for which scheduling and synchronization decisions are made) is a processor instruction, such as an individual add, multiply, load, or store operation. Instructions that do not depend on one another are loaded and executed in parallel. With ILP processors, not all instruction scheduling or synchronization decisions need to be made during program execution; some can be made during program compilation. For example, if the compiler can prove that two operations are independent (i.e., neither requires the result of the other as input), the operations can be executed in parallel.
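The independence test described above can be illustrated with a minimal sketch. The `Op` type, the register names, and the `independent` function are hypothetical and introduced only for illustration. The check that two operations do not write the same register is an added assumption beyond the definition given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One processor operation: a destination register and its source registers."""
    name: str
    dest: str
    srcs: tuple


def independent(a: Op, b: Op) -> bool:
    """Return True if neither operation requires the result of the other."""
    a_feeds_b = a.dest in b.srcs   # b needs the result of a
    b_feeds_a = b.dest in a.srcs   # a needs the result of b
    same_dest = a.dest == b.dest   # assumption: also reject write-write conflicts
    return not (a_feeds_b or b_feeds_a or same_dest)


add = Op("add", dest="r1", srcs=("r2", "r3"))
mul = Op("mul", dest="r4", srcs=("r5", "r6"))
sub = Op("sub", dest="r7", srcs=("r1", "r4"))

print(independent(add, mul))  # True: the two may be executed in parallel
print(independent(add, sub))  # False: sub reads r1, which add writes
```

A compiler for an ILP processor would typically apply a test of this kind to pairs of operations when deciding which of them may be placed in the same cycle.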
However, frequent and unpredictable branch operations in a program code may present a major barrier to exploiting a greater amount of instruction-level parallelism. This is because some branch operations typically introduce branch latencies or mispredict penalties, thus causing the execution to stall at run-time. In addition, branch operations typically limit the scheduling scope of the code with respect to the instruction-level parallelism. The branch operations are referred to as branches.
In order to eliminate branches and further enhance the instruction-level parallelism, a new architectural model has been proposed in which each processor operation is guarded by a boolean-valued source operand. The value of the operand determines whether the operation is executed or nullified. This architectural model is referred to as predicated execution, and the boolean-valued source operand is referred to as a predicate. From the viewpoint of the instruction set architecture, the main features of predicated execution are a predicate guarding each operation and a set of compare-to-predicate operations used to compute predicates. Predicated execution typically eliminates many branches completely and generalizes the rules for moving code among basic blocks. Quite often, an entire acyclic control flow subgraph can be converted into a single, branch-free block of code.
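The behavior of guarded operations can be illustrated with a minimal sketch of an interpreter for straight-line predicated code. The operation format, the operation names (`cmp_lt`, `cmp_ge`, `mov`), and the always-true predicate `"true"` are assumptions made for illustration and do not correspond to any particular instruction set.

```python
def execute(ops, regs):
    """Run a straight-line list of predicated operations.

    Each op is (guard, kind, dest, x, y). An op whose guard predicate is
    false is nullified: it changes neither registers nor predicates.
    """
    regs = dict(regs)
    preds = {"true": True}          # hypothetical always-true predicate
    for guard, kind, dest, x, y in ops:
        if not preds.get(guard, False):
            continue                # guard is false, so the op is nullified
        if kind == "cmp_lt":        # compare-to-predicate: dest = (x < y)
            preds[dest] = regs[x] < regs[y]
        elif kind == "cmp_ge":      # compare-to-predicate: dest = (x >= y)
            preds[dest] = regs[x] >= regs[y]
        elif kind == "mov":         # copy register x into register dest
            regs[dest] = regs[x]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return regs


# m = max(a, b) computed without any branch.
MAX_OPS = [
    ("true", "cmp_lt", "p1", "a", "b"),
    ("true", "cmp_ge", "p2", "a", "b"),
    ("p1", "mov", "m", "b", None),
    ("p2", "mov", "m", "a", None),
]

print(execute(MAX_OPS, {"a": 3, "b": 7})["m"])  # 7
print(execute(MAX_OPS, {"a": 9, "b": 2})["m"])  # 9
```

Every operation in `MAX_OPS` is fetched on every run. Only the guards decide which of the two `mov` operations takes effect.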
The process of replacing branches with appropriate predicate computations and guards is referred to as if-conversion or predicate conversion. The resulting code from the if-conversion is referred to as predicated code. FIG. 1A shows the conventional code for a program represented in FIG. 1B. FIG. 1C shows the predicated code converted from the conventional code of FIG. 1A. As can be seen from FIG. 1C, explicit branches that control execution are replaced with guarding predicates on operations, together with compare-to-predicate operations that compute the appropriate predicate values. All non-branch instructions are predicated with the predicates during if-conversion. The result is a single, branch-free block of predicated code.
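If-conversion of a single two-way branch can be illustrated with a minimal sketch. It does not reproduce the code of FIGS. 1A through 1C. The function `if_convert`, the tuple formats, and the predicate names `p1` and `p2` are hypothetical, and only one non-nested if/else region testing `x < y` is handled.

```python
def if_convert(cond, then_ops, else_ops, pred_true="p1", pred_false="p2"):
    """Flatten `if x < y: then_ops else: else_ops` into guarded operations.

    cond is (x, y). Each input op is (kind, dest, src); each output op is
    (guard, kind, dest, srcs). The output contains no branch.
    """
    x, y = cond
    out = [
        # Compare-to-predicate operations replace the branch condition.
        ("true", "cmp_lt", pred_true, (x, y)),
        ("true", "cmp_ge", pred_false, (x, y)),
    ]
    # Operations from the taken path are guarded by the true predicate.
    out += [(pred_true, kind, dest, (src,)) for kind, dest, src in then_ops]
    # Operations from the fall-through path are guarded by its complement.
    out += [(pred_false, kind, dest, (src,)) for kind, dest, src in else_ops]
    return out


# if a < b: m = b; n = a   else: m = a; n = b
predicated = if_convert(
    ("a", "b"),
    then_ops=[("mov", "m", "b"), ("mov", "n", "a")],
    else_ops=[("mov", "m", "a"), ("mov", "n", "b")],
)
for op in predicated:
    print(op)
```

The result is one flat list in which the two original paths are distinguished only by their guards.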
However, problems occur when a predicated code is compiled by a conventional compiler. This is because the data flow analysis tools of conventional compilers typically do not exploit relations between predicates when compiling the predicated code. FIG. 2 shows a conventional compiler 50 for compiling the predicated code. As can be seen from FIG. 2, the compiler 50 includes an if-conversion system 51, a scheduler and register allocator 52, and a data flow analysis system 53. The data flow analysis system 53, however, only analyzes the data dependency of the original code and does not incorporate information about relations between predicates of the predicated code into its data flow analysis. This typically causes the data flow analysis system 53 either to make incorrect assumptions about the run-time behavior of the predicated code or to make no assumptions at all, which yields overly conservative results in crucial areas such as scheduling and register allocation. FIG. 3 shows a conservative schedule of the code in FIG. 1C, generated by the compiler of FIG. 2. FIG. 3 illustrates the drawback of using the conventional approach to compiling the predicated code.
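The effect of ignoring relations between predicates can be illustrated with a minimal sketch. It is not the analysis of FIG. 2 or the schedule of FIG. 3. The functions `may_both_execute` and `needs_ordering`, the predicate names, and the set `DISJOINT` are hypothetical. The sketch assumes that two writes to the same register require an ordering constraint only when both may actually execute.

```python
def may_both_execute(guard_a, guard_b, disjoint=frozenset()):
    """Can operations guarded by guard_a and guard_b both execute in one pass?

    With no predicate relations available, the only safe answer is True.
    A known disjoint pair lets the analysis answer False.
    """
    return frozenset((guard_a, guard_b)) not in disjoint


def needs_ordering(op_a, op_b, disjoint=frozenset()):
    """Each op is (guard, dest). Two writes to the same register must be
    ordered only if both may execute."""
    (guard_a, dest_a), (guard_b, dest_b) = op_a, op_b
    return dest_a == dest_b and may_both_execute(guard_a, guard_b, disjoint)


write_then = ("p1", "m")   # m is written under predicate p1
write_else = ("p2", "m")   # m is written under predicate p2

# Assumed relation: p1 and p2 are complementary, so never both true.
DISJOINT = frozenset({frozenset({"p1", "p2"})})

print(needs_ordering(write_then, write_else))            # True: conservative
print(needs_ordering(write_then, write_else, DISJOINT))  # False: predicate-aware
```

Without the relation, the two writes are treated as conflicting and the resulting schedule is serialized. With the relation, no ordering constraint between them is required.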