The modem optimizing compiler for the Explicit Parallel Instruction Computing (EPIC) architecture performs the global task of detection and refining the potential parallelism of a source program being compiled. The EPIC architecture creates new possibilities for code optimization due to its ability to perform many operations in parallel.
Improvements in both programming techniques and hardware support to improve compiler optimization techniques are important to realizing the full potential of the EPIC architecture.
A Critical Path Algorithm can be utilized to schedule a basic block of operations. A critical path is defined as the longest path through a data dependency graph. A latest start time list is generated forming a number of time frames equal to the length of the critical path. Each operation is examined to determine whether it is dependent on another operation. For example, if an operation is not dependent on any other operation it can be scheduled last.
The EPIC approach is platform oriented and therefore code translation mechanisms are very important and include many platform dependent optimizations. Critical path algorithms facilitate the utilization of the full parallelism inherent in a target machine. However, the critical path strategy is not useful for scheduling when potential parallelism of a basic block being scheduled is much higher than the parallelism of the target hardware because the target hardware may have insufficient register resources required to hold data until dependencies and latencies are resolved.
One of the possible approaches to register allocation for processors with multiple functional units is described in U.S. Pat. No. 5,987,259 Patent titled xe2x80x9cFunctional unit switching for the allocation of registersxe2x80x9d. If some instruction is able to be performed onto a variety of possible functional units, it is scheduled at the unit with the least subscribed register resources. This method is useful for an architecture with separate register files connected to each functional unit. In case of two equivalent units and if there are insufficient registers associated with the first unit to keep all the temporary values necessary for some code region, instructions of that code region scheduled to perform on the first functional unit that may be executed on the second unit are detected. Those instructions are rescheduled for execution on the second functional unit. This method is not applicable for architectures with register files shared among a number of functional units.
Another approach to the problem of optimal register allocation is expounded in the paper xe2x80x9cCombining Register Allocation and Instruction Schedulingxe2x80x9d by Rajeev Motwani, Krishna V. Palem, Vivek Sarkar and Salem Reyen (Technical Report, Courant Institute, TR 698, July 1995). The main idea of that method is the formulation of combined register allocation and instruction scheduling within a basic block as a single optimization problem. An objective cost function that more directly captures the primary measure of interest in code optimization is the completion time of the last instruction. Unfortunately, a heuristic algorithm designed could not have been formulated in the context of an existing instruction scheduling/register allocation phase-ordered solution.
One more paper should be mentioned in context of the invention proposed: xe2x80x9cScheduling Expression DAGs for Minimal Register Needxe2x80x9d by Christoph W. Kessler (University of Trier, Germany, faculty of Mathematics and Informatics, Monthly report 1996-12). This article presents a new algorithm for generating schedules for expression Data Dependency Graph (DAG) that uses a minimal number of registers. Classically it is a NP-complete problem, but the algorithm shows good results for expressions up to 50 operations. Extended to cope with multiple functional units this algorithm might be applied for EPIC architecture unless there is a requirement of minimal register usage. This may penalize performance due to restricted Instruction Level Parallelism (ILP).
Accordingly, improved techniques for scheduling operations in a basic block are required. The main requirements to be satisfied are: multiple functional units, shared register file, sensitivity to register pressure and operation latencies, compatibility with traditional phase-ordered compilation technique (separate operation scheduling and register allocation), unrestricted ILP.
Platform specific optimization methods for the EPIC architecture include full predication and speculation support. Optimizations based on the estimation of earliest and latest start times of operations and on a critical path strategy are applied to Extended Scalar Blocks (ESBs, also referred to as Basic Blocks). Extended Scalar Blocks are regions of the predicated code where all dependencies between operations are represented explicitly as a relation between two operations for a considerable number of operations. For each ESB the compiler works out the critical path which is defined as a sequence of operations that will take the longest CPU time and cannot be executed in parallel because of dependencies. The main goal of any optimizer in optimizing the compiler is to reduce the execution time of the program being optimized. Reduction of the critical path estimated in terms of target processor cycles is the application criterion applied to optimizations to achieve the main goal.
An optimizing compiler for an Explicit Parallel Instruction Computing (EPIC) architecture performs the global task of detecting and refining the potential parallelism of a source program being compiled.
According to one aspect of the invention, a heuristic for a List Scheduling algorithm schedules registers based on register economy which is helpful when scheduling scalar blocks of great potential parallelism. The critical path in this case is much longer than theoretical value. The register pressure is reduced at the scheduling phase.
According to another aspect of the invention, scheduling priority for a cycle-driven scalar scheduler is changed from operation of Latest Start Time to a function of two arguments: the number of successors of a particular operation and an order of operation in a tree-like IR (intermediate representation) structure.
According to another aspect of the invention, the use of new heuristic for well-known List Scheduling algorithm leads to smaller register pressure.
According to another aspect of the invention, an algorithm for calculation of register economy priority values (REP) includes the act of ordering operations of a basic block in a data dependency graph, indicating the number of predecessors for each ordered operation, and generating sequential REPs starting with the last operation in the ordered list.
Other features and advantages of the invention will be apparent in view of the following detailed description and appended drawings.