Compiler optimization has as its goal transforming code to increase its performance. One important factor in optimization is scheduling operations to increase the speed of program execution by utilizing predicated and speculative operations. The present invention relates to optimizing code executed on an Explicit Parallel Instruction Computing (EPIC) architecture with full predication and speculation support and performs the global task of detecting and refining potential parallelism of the source code being compiled.
The present compiler transforms the source-code program represented as a set of Basic Blocks into Extended Scalar Blocks (ESBs) by applying a compiler technique called if-conversion which replaces conditional branches in the code with comparison instructions which set a predicate. Each predicated instruction is guarded by a Boolean source operand having a value which determines whether the instruction is executed or nullified.
Extended Scalar Blocks are regions of the predicated code where all dependencies between operations are represented explicitly as a relation between two operations for a considerable number of operations. For each ESB the compiler works out the critical path which is defined as a sequence of operations that will take the longest CPU time and cannot be executed in parallel because of dependencies.
Various types of dependencies increase the length of the critical path and therefore decrease the execution speed. Especially important are dependencies between operations with dynamic access time such as load and store operations which are affected by memory latency. Resolving problems of such possible or ambiguous dependencies is critical to execute multiple memory operations in parallel and out of order, and, accordingly, to increase resource utilization and to reduce the effects of long memory latencies.
There are a number of existing methods to perform memory disambiguation and to solve the problem of reordering memory operations. In the last decade many of them were intended to do this at run-time and also using a special memory disambiguation buffer, where run-time memory address comparison is made. These methods may be effective enough, though they require a complex compiler technique to retrieve correct execution, when conflict through memory addresses of reordered operations really occur. Examples of such techniques are described in U.S. Pat. No. 5,625,835, entitled xe2x80x9cMethod and Apparatus for Reordering Memory Operations in a Superscalar or Very Long Instruction Word Processorxe2x80x9d and U.S. Pat. No. 5,758,051, entitled xe2x80x9cMethod and Apparatus for Reordering Memory Operations in a Processor.xe2x80x9d
Advantages of the present invention over existing methods, using memory disambiguation apparatus include: 1) universal usage (less dependent on conflict probability); 2) effective usage of predicated mode of operation execution while transforming code; 3) algorithmic simplicity; 4) compensation code is not needed.
According to one aspect of the invention, a memory address comparison operation is inserted in the same block and executed in parallel with other operations of the block. Additional number of operations according to this aspect is fixed for any case (compiler driven). It is more (by one or two operations) than another method in case when no memory conflict occur, and is significantly less when it is, because of control transfer to compensation code in using disambiguation buffer. All methods, including static memory disambiguation techniques as basis, may be used and correlated in modern compilers and architectures to reach high performance.
According to one aspect of the invention, a compiler reduces the length of the critical path by removing a possible dependency between a store-load pair. The predecessor to the store operation and the load operation are scheduled in parallel and the operands output by those operations are predicated on the result of an address comparison instruction.
According to another aspect of the invention, the operand to be stored is held in a temporary register.
According to another aspect of the invention, if the address comparison instruction indicates the store and load addresses of the store and load instructions are the same, the loaded operand is nullified and the operand calculated by the store predecessor operation is input to a load successor operation.
According to another aspect of the invention, if the address comparison instruction indicates that the store and load addresses are different the loaded operand is input to the load successor operation and the operand to be stored is stored by a store instruction.
Other features and advantages of the invention will be apparent in view of the following detailed description and appended drawings.