The present invention relates generally to processors and computing systems, and more particularly to systems and methods for decreasing the execution time of instructions in explicitly parallel instruction computing (EPIC) systems that support speculation and predication.
Many practical applications require processing of very large amounts of information in a short period of time. One of the basic approaches to minimizing the time to perform such computations is to apply some sort of parallelism, so that tasks which are logically independent can be performed in parallel. This can be done, for example, by executing two or more instructions per machine cycle, i.e., by means of instruction-level parallelism. Thus, in a class of computers using superscalar processing, hardware is used to detect independent instructions and execute them in parallel, often using techniques developed in the early supercomputers.
Another approach to exploiting instruction-level parallelism is used by the Very Long Instruction Word (VLIW) processor architectures, in which the compiler performs most instruction scheduling and parallel dispatching at compile time, thereby reducing the operating burden at run time. By moving the scheduling tasks to the compiler, a VLIW processor avoids both the operating latency problems and the large and complex circuitry associated with on-chip instruction scheduling logic. As is known, each VLIW instruction typically includes multiple independent operations for execution by the processor in a single cycle. A VLIW compiler schedules these operations in precise conformance with the structure of the processor, including the number and type of the execution units, as well as execution unit timing and latencies. The compiler groups the operations into a wide instruction for execution in one cycle. At run time, the wide instruction is applied to the various execution units with little decoding.
Programs compiled for a VLIW processor may employ predicated and speculative computations as known in the art. To improve efficiency, certain instructions may be executed speculatively and their results may then be retired or discarded if necessary. Predicated computations can be used to represent the control flow of a source program in a more optimal way by assigning predicate values for certain instructions and by removing some branch instructions. Also, it is known that profile data that characterizes program behavior can be obtained by performing test runs of the program.
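As an illustration of predicated computation, the following minimal sketch models a branch-free operation sequence in which each operation commits its result only when its guarding predicate is true. The function `run_predicated`, the tuple layout `(guard, dest, fn)`, and the register names are hypothetical and chosen only for this example; they do not describe any particular instruction set.

```python
def run_predicated(ops, regs):
    """Run (guard, dest, fn) triples in order.

    An operation writes its result to `dest` only when its guard register
    holds a true value; a guard of None means "always execute".
    """
    regs = dict(regs)  # work on a copy so the caller's registers are untouched
    for guard, dest, fn in ops:
        if guard is None or regs[guard]:
            regs[dest] = fn(regs)
    return regs


# If-converted form of:  if a > b: x = a - b  else: x = b - a
# The branch is replaced by two predicates and two guarded operations.
OPS = [
    (None, "p", lambda r: r["a"] > r["b"]),      # compute predicate
    (None, "not_p", lambda r: not r["p"]),       # complementary predicate
    ("p", "x", lambda r: r["a"] - r["b"]),       # guarded by p
    ("not_p", "x", lambda r: r["b"] - r["a"]),   # guarded by not_p
]

result = run_predicated(OPS, {"a": 7, "b": 3})
```

Both guarded operations occupy issue slots on every run, but only one of them commits, which is why predication removes the branch at the cost of some additional operations.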
One of the goals of a compiler optimizer is to reduce the execution time of the program being optimized through better usage of the caches and by discovering and using potential instruction parallelism. Many compiler techniques exploit the full predication and speculation features of the architecture to reduce the execution time. However, applying these techniques may lead to speculative code growth, that is, the number of executed operations may be larger than needed because of useless operations executed in advance. Load operations may be among such useless operations.
The use of a speculative load operation can have some negative effects if a uselessly loaded value is not in the data cache (i.e., cache miss). Examples of such negative effects will be described with reference to the following source code:
The value *p, when loaded from a memory (main memory or cache) by the operation LOAD_OP, is useful when one of the following predicates is equal to TRUE:
a) (COND1==TRUE) && (COND2==TRUE)
b) (COND1==TRUE) && (COND2==FALSE) && (COND3==TRUE).
In other cases, the loaded value is not used and is therefore "useless," or has been "uselessly" loaded. However, such useless speculative execution of LOAD_OP in the case where *p is not in the data cache may lead to at least two negative effects:
1. Consumers of the value loaded by LOAD_OP (ADD_OPs in the present example) are delayed until the memory access is complete. This will typically stall the entire CPU.
2. The value *p when speculatively loaded from the main memory may result in some useful data being removed from the data cache.
In both cases, the useless speculative execution of the load operation leads to a delay for calculations that have to be executed. It is therefore desirable to at least partially eliminate such situations so as to increase the overall execution speed of the computer system.
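The usefulness condition given by predicates a) and b) above can be checked mechanically. The sketch below is illustrative only: it treats COND1, COND2, and COND3 as plain booleans, and the probability estimate assumes the three conditions are statistically independent, which is an assumption of this example and not something the text states. The function names are hypothetical.

```python
from itertools import product


def load_is_useful(cond1, cond2, cond3):
    """Disjunction of predicates a) and b): the loaded value *p is consumed."""
    pred_a = cond1 and cond2
    pred_b = cond1 and (not cond2) and cond3
    return pred_a or pred_b


def useless_cases():
    """All (COND1, COND2, COND3) combinations for which the load is useless."""
    return [c for c in product([False, True], repeat=3)
            if not load_is_useful(*c)]


def estimate_idle_probability(p1, p2, p3):
    """Probability that the load is useless.

    ASSUMPTION: the conditions are independent, with P(CONDi == TRUE) = pi.
    """
    p_useful = p1 * (p2 + (1.0 - p2) * p3)
    return 1.0 - p_useful
```

Under these assumptions the disjunction simplifies to COND1 && (COND2 || COND3), so the load is useless in five of the eight possible combinations of the three conditions.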
The present invention provides methods to partially eliminate problems associated with a cache miss for a speculative load operation when a uselessly loaded value is not in the data cache. The cache miss savings transformations of the present invention are useful for any explicitly parallel instruction computing (EPIC)-type architecture with speculation and full predication support, such as a VLIW architecture.
According to the invention, a compiler optimizer analyzes various criteria to determine whether a cache miss savings transformation is useful. Depending on the results of the analysis, the load operation and/or the successor operations to the load operation are transferred into a predicated mode of operation to enhance overall system efficiency and execution speed.
According to an aspect of the invention, a compiler optimization method is provided for preventing delays associated with a speculative load operation on data when the data is not in the data cache of a processor. The method typically includes the steps of identifying the speculative load operation in a set of scheduled operations, wherein the set of operations includes one or more operations that are successors to the load operation, determining a first parameter defining a maximum number of operations that can be added to optimize the set of operations, and determining a second parameter defining a maximum possible critical path increase in terms of processor cycles. The method also typically includes the step of, for each successor operation, finding a nearest predicate, wherein a first value of the nearest predicate indicates that all execution paths from the successor operation will terminate without a result (idleness), and determining a first number of operations needed to obtain the nearest predicate. The method also typically includes the steps of determining a predicate for the speculative load operation by determining a disjunction of all successor operation predicates, and determining a second number of operations needed to obtain the load operation predicate, estimating the probability of speculative load operation idleness based on probabilities of predicate values in profile feedback information, checking whether the second number of operations is less than or equal to the first parameter, whether the critical path increase due to the added predicated dependence from the step of generating the load operation predicate is equal to zero or less than the second parameter, and whether the probability of speculative load operation idleness is not equal to zero. If the checking results are all true, the method also typically includes the step of transferring the load operation into a predicated mode of execution.
If any of the checking results are false, and the probability of speculative load operation idleness is not equal to zero, the method typically comprises the steps of, for each successor operation, checking whether the first number of operations is less than or equal to the first parameter, and whether the critical path increase due to the added predicated dependence from the step of generating the predicates to the successor operations is less than or equal to the second parameter, and if these checking results are true, thereafter transferring each successor operation into a predicated mode of execution.
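The decision procedure summarized above can be sketched as follows. This is a simplified illustration, not the claimed method: the class `Successor`, the function `choose_transformation`, and all parameter names are hypothetical, the operation counts and critical path increases are supplied as inputs instead of being derived from a schedule, and the sketch assumes that each successor operation is judged individually in the fallback case.

```python
from dataclasses import dataclass


@dataclass
class Successor:
    name: str
    ops_for_predicate: int  # "first number": ops needed to obtain the nearest predicate
    path_increase: int      # critical path increase (cycles) if this op is predicated


def choose_transformation(successors, load_pred_ops, load_path_increase,
                          idle_probability, max_added_ops, max_path_increase):
    """Return ("load", []), ("successors", [names]) or ("none", []).

    max_added_ops is the first parameter and max_path_increase the second
    parameter of the text; load_pred_ops is the "second number".
    """
    if idle_probability == 0:
        return ("none", [])  # the load is never idle, nothing to save

    # Checks for predicating the load operation itself.
    load_ok = (load_pred_ops <= max_added_ops
               and (load_path_increase == 0
                    or load_path_increase < max_path_increase))
    if load_ok:
        return ("load", [])

    # Fallback: predicate the successor operations that pass their own checks.
    # ASSUMPTION: successors are evaluated one by one, not all-or-nothing.
    chosen = [s.name for s in successors
              if s.ops_for_predicate <= max_added_ops
              and s.path_increase <= max_path_increase]
    return ("successors", chosen) if chosen else ("none", [])
```

For example, with a budget of three added operations and one cycle of critical path increase, a load predicate that needs four operations fails the first check, and only those successors whose own predicates fit the budget are predicated.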
Reference to the remaining portions of the specification, including the drawings and claims, will realize other features and advantages of the present invention. Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with respect to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.