This invention relates generally to computer software. More specifically, it relates to a method of scheduling instructions for efficient execution on a particular processor.
Generally, computer programmers write computer code in high-level programming languages to simplify various programming tasks. Compilers translate the high-level programs into a sequence of machine-readable instructions. The machine-readable instructions are collectively known as an instruction trace. The instruction trace is typically targeted at a particular processor. In the past, compilers generated the instructions for the instruction trace in the same order that the programmer specified them in the high-level program.
To improve speed and efficiency, some modern processors have multiple pipelined execution units. Each pipelined execution unit has one or more stages, each of which performs a specific function that can be completed in a single clock cycle. An instruction enters the pipelined execution unit at the first stage (i.e., stage one) and passes through each successive stage of the pipeline; when it leaves the final stage, its execution is complete. This increases the efficiency of the processor, because a new instruction can be fed into the pipelined execution unit on every cycle rather than waiting until the previous instruction has completed.
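As a rough illustration of why pipelining helps, the following Python sketch compares cycle counts for an idealized execution unit. It assumes every stage takes exactly one cycle, one instruction enters per cycle, and no stalls occur; the function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    # Each instruction occupies the unit for all n_stages cycles before the next starts.
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    # The first instruction needs n_stages cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    print(cycles_unpipelined(100, 5))  # 500
    print(cycles_pipelined(100, 5))    # 104
```

Under these ideal assumptions, 100 instructions on a 5-stage unit take 104 cycles instead of 500; every stall adds cycles to the pipelined figure.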
Pipelining is most efficient when the pipeline is kept full. If no instruction begins execution on a particular clock cycle, the execution unit stalls. A stall lowers the efficiency of the processor, since the pipelined execution unit has resources that are available but not being used.
Execution unit stalls sometimes occur because of data dependencies; that is, an instruction may depend on the result of an instruction that has not yet completed. Modern compilers attempt to reduce such stalls by scheduling instructions out of their original sequence: instructions that are ready to execute are placed ahead of instructions that are not yet ready.
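The idea can be illustrated with a toy single-issue machine. This sketch is not the method of the present invention; it assumes each register is written at most once in the trace (so only true data dependencies matter), treats registers not produced in the trace as ready at cycle 0, and ignores memory dependencies. `Instr`, `simulate`, and `list_schedule` are hypothetical names, and the scheduler is a simple greedy list scheduler: on each cycle it issues the earliest instruction, in original order, whose operands are ready.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str               # register written (assumed unique in the trace)
    srcs: Tuple[str, ...]   # registers read
    latency: int            # cycles before dest can be read by a consumer


def simulate(trace: List[Instr]) -> Tuple[int, int]:
    """Issue strictly in trace order, at most one instruction per cycle.
    Returns (cycles used to issue everything, stall cycles)."""
    ready_at: Dict[str, int] = {}
    cycle = stalls = 0
    for ins in trace:
        earliest = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        if earliest > cycle:  # operand not ready: the unit stalls
            stalls += earliest - cycle
            cycle = earliest
        ready_at[ins.dest] = cycle + ins.latency
        cycle += 1
    return cycle, stalls


def list_schedule(trace: List[Instr]) -> List[Instr]:
    """Greedy list scheduling: each cycle, pick the earliest instruction
    (in original order) whose operands are ready; otherwise let a cycle pass."""
    produced = {ins.dest for ins in trace}
    pending = list(trace)
    ready_at: Dict[str, int] = {}
    order: List[Instr] = []
    cycle = 0
    while pending:
        for ins in pending:
            # A source produced in the trace must already be issued and ready.
            if all(r not in produced or ready_at.get(r, cycle + 1) <= cycle
                   for r in ins.srcs):
                pending.remove(ins)
                order.append(ins)
                ready_at[ins.dest] = cycle + ins.latency
                break
        cycle += 1
    return order


TRACE = [
    Instr("ld1", "r1", ("a",), 3),       # load r1 <- [a]
    Instr("add2", "r2", ("r1",), 1),     # r2 = r1 + 1 (needs the first load)
    Instr("ld3", "r3", ("b",), 3),       # load r3 <- [b]
    Instr("mul4", "r4", ("r3",), 1),     # r4 = r3 * 2 (needs the second load)
    Instr("add5", "r5", ("c", "d"), 1),  # independent of both loads
]

if __name__ == "__main__":
    print(simulate(TRACE))              # (9, 4)
    reordered = list_schedule(TRACE)
    print([i.name for i in reordered])  # ['ld1', 'ld3', 'add5', 'add2', 'mul4']
    print(simulate(reordered))          # (5, 0)
```

In this example the in-order trace incurs four stall cycles, while the reordered trace issues one instruction on every cycle.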
Processor performance is also increased by speculative execution. Sometimes the order of execution is not known until runtime; for example, many branch instructions depend on the results of previous calculations. The hardware predicts how such a branch instruction will be resolved and speculatively executes instructions based on the prediction. If the prediction is correct, the processor is ahead of where it would have been had it waited for the branch to be resolved. If the prediction is incorrect, the processor reverts to the state it would have been in without the speculative execution.
Code motion (also referred to as trace rescheduling) is one method used to optimize programs for execution: a compiler reorders the instructions to decrease execution unit stalls. However, a limitation of currently available systems for executing instructions out of order is that the compiler has limited knowledge of the effect of moving instructions earlier in the sequence. Sometimes executing instructions speculatively is counterproductive because it causes additional overhead. For example, if instructions are moved ahead of a branch instruction and executed, and the prediction turns out to be wrong, unnecessary work was done and must be undone.
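The overhead argument can be made concrete with a simple expected-value model. The model is an assumption for illustration, not part of the original text: a correct prediction saves `cycles_saved` cycles, and a misprediction wastes `cycles_wasted` cycles discarding the extra work. Under this model, speculation pays off only when p * s > (1 - p) * w, that is, when the prediction accuracy p exceeds w / (s + w).

```python
def speculation_gain(p_correct: float, cycles_saved: float, cycles_wasted: float) -> float:
    """Expected cycles gained by executing hoisted work before a branch resolves
    (simple illustrative model, not from the original text)."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    return p_correct * cycles_saved - (1.0 - p_correct) * cycles_wasted


def break_even_probability(cycles_saved: float, cycles_wasted: float) -> float:
    # Setting the gain to zero: p * s = (1 - p) * w  =>  p = w / (s + w)
    return cycles_wasted / (cycles_saved + cycles_wasted)


if __name__ == "__main__":
    print(speculation_gain(0.75, 4, 4))  # 2.0: speculation helps
    print(speculation_gain(0.5, 2, 4))   # -1.0: speculation is counterproductive
    print(break_even_probability(2, 4))
```

A negative result corresponds to the counterproductive case described above: the work undone on mispredictions outweighs the cycles saved on correct predictions.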
As users demand ever more efficient use of their processors, it is important to find ways of compiling software that avoid pipeline stalls. Consequently, there is a need for new and better ways of compiling instructions for a processor so that it operates efficiently.
The present invention provides a method of improving a compiler's use of code motion. The method uses a superscalar processor simulator to reorder instructions according to criteria established by the user, and it generates statistics showing the effectiveness of particular reordering criteria. A user or compiler may use these statistics to determine the best reordering technique for a particular processor and piece of software.
The method simulates a processor running a program and determines which instructions cause the processor to stall because resources or operands are unavailable. It moves up (“hoists”) other instructions that are not stalled, so that they may begin execution during the processor stall. The method also identifies barrier instructions, above which no instruction is hoisted. Barrier instructions include branch instructions, store instructions (if loads are not permitted to pass stores), and instructions that would cause the number of registers needed to exceed a predetermined number. By not hoisting instructions above barrier instructions, the method finds an efficient ordering of the instructions.
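A minimal Python sketch of stall-driven hoisting with barriers follows. It is a hypothetical illustration, not the patented implementation: it models a single-issue, in-order machine; when the next instruction stalls, it searches forward for a ready instruction that is independent of every instruction it would pass, and it stops the search at the first branch or store (load past store is assumed to be disallowed, so stores are barriers). The register-pressure barrier is not modeled. `Instr`, `hoist`, and the opcode strings are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    op: str                # "load", "store", "alu" or "branch" (assumed opcode classes)
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read
    latency: int = 1       # cycles until dest can be read


def is_barrier(ins: Instr) -> bool:
    # Branches and (with load past store disallowed) stores are never hoisted over.
    return ins.op in ("branch", "store")


def ready(ins, ready_at, cycle):
    # Registers never written in the trace are treated as ready at cycle 0.
    return all(ready_at.get(r, 0) <= cycle for r in ins.srcs)


def independent(cand, earlier):
    # True if cand can legally move above every instruction in `earlier`.
    for e in earlier:
        if e.dest is not None and (e.dest in cand.srcs or e.dest == cand.dest):
            return False  # read-after-write or write-after-write
        if cand.dest is not None and cand.dest in e.srcs:
            return False  # write-after-read
    return True


def hoist(trace):
    """Returns (new order, names of hoisted instructions, stall cycles)."""
    pending = list(trace)
    ready_at = {}
    order, hoisted = [], []
    stalls = cycle = 0
    while pending:
        head = pending[0]
        choice = head if ready(head, ready_at, cycle) else None
        if choice is None and not is_barrier(head):
            skipped = [head]
            for cand in pending[1:]:
                if is_barrier(cand):
                    break  # nothing is hoisted above a barrier
                if independent(cand, skipped) and ready(cand, ready_at, cycle):
                    choice = cand
                    hoisted.append(cand.name)
                    break
                skipped.append(cand)
        if choice is None:
            stalls += 1  # nothing could issue this cycle
        else:
            pending.remove(choice)
            order.append(choice.name)
            if choice.dest is not None:
                ready_at[choice.dest] = cycle + choice.latency
        cycle += 1
    return order, hoisted, stalls


TRACE = [
    Instr("i1", "load", "r1", ("a",), latency=3),
    Instr("i2", "alu", "r2", ("r1",)),       # stalls waiting for the load
    Instr("i3", "alu", "r3", ("b", "c")),    # independent: may be hoisted
    Instr("i4", "branch", None, ("r3",)),    # barrier
    Instr("i5", "alu", "r4", ("d", "e")),    # ready early, but below the barrier
]

if __name__ == "__main__":
    print(hoist(TRACE))  # (['i1', 'i3', 'i2', 'i4', 'i5'], ['i3'], 1)
```

Here `i3` is hoisted into the stall caused by `i2`, but `i5`, although its operands are ready, is not hoisted above the branch `i4`, so one stall cycle remains.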
To make it easy to correlate the reordered instruction trace with the source code, each path is given a unique, easily recognizable identifier. The paths are ranked according to different criteria, such as the number of hoisted instructions or the number of times the path is encountered. This produces useful examples of how paths can and should be optimized by a compiler's code generator.
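The following Python sketch shows one hypothetical way to name and rank paths. It assumes that a path is identified by the start addresses of the basic blocks it traverses, and that the simulator reports, for each path encounter, how many instructions were hoisted. `path_id`, `collect`, `rank`, and the sample observations are illustrative only and are not taken from the original.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class PathStats:
    encounters: int = 0  # times the simulator traversed this path
    hoisted: int = 0     # total instructions hoisted on this path


def path_id(block_addrs: Iterable[int]) -> str:
    # Hypothetical naming scheme: basic-block start addresses in traversal order.
    return "->".join(f"{a:#x}" for a in block_addrs)


def collect(observations: Iterable[Tuple[Tuple[int, ...], int]]) -> Dict[str, PathStats]:
    stats: Dict[str, PathStats] = defaultdict(PathStats)
    for blocks, n_hoisted in observations:
        s = stats[path_id(blocks)]
        s.encounters += 1
        s.hoisted += n_hoisted
    return dict(stats)


def rank(stats: Dict[str, PathStats], key: str = "hoisted") -> List[str]:
    # key is "hoisted" or "encounters"; highest value first.
    return sorted(stats, key=lambda p: getattr(stats[p], key), reverse=True)


# Made-up sample data: (blocks traversed, instructions hoisted on that encounter).
OBSERVATIONS = [
    ((0x400, 0x420), 2),
    ((0x400, 0x440), 5),
    ((0x400, 0x420), 1),
    ((0x400, 0x420), 0),
]

if __name__ == "__main__":
    stats = collect(OBSERVATIONS)
    print(rank(stats, "hoisted"))     # ['0x400->0x440', '0x400->0x420']
    print(rank(stats, "encounters"))  # ['0x400->0x420', '0x400->0x440']
```

Because the identifier is built from code addresses, a ranked path can be mapped back to the corresponding source lines with ordinary debug information; the two rankings show how different criteria can highlight different paths.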