1. Field of the Invention
The present invention relates generally to compilers. More particularly, the present invention relates to a trace scheduler for a compiler.
2. Description of the Related Art
Processors rely on the compiler to produce an instruction schedule that extracts and exploits the available instruction level parallelism in a routine. Such a schedule maximizes the instruction issue rate and, by issuing prefetches and loads as early as possible, the parallelism of memory operations.
The process of detecting and scheduling the available instruction level parallelism is usually applied on the control flow graph of a routine, where nodes on the graph are called basic blocks. A basic block is a sequence of consecutive instructions with a single entry and a single exit.
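As an illustrative sketch only, a linear instruction sequence can be partitioned into basic blocks by starting a new block at every possible branch target and ending a block after every branch. The `Instr` record, the `split_basic_blocks` function, and the instruction mnemonics below are hypothetical and are not taken from any particular compiler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    text: str                     # hypothetical textual form of the instruction
    label: Optional[str] = None   # label attached to the instruction, if any
    is_branch: bool = False       # True if the instruction may transfer control


def split_basic_blocks(instrs: List[Instr]) -> List[List[Instr]]:
    """Partition a linear instruction list into single-entry, single-exit blocks."""
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in instrs:
        # A labelled instruction may be a branch target, so it begins a new block.
        if ins.label is not None and current:
            blocks.append(current)
            current = []
        current.append(ins)
        # A branch is the single exit of its block.
        if ins.is_branch:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hypothetical routine with a two-way branch that rejoins at L_join.
program = [
    Instr("ld r4, Mem0"),
    Instr("br.cond L_else", is_branch=True),
    Instr("add r5, r5, 1"),
    Instr("br L_join", is_branch=True),
    Instr("st Mem1, r1", label="L_else"),
    Instr("ld r2, Mem1", label="L_join"),
    Instr("add r3, r2, r4"),
]
blocks = split_basic_blocks(program)
block_sizes = [len(b) for b in blocks]
```

In this sketch every label is treated as a branch target, which is conservative: an unused label would simply produce an extra, smaller basic block.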
Instruction schedulers use single basic blocks for detecting and scheduling the available instruction level parallelism. However, single basic blocks often contain insufficient instruction level parallelism. Therefore, higher performance is achieved by exploiting instruction level parallelism across consecutive basic blocks.
In trace scheduling, a trace scheduler schedules instructions within a trace, which is a sequence of basic blocks having an execution frequency greater than a predetermined execution frequency. Scheduling instructions within a trace allows instructions to be moved between basic blocks to increase efficiency. However, due to data dependencies, certain instructions cannot be moved.
To illustrate, FIG. 1 is a control flow graph 100 including a trace 110 in accordance with the prior art. Control flow graph 100 includes basic blocks 102, 104, 106 and 108. As indicated by the arrows, the control flow of control flow graph 100 is from basic block 102 to basic block 108 through either basic block 104 or basic block 106.
Control flow graph 100 includes trace 110, which consists of basic blocks 102, 104 and 108. Trace 110 does not include basic block 106, so basic block 106 is referred to as off trace basic block 106.
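The selection of a trace such as trace 110 can be sketched as follows. The execution frequencies, the threshold value, and the `select_trace` function are assumptions made for illustration; only the block numbers and the edges are taken from FIG. 1. The sketch grows the trace greedily from the entry block by following the most frequently executed successor whose frequency exceeds the threshold, which is one possible selection policy among several.

```python
from typing import Dict, List


def select_trace(successors: Dict[int, List[int]],
                 frequency: Dict[int, int],
                 entry: int,
                 threshold: int) -> List[int]:
    """Greedily follow the hottest successor while frequency exceeds threshold."""
    trace: List[int] = []
    visited = set()
    block = entry
    while block is not None and block not in visited and frequency[block] > threshold:
        trace.append(block)
        visited.add(block)
        # Only successors above the threshold may extend the trace.
        candidates = [s for s in successors.get(block, [])
                      if s not in visited and frequency[s] > threshold]
        block = max(candidates, key=lambda s: frequency[s], default=None)
    return trace


# Edges of control flow graph 100 as described for FIG. 1.
successors = {102: [104, 106], 104: [108], 106: [108], 108: []}
# Hypothetical profile counts; the source does not state actual frequencies.
frequency = {102: 100, 104: 90, 106: 10, 108: 100}

trace = select_trace(successors, frequency, entry=102, threshold=50)
off_trace = [b for b in successors if b not in trace]
```

With these assumed frequencies, the selected trace contains basic blocks 102, 104 and 108, and basic block 106 is left off trace.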
Basic block 102 includes an instruction 112, which loads the value from memory location Mem0 into register r4. However, since a few clock cycles must pass before the value from memory location Mem0 is available in register r4 after executing instruction 112, the processor sits idle unless other instructions are scheduled immediately following instruction 112.
Accordingly, the instruction scheduler attempts to schedule instructions within basic block 102 immediately following instruction 112 to maximize efficiency. However, in this example, there are insufficient instructions within basic block 102 to schedule any additional instructions following instruction 112.
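The cost of an unfilled load delay can be sketched with a simple single-issue, in-order model. The three-cycle load latency, the consumer of register r4, and the filler instructions writing r6 and r7 are assumptions made for illustration; the source states only that a few clock cycles must pass before the loaded value is available.

```python
from typing import Dict, List, Tuple

# Each instruction is (destination register, source registers, is_load).
Instruction = Tuple[str, List[str], bool]


def count_stall_cycles(instrs: List[Instruction], load_latency: int) -> int:
    """Count idle cycles for a single-issue, in-order processor model."""
    ready: Dict[str, int] = {}   # register -> first cycle its value is usable
    cycle = 0
    stalls = 0
    for dest, sources, is_load in instrs:
        # The instruction cannot issue until all of its sources are available.
        earliest = max((ready.get(s, 0) for s in sources), default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        ready[dest] = cycle + (load_latency if is_load else 1)
        cycle += 1
    return stalls


# Load into r4 followed immediately by a hypothetical consumer of r4.
unfilled = [("r4", [], True), ("r5", ["r4"], False)]
# The same load with two hypothetical independent instructions in between.
filled = [("r4", [], True), ("r6", [], False), ("r7", [], False),
          ("r5", ["r4"], False)]

stalls_unfilled = count_stall_cycles(unfilled, load_latency=3)
stalls_filled = count_stall_cycles(filled, load_latency=3)
```

Under these assumptions the unfilled sequence idles for two cycles, while the sequence with two independent instructions after the load does not idle at all.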
In an attempt to prevent the processor from sitting idle following instruction 112, a trace scheduler attempts to schedule instructions within trace 110, i.e., within basic blocks 102, 104 and 108, immediately following instruction 112 to maximize efficiency.
As shown in FIG. 1, basic block 108 includes instructions 114, 116, and 118. The trace scheduler attempts to move instructions 114, 116, and/or 118 to basic block 102 following instruction 112. However, instruction 114 loads the value from memory location Mem1 into register r2. Since instruction 120 in off trace basic block 106 stores the value of register r1 in memory location Mem1, instruction 114 must follow instruction 120. Otherwise, when control flows through basic block 106, the wrong value will be loaded into register r2 by instruction 114.
Accordingly, instruction 114 cannot be moved by the trace scheduler to basic block 102. More particularly, instruction 114 is not moved by the trace scheduler to prevent the wrong value from being loaded into register r2. As should be readily apparent, this decreases the efficiency of the resulting instruction schedule.
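The restriction described above can be sketched as a check over every basic block lying on any path between the destination block and the original block of the load, including off trace blocks. The `can_hoist_load` function and the location name `Mem2` are hypothetical, and the sketch assumes that memory locations are compared by name only, with no alias analysis and no stores inside the two end blocks.

```python
from typing import Dict, List, Set


def reachable(start: int, edges: Dict[int, List[int]]) -> Set[int]:
    """Return all blocks reachable from start by following edges (start excluded)."""
    seen: Set[int] = set()
    stack = [start]
    while stack:
        block = stack.pop()
        for nxt in edges.get(block, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def can_hoist_load(address: str, from_block: int, to_block: int,
                   successors: Dict[int, List[int]],
                   stores: Dict[int, Set[str]]) -> bool:
    """True if no block between to_block and from_block stores to address."""
    predecessors: Dict[int, List[int]] = {}
    for block, succs in successors.items():
        for s in succs:
            predecessors.setdefault(s, []).append(block)
    # Blocks on some path from to_block down to from_block, on or off trace.
    between = reachable(to_block, successors) & reachable(from_block, predecessors)
    return all(address not in stores.get(b, set()) for b in between)


# Edges of control flow graph 100 as described for FIG. 1.
successors = {102: [104, 106], 104: [108], 106: [108], 108: []}
# Instruction 120 in off trace basic block 106 stores to Mem1.
stores = {106: {"Mem1"}}

# Instruction 114 loads from Mem1 in basic block 108.
hoist_114 = can_hoist_load("Mem1", from_block=108, to_block=102,
                           successors=successors, stores=stores)
# A load from a hypothetical location Mem2 that no block stores to.
hoist_other = can_hoist_load("Mem2", from_block=108, to_block=102,
                             successors=successors, stores=stores)
```

Here the check rejects moving the load from Mem1 because of the store in basic block 106, even though that block is not part of trace 110, while a load from a location that is not stored to on any path would pass the check.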