Instruction scheduling is a process of rearranging or transforming program statements before execution by a processor in order to reduce possible run-time delays between compiled instructions. Instruction scheduling is usually performed at an intermediate language or assembly code level. Such transformations must preserve data dependences and are subject to other constraints. This can be particularly advantageous when compiling for pipelined machine architectures, which allow increased throughput by overlapping instruction execution. For example, if there is a delay of one cycle between fetching and using a value V, it would be desirable to "cover" this delay with an instruction that is independent of V and is "ready" to be executed.
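The effect of covering a delay can be illustrated with a toy model. The following is a minimal sketch only, assuming a hypothetical single-issue, in-order pipeline with a one-cycle delay between a load and the first use of the loaded value. The instruction encoding, the register names and the function `count_stalls` are illustrative assumptions and are not taken from any particular machine.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (name, destination register, source registers, is_load)
Instr = Tuple[str, Optional[str], Tuple[str, ...], bool]

LOAD_DELAY = 1  # assumed: one delay cycle between a load and the use of its value


def count_stalls(seq: List[Instr]) -> int:
    """Count the stall cycles (NOPs) needed to run seq in the given order."""
    ready: Dict[str, int] = {}  # register -> first cycle at which it can be used
    cycle = 0
    stalls = 0
    for _name, dest, srcs, is_load in seq:
        # An instruction cannot issue before all of its operands are available.
        earliest = max([ready.get(r, 0) for r in srcs] + [cycle])
        stalls += earliest - cycle
        cycle = earliest
        if dest is not None:
            ready[dest] = cycle + 1 + (LOAD_DELAY if is_load else 0)
        cycle += 1
    return stalls


load_v = ("load", "V", ("base",), True)   # fetch the value V
use_v = ("add", "x", ("V",), False)       # uses V, so it depends on the load
independent = ("mul", "y", ("z",), False)  # independent of V and ready to execute

original = [load_v, use_v, independent]
scheduled = [load_v, independent, use_v]  # the independent instruction covers the delay
```

Under these assumptions, `count_stalls(original)` reports one stall cycle and `count_stalls(scheduled)` reports none, because the independent instruction occupies the cycle in which the use of V would otherwise have to wait.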
A particular application for instruction scheduling is in the field of so-called Reduced Instruction Set Computers (RISC). An introduction to RISC computers can be found in Reference 1.
The RISC approach to building high speed processors, which emerged in the late seventies, emphasises the need for the streamlining of program instructions. As a result, instructions have to be rearranged, usually at the intermediate language or assembly code level, to take full advantage of pipelining and thereby to improve performance. The burden of instruction scheduling is placed on optimising compilers that generate code for RISC processors. Usually, the compilers perform instruction scheduling at the basic block level, solving most of the problems posed by the pipelined structure of RISC processors.
Approaches to scheduling at the instruction level for pipelined machines are described in a number of articles (see References 2, 3, and 4).
Whereas for machines with n functional units the idea is to be able to execute as many as n instructions each cycle, for pipelined machines the goal is to issue a new instruction every cycle, effectively eliminating the so-called NOPs (No OPerations). However, for both types of machines, the common feature required from the compiler is to discover in the code instructions that are data independent, allowing the generation of code that better utilises the machine resources.
It was a common view that such data independent instructions can be found within basic blocks, and that there is no need to move instructions beyond basic block boundaries. A basic block is a sequence of consecutive instructions for which the flow of control enters at the beginning of the sequence and exits at the end thereof without a wait or branch possibility, except at the point of exit. Virtually all of the previous work on the implementation of instruction scheduling concentrated on scheduling within basic blocks (see References 2, 3 and 4 above).
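The definition of a basic block given above can be made concrete with the conventional "leader" method of partitioning. The following is a sketch under stated assumptions: the instruction encoding, the set of branch mnemonics `BRANCH_OPS` and the function `split_basic_blocks` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (label or None, opcode, branch target label or None)
Instr = Tuple[Optional[str], str, Optional[str]]

BRANCH_OPS = {"br", "beq", "bne"}  # assumed branch mnemonics


def split_basic_blocks(code: List[Instr]) -> List[List[Instr]]:
    """Partition a linear instruction sequence into basic blocks."""
    targets = {t for _, op, t in code if op in BRANCH_OPS and t is not None}
    leaders = set()
    for i, (label, op, _target) in enumerate(code):
        # Control can enter only at the first instruction or at a branch target.
        if i == 0 or (label is not None and label in targets):
            leaders.add(i)
        # A branch is a point of exit, so the following instruction starts a new block.
        if op in BRANCH_OPS and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)
    return [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]


example = [
    (None, "load", None),
    (None, "beq", "L1"),
    (None, "add", None),
    ("L1", "store", None),
]
blocks = split_basic_blocks(example)
```

For the four-instruction example, the sketch yields three blocks: the load together with the branch, the add alone, and the store at label `L1`. Small blocks of this kind, each ending in a branch, are what limit scheduling confined to a single block.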
Even for basic RISC architectures, however, such a restricted type of scheduling may result in code with many NOPs for a large family of programs including many UNIX-type (UNIX is a trademark of UNIX System Laboratories Inc.) programs that include many small basic blocks terminated in unpredictable branches. For scientific programs, where basic blocks tend to be larger, these problems tend not to be so severe.
Recently, a new type of architecture has been evolving that extends RISC with the ability to issue more than one instruction per cycle (see Reference 5).
This type of high speed processor organisation, called superscalar or superpipelined architecture, poses more serious challenges to compilers, since instruction scheduling at the basic block level is not sufficient to allow generation of code that utilises machine resources to a desired extent (see Reference 6).
One recent effort to pursue instruction scheduling for superscalar machines was reported in Reference 7. In this article, code replication techniques for scheduling beyond the scope of basic blocks were investigated, resulting in considerable improvements in the running time of the compiled code. Recently, different approaches for moving instructions beyond basic block boundaries have been presented (see References 8 and 9). However, the effect of moving instructions beyond block boundaries is limited unless the instructions can be scheduled speculatively. Speculative scheduling means that instructions are executed ahead of time, before a preceding conditional branch is performed. Consequently, the results of such speculatively scheduled instructions are sometimes not used in the subsequent execution of the program.
An important class of speculative instructions is speculative load instructions. This is because usually a computational sequence starts with loading operands from the memory into registers. However, if load instructions are scheduled speculatively an undesired exception may be caused in program execution, due to an access to a non-existent or protected memory location.
The object of the present invention is to provide for speculative scheduling of load instructions without such program exceptions being caused.
Therefore, according to a first aspect of the present invention there is provided an instruction scheduler for rescheduling an input instruction sequence to form an output instruction sequence to run on a computer, the instruction scheduler being capable of speculatively scheduling load instructions by moving certain categories of load instructions from a source block of instructions in the input instruction sequence to a target block of instructions to form the output instruction sequence, the instruction scheduler comprising:
logic for selecting a data-independent load instruction as a candidate for rescheduling;
logic for determining whether the base register that the load instruction makes use of and/or the contents thereof meets any one of a number of conditions;
logic for moving the selected load instruction from the source block to the target block in response to determination that any one of the conditions is met.
The safeness of rescheduling a candidate load instruction can be determined by classifying load instructions into a number of categories based on whether the base register the instruction makes use of and/or its contents satisfy any one of a number of conditions.
One of the conditions can be that the base register contains a pointer to a memory area where addresses of global variables are stored. In general, it can be assumed that speculative rescheduling of load instructions whose base register contains a pointer to a memory area where addresses of global variables are stored will not cause a program exception.
Another of the conditions can be that the base register of the selected load instruction is the same register as is used as the base register of another load instruction within the target block, and that no instruction changes the contents of this register in the path between the target block and the source block. In this case the instruction scheduler must further comprise means for allocating an extra dummy page, which can be defined in read-only mode, in front of and after each data segment of a program. It can be assumed that if executing a similar, but non-speculative, load in the target block does not give rise to a program exception, then executing the speculative load instruction will not give rise to one either, provided that the contents of the register are not changed in program execution between the target block and the source block. The extra dummy page is required to avoid the possibility that a speculative load has a larger displacement than its non-speculative counterpart and thus may cross data-segment boundaries and enter into a non-existent or protected region. Advantageously, these extra pages can be defined in read-only mode, so no program exception will be caused by executing a load from the memory locations belonging to them.
A third condition can be that the contents of the base register of the load instruction equal zero and that the contents of the base register are not changed in the path between the target block and the source block. In this case page zero can be defined in read-only mode by the operating system of the computer on which the rescheduled code is to run, and a finite number of bytes of page zero, for example the first 64, can be filled with zeros.
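The classification of load instructions by the conditions described above can be sketched as follows. This is a minimal illustration and not a definitive implementation: the names `Load`, `Safety` and `classify_load` are hypothetical, and the four register sets passed to the function are assumed to have been computed beforehand by the scheduler's data-flow analysis. The second category additionally relies on the dummy pages around each data segment, and the third on page zero being readable, as set out above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class Safety(Enum):
    GLOBAL_TABLE_BASE = 1    # base points at the area holding addresses of globals
    SAME_BASE_AS_TARGET = 2  # target block already loads through the same base
    ZERO_BASE = 3            # base is known to contain zero
    UNSAFE = 4               # none of the conditions is met


@dataclass(frozen=True)
class Load:
    dest: str
    base: str
    displacement: int


def classify_load(
    load: Load,
    global_table_regs: AbstractSet[str],  # assumed: registers holding the globals pointer
    target_load_bases: AbstractSet[str],  # base registers of loads in the target block
    zero_regs: AbstractSet[str],          # assumed: registers known to contain zero
    modified_on_path: AbstractSet[str],   # registers changed between target and source
) -> Safety:
    """Return the first condition met by the candidate load, or UNSAFE."""
    if load.base in global_table_regs:
        return Safety.GLOBAL_TABLE_BASE
    unchanged = load.base not in modified_on_path
    if unchanged and load.base in target_load_bases:
        return Safety.SAME_BASE_AS_TARGET
    if unchanged and load.base in zero_regs:
        return Safety.ZERO_BASE
    return Safety.UNSAFE
```

In this sketch a candidate would be moved only when the result is other than `Safety.UNSAFE`. For example, a load through `r7` is classified as `SAME_BASE_AS_TARGET` when the target block also loads through `r7`, but as `UNSAFE` once `r7` appears in `modified_on_path`.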
The invention further enables a compiler to be provided for compiling code to run on a computer, the compiler including an instruction scheduler in accordance with the invention. A computer including such an instruction scheduler is also provided. In an advantageous form of the invention the computer has a superscalar and/or superpipelined architecture.
Viewed from another aspect the invention provides a method of rescheduling an input program instruction sequence, the input instruction sequence being partitioned into blocks of instructions, the method comprising: (a) selecting a data-independent load instruction from a source block of instructions as a candidate instruction; (b) determining whether the base register the load instruction makes use of or the contents thereof satisfies any one of a number of conditions; and if so, (c) rescheduling the input program instruction sequence so that the candidate instruction is moved to a target block of instructions.
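Steps (a) to (c) of the method can be sketched as a single routine. The sketch below makes several simplifying assumptions: instructions are represented as plain strings, the function name `move_first_safe_load` and its parameters are hypothetical, the tests for data independence and for the base-register conditions are supplied by the caller as predicates, and the position `slot` in the target block is assumed to have been chosen by the scheduler.

```python
from typing import Callable, List, Optional, TypeVar

I = TypeVar("I")  # instruction representation, left to the caller


def move_first_safe_load(
    source: List[I],
    target: List[I],
    slot: int,
    is_candidate: Callable[[I], bool],  # step (a): data-independent load?
    is_safe: Callable[[I], bool],       # step (b): is any one condition satisfied?
) -> Optional[I]:
    """Move the first qualifying load from source into target at position slot."""
    for index, instr in enumerate(source):
        if is_candidate(instr) and is_safe(instr):
            # Step (c): reschedule the candidate into the target block.
            target.insert(slot, source.pop(index))
            return instr
    return None


# Illustrative blocks: the target block ends in a branch that guards the source block.
target_block = ["load r3,0(r2)", "beq L1"]
source_block = ["cmp r1,0", "load r5,8(r2)", "add r6,r5"]
moved = move_first_safe_load(
    source_block,
    target_block,
    1,  # place the load ahead of the branch
    lambda s: s.startswith("load"),
    lambda s: s.endswith("(r2)"),  # stand-in for the base-register conditions
)
```

After the call, the load through `r2` sits in the target block ahead of the conditional branch and has been removed from the source block. If no instruction satisfies both predicates, the blocks are left unchanged and `None` is returned.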