The present invention relates generally to compiler techniques, and more particularly to techniques for determining a lookahead value for an instruction.
Parallel computer architectures generally provide multiple processors that can each be executing different tasks simultaneously. One such parallel computer architecture is referred to as a multithreaded architecture (MTA). The MTA supports not only multiple processors but also multiple streams executing simultaneously in each processor. The processors of an MTA computer are interconnected via an interconnection network. Each processor can communicate with every other processor through the interconnection network. FIG. 1 provides a high-level overview of an MTA computer. Each processor 101 is connected to the interconnection network and memory 102. Each processor contains a complete set of registers 101a for each stream. In addition, each processor also supports multiple protection domains 101b so that multiple user programs can be executing simultaneously within that processor.
Each MTA processor can execute multiple threads of execution simultaneously. Each thread of execution executes on one of the 128 streams supported by an MTA processor. Every clock time period, the processor selects a stream that is ready to execute and allows it to issue its next instruction. Instruction interpretation is pipelined by the processor, the network, and the memory. Thus, a new instruction from a different stream may be issued in each time period without interfering with other instructions that are in the pipeline. When an instruction finishes, the stream to which it belongs becomes ready to execute the next instruction. Each instruction may contain up to three operations (i.e., a memory reference operation, an arithmetic operation, and a control operation) that are executed simultaneously.
The state of a stream includes one 64-bit Stream Status Word ("SSW"), 32 64-bit General Registers ("R0-R31"), and eight 32-bit Target Registers ("T0-T7"). Each MTA processor has 128 sets of SSWs, of general registers, and of target registers. Thus, the state of each stream is immediately accessible by the processor without the need to reload registers when an instruction of a stream is to be executed.
The MTA uses program addresses that are 32 bits long. The lower half of an SSW contains the program counter ("PC") for the stream. The upper half of the SSW contains various mode flags (e.g., floating point rounding, lookahead disable), a trap disable mask (e.g., data alignment and floating point overflow), and the four most recently generated condition codes. The 32 general registers are available for general-purpose computations. Register R0 is special, however, in that it always contains a 0. Writing to register R0 has no effect on its contents. The instruction set of the MTA processor uses the eight target registers as branch targets. However, most control transfer operations only use the low 32 bits to determine a new program counter.
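The division of the SSW into two halves can be illustrated with a minimal sketch. The function name `split_ssw` is hypothetical, and the bit positions of the individual mode flags, trap mask, and condition codes within the upper half are not given above, so the sketch returns the upper half as an opaque 32-bit value.

```python
def split_ssw(ssw: int) -> tuple[int, int]:
    """Split a 64-bit Stream Status Word into (upper_half, pc).

    The lower 32 bits hold the program counter; the upper 32 bits hold the
    mode flags, trap disable mask, and condition codes. The layout of the
    fields inside the upper half is not specified here, so it is returned
    undecoded.
    """
    pc = ssw & 0xFFFFFFFF             # lower half: program counter
    upper = (ssw >> 32) & 0xFFFFFFFF  # upper half: flags, mask, condition codes
    return upper, pc
```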
The instructions of an MTA are 64 bits long and contain a lookahead field, a memory operation, an arithmetic operation, and a control operation. The memory operation encodes an access to memory, the arithmetic operation encodes a computation to be performed on the values in the registers, and the control operation encodes a control transfer operation that may be conditional. An instruction may contain any combination of these operations. An MTA processor executes the operations of an instruction in parallel. In general, an instruction within a stream cannot begin executing (i.e., the instruction cannot be issued) until execution of the previous instruction completes. For example, if the previous instruction loaded a value from memory into a register and the next instruction reads that register, then the load must be complete before the register is read. Since arithmetic and control operations operate only on registers and not on main memory, an MTA processor can be executing multiple instructions that contain only these operations in parallel. Because of memory latency time, the execution of a memory operation may take as many as 70 clock time periods. Therefore, the next instruction after an instruction that contains a memory operation cannot be issued until the memory operation completes unless the processor knows that the next instruction does not "depend" on the results of the memory operation.
Some processor architectures have been designed to inspect instructions to determine whether they have any dependencies on instructions whose execution is not yet complete. If there are no dependencies, then the instructions can be executed in parallel. The MTA provides the lookahead field of an instruction as an alternative to the inspection of instructions to determine dependencies. A programmer can set the lookahead field in each instruction to indicate a number of following instructions of the stream that are not dependent on that instruction. The MTA processor will not begin execution of more instructions in parallel with the current instruction than the lookahead number of instructions. The lookahead value of an instruction needs to take into consideration all possible paths of execution following that instruction. For example, if the next instruction contains a conditional branch, then the lookahead value can be set to be the minimum number of instructions to a dependent instruction whether or not the branch is taken. The MTA supports a 3-bit lookahead field in which lookahead values can range from zero to seven. A lookahead value of zero means that execution of the current instruction must complete before execution of the next instruction begins. If an instruction contains no memory operation, then the next instruction cannot be dependent on it. Therefore, the lookahead values in such instructions can be ignored by an MTA processor.
The dependency of one instruction upon another is more formally defined as follows. An instruction J depends on a memory operation of a previous instruction I if any operation in instruction J accesses a register that is modified by the memory operation of instruction I or if the memory operation in instruction J accesses the same memory as instruction I and either access modifies the memory.
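The dependency definition above can be expressed as a short predicate. The following is a minimal sketch, not the MTA's or any compiler's actual implementation: the `MemOp` and `Instr` types and the `depends_on` function are hypothetical, and memory addresses are compared as symbolic strings, which assumes that two accesses touch the same memory only when their address expressions are textually identical (a real compiler would need alias analysis).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemOp:
    reg_written: Optional[str]  # register modified by the memory op (None for a store)
    address: str                # symbolic address, e.g. "r3+16" (assumption: no aliasing)
    is_store: bool              # True if the access modifies memory


@dataclass(frozen=True)
class Instr:
    mem: Optional[MemOp] = None
    regs_accessed: frozenset = frozenset()  # registers read or written by any operation


def depends_on(j: Instr, i: Instr) -> bool:
    """Return True if instruction j depends on the memory operation of instruction i."""
    if i.mem is None:
        return False  # no memory operation in i, so nothing to depend on
    # Case 1: j accesses a register modified by i's memory operation.
    if i.mem.reg_written is not None and i.mem.reg_written in j.regs_accessed:
        return True
    # Case 2: both access the same memory and at least one access modifies it.
    if j.mem is not None and j.mem.address == i.mem.address:
        return i.mem.is_store or j.mem.is_store
    return False


# Hypothetical instructions: two loads and an add that reads r6 and writes r2.
load_r6 = Instr(MemOp("r6", "r3+16", False), frozenset({"r3", "r6"}))
load_r2 = Instr(MemOp("r2", "r3+8", False), frozenset({"r0", "r2", "r3", "r4", "r5"}))
add_r2 = Instr(None, frozenset({"r2", "r6", "r7"}))
```

With these definitions, `depends_on(add_r2, load_r2)` holds because both instructions access register r2, while `depends_on(load_r2, load_r6)` does not hold because the two loads touch different registers and different addresses.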
The MTA supports designating a conditional branch operation as branching either "often" or "seldom." A conditional branch operation specifies a target location to which a processor transfers control if the condition is satisfied. If the condition is not satisfied, the processor transfers control to the next instruction by incrementing the program counter to point to the next location. A programmer may designate a conditional branch as "often" if it is anticipated that the branch will in general be taken. Otherwise, the programmer or code generator may designate the conditional branch as "seldom." If a conditional branch that is designated as "often" is not taken or a conditional branch that is designated as "seldom" is taken, then the MTA processor waits until all instructions previously issued for the stream complete before issuing the next instruction.
Table 1 contains assembly language instructions that illustrate the lookahead field and the often/seldom designation for a conditional branch. The syntax of an instruction is
(inst la (M-op) (A-op) (C-op))
where "la" represents the lookahead value, "M-op" represents a memory operation, "A-op" represents an arithmetic operation, and "C-op" represents a control operation. Instruction 1 has a lookahead value of 3 and a memory operation that indicates to load register r6 with the value from the memory location pointed to by the contents of register r3 plus 16 (i.e., r6=*(r3+16)). Instruction 2 has a lookahead value of 6, a memory operation of r2=*(r3+8), and an arithmetic operation of r0=r4-r5 that sets the condition code. Instructions 3, 4, and 5 have no memory operation so their lookahead value is ignored by the processor. Instruction 3 has a conditional branch that branches to the location pointed to by register t3 if the condition code indicates equality. Instruction 4 has a conditional branch that is designated as often and that branches to the location pointed to by register t2 if the condition code indicates less than. Instruction 5 has an arithmetic operation of r2=r6+r7.
The lookahead value of 3 in instruction 1 indicates that instructions 2, 3, and 4 and the instruction pointed to by register t3 do not depend on instruction 1. Therefore, execution of these instructions can begin before execution of instruction 1 completes. The lookahead value of 6 in instruction 2 indicates that instructions 3 and 4, five instructions following target t3, and four instructions following target t2 do not depend on instruction 2. Although instruction 5 depends on instruction 2 (i.e., instruction 2 stores a value in register r2 and instruction 5 also stores a value in register r2), the conditional branch of instruction 4 is designated as often, which means that if the branch is not taken, all instructions in the process of being executed must complete before instruction 5 can be issued.
It would be desirable to have a technique for automatically calculating lookahead values for instructions and for designating conditional branch operations as either often or seldom to maximize the parallel execution of instructions of a stream.
The appendix contains the "Principles of Operation" of the MTA, which provides a more detailed description of the MTA.
Embodiments of the present invention provide a computer-based method and system for determining designations for conditional branch operations and settings for lookahead values for a portion of a computer program. The lookahead system of the present invention evaluates various combinations of designations for the conditional branch operations for the portion of the computer program. The lookahead system generates a metric to measure the amount of parallel processing that would result from each combination of designations assuming that the lookahead values are set to optimal values for that combination. This metric may take into consideration estimated or actual execution frequencies of the instructions. The lookahead system then designates the conditional branch operations and sets the lookahead values based on the metric generated for one of the combinations.
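The evaluation of combinations of designations can be sketched as an exhaustive search. The sketch below is illustrative only and is not asserted to be the search strategy of the lookahead system: the function `best_designation` is hypothetical, and the metric is supplied by the caller as a `score` callable because the metric itself is not defined in this summary. Exhaustive enumeration visits 2^n combinations for n branches, so it is practical only for small portions of a program.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple


def best_designation(
    branches: Sequence[str],
    score: Callable[[Dict[str, str]], float],
) -> Tuple[Dict[str, str], float]:
    """Try every often/seldom combination and return the best one with its metric.

    `score` is a caller-supplied stand-in for the parallelism metric; it
    receives a mapping from branch name to "often" or "seldom".
    """
    best: Optional[Tuple[Dict[str, str], float]] = None
    for combo in product(("often", "seldom"), repeat=len(branches)):
        designation = dict(zip(branches, combo))
        metric = score(designation)
        if best is None or metric > best[1]:
            best = (designation, metric)
    assert best is not None  # product() always yields at least one combination
    return best


# Toy metric (an assumption, not the metric of the lookahead system): reward a
# designation in proportion to how often it matches a branch's taken frequency.
taken_freq = {"b1": 0.9, "b2": 0.2}  # hypothetical execution frequencies


def toy_score(designation: Dict[str, str]) -> float:
    return sum(
        taken_freq[b] if d == "often" else 1.0 - taken_freq[b]
        for b, d in designation.items()
    )
```

Calling `best_designation(["b1", "b2"], toy_score)` selects "often" for the frequently taken branch and "seldom" for the rarely taken one under this toy metric.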
In one embodiment, the lookahead system designates a conditional branch operation of a branch instruction by reviewing paths of execution that include the branch instruction. For a path of execution that starts at a start instruction and includes the branch instruction that includes the conditional branch operation, the lookahead system calculates a number of instructions along that path of execution that do not depend on the start instruction. The lookahead system then designates the conditional branch operation as often or seldom based on the calculated number.
In another embodiment, the lookahead system calculates an instruction lookahead value for a start instruction without altering the designation of any conditional branches. For paths of execution that start at the start instruction, the lookahead system calculates a path lookahead value for the path of execution as the number of instructions along that path starting at the start instruction that do not depend on the start instruction. The lookahead system then sets the instruction lookahead value of the start instruction to the minimum of the calculated path lookahead values.
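The minimum-over-paths calculation can be sketched as a bounded walk of a control flow graph. This is a minimal sketch under stated assumptions: the function `instruction_lookahead`, the successor map `succs`, and the `depends` predicate are hypothetical names; the cap of 7 reflects the 3-bit lookahead field; and a path that leaves the analyzed region is treated conservatively, as if the next unseen instruction were dependent. The sketch does not model the effect of often/seldom designations.

```python
from typing import Callable, Dict, Hashable, List

MAX_LOOKAHEAD = 7  # largest value representable in a 3-bit lookahead field


def instruction_lookahead(
    start: Hashable,
    succs: Dict[Hashable, List[Hashable]],
    depends: Callable[[Hashable, Hashable], bool],
    limit: int = MAX_LOOKAHEAD,
) -> int:
    """Return the minimum path lookahead value over all paths from `start`.

    A path lookahead value is the number of instructions following `start`
    on that path that precede the first instruction depending on `start`.
    """

    def walk(node: Hashable, count: int) -> int:
        if count >= limit:
            return limit  # the field cannot express more; also bounds loops
        nexts = succs.get(node, [])
        if not nexts:
            return count  # assumption: unknown code follows, so be conservative
        best = limit
        for nxt in nexts:
            if depends(nxt, start):
                best = min(best, count)  # path ends at first dependent instruction
            else:
                best = min(best, walk(nxt, count + 1))
        return best

    return walk(start, 0)


# Hypothetical graph: A falls into B, B branches to C or D, and E and G are
# the first instructions on each path that depend on A.
succs = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["F"], "F": ["G"]}
dependent = {"E", "G"}
lookahead = instruction_lookahead("A", succs, lambda j, i: j in dependent)
```

In this hypothetical graph the path through C has two independent instructions (B and C) and the path through D has three (B, D, and F), so the instruction lookahead value for A is the smaller of the two.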