Computer processors are composed of several processing units, each dedicated to processing one type of instruction. For example, FIG. 1 shows a simple processor with four processing units marked I, M, F, and B. The I unit can process instructions of the I type; the M unit can process instructions of the M type; the F unit can process instructions of the F type; and the B unit can process instructions of the B type. Each instruction type can comprise any number of different instructions, typically between 20 and 80. The processor shown in FIG. 1 can process four instructions at one time in a single cycle of the processor, one from each instruction type, one in each of the processor units shown in FIG. 1.
A compiler is a computer program that, when run on a computer, causes the computer to receive as input the “source code” instructions of a second computer program (an “application program”) written by a programmer to perform a particular set of tasks (an “application”). To write such an application, programmers prefer to use a “high-level” language which allows them to write generalized logic instructions, leaving it up to the compiler program running on the computer to work out the details of how to instruct a particular processor to accomplish all of the steps required. The compiler program therefore translates the programmer's high-level language source code program instructions into specific detailed instructions, called “object code” or “executable code”, for execution on a processor and assembles the instructions into a particular sequence with a particular “schedule” that determines which instructions are to be executed in which cycle.
FIG. 2 shows a processor of greater complexity, such as a processor implementing the Intel® Itanium® architecture, that includes multiple units of each unit type. If the units of each type are functionally identical, they can process any instruction of that type. That is, if unit I0 is the same as unit I1, each of them can process any instruction of the I type. Similarly, if units M0 and M1 are functionally identical, each of them can process any instruction of the M type. The same applies for the F units and the B units as shown in FIG. 2.
The compiler can instruct the processor to execute the object code instructions one at a time, each instruction using a processor unit as appropriate to the instruction type. However, the program will run much faster if multiple instructions can be executed on the processor at the same time, each in a different unit of the processor. When there are large numbers of instruction types (seven is now typical) or large numbers of units in the processor (12 is now typical), the possibilities for scheduling the detailed instructions for the processor become quite complex. There are thousands of different ways to schedule the detailed object code instructions for even a very simple application program, and some of these possible schedules will run faster than others. Furthermore, certain schedules that would seem logically possible will create conflicts because some instructions will create dependencies for other instructions such that they must be performed in a certain order to work properly. In addition, some processors, such as those of the Itanium® class, will also require that two instructions to be processed together in a single cycle in different units be presented to the processor in a certain left-right order, that certain instructions within a group come first, and that certain instructions within a group come last. Thus, the scheduling problem for contemporary processors is exceedingly complex.
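The interaction between unit availability and instruction dependencies described above can be illustrated with a short Python sketch. It is a simple greedy scheduler written for illustration only, not the method of any particular compiler. The instruction names, the dependency sets, and the rules that every instruction completes in one cycle and that a dependent instruction must issue in a later cycle than the instructions it depends on are all hypothetical assumptions.

```python
def schedule(instructions, deps, units_per_type):
    """Greedy cycle-by-cycle scheduler (illustrative sketch only).

    instructions:   dict of name -> unit type, in program order
    deps:           dict of name -> set of names that must issue in an earlier cycle
    units_per_type: dict of unit type -> number of units of that type
    Returns a list of cycles, each a list of instruction names.
    """
    done = set()
    remaining = list(instructions)
    cycles = []
    while remaining:
        free = dict(units_per_type)      # units still unused in this cycle
        issued = []
        for name in remaining:
            kind = instructions[name]
            ready = deps.get(name, set()) <= done
            if ready and free.get(kind, 0) > 0:
                free[kind] -= 1
                issued.append(name)
        if not issued:
            raise ValueError("cyclic dependency or missing unit type")
        done.update(issued)              # results become visible next cycle
        remaining = [n for n in remaining if n not in issued]
        cycles.append(issued)
    return cycles


# Hypothetical six-instruction program; the names are invented for illustration.
program = {"load_a": "M", "load_b": "M", "add": "I",
           "fmul": "F", "store": "M", "branch": "B"}
deps = {"add": {"load_a", "load_b"}, "store": {"add"}, "branch": {"store"}}

one_of_each = schedule(program, deps, {"I": 1, "M": 1, "F": 1, "B": 1})
two_m_units = schedule(program, deps, {"I": 1, "M": 2, "F": 1, "B": 1})

for label, cycles in (("one unit per type", one_of_each),
                      ("two M units", two_m_units)):
    print(label, "->", len(cycles), "cycles:", cycles)
```

Under these assumptions, the six instructions need six cycles when issued one at a time, five cycles on a processor with one unit of each type, and four cycles when a second M unit is available. A greedy pass like this finds one legal schedule; it does not search the many alternative schedules for the fastest one.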
So that a compiler program running on a computer can automatically determine an appropriate schedule for the object code instructions, the compiler program has within it a computer “model” of the processor. This model is designed so that schedules of instructions that will function correctly can be determined by using the computer model of the processor. As the model considers instructions to be issued to the processor, they are identified as falling into “issue unit” types. These types are designed to generally correspond to the processor unit types. However, in an implementation for a contemporary complex processor, the correspondence is made inexact in order to improve the model's ability to manage special rules for the processor units. The model then works with issue units, each of which is a class of instructions that are functionally equivalent for the model's purposes.
A common method for modeling a processor in a compiler program uses either a “reservation table” or a “finite state machine” (“FSM”, also called a “finite state automaton”) or both together. In a finite state machine, the model contains one “state” for each possible configuration of issue units for the processor. Each state fully defines the issue units that can be submitted to the processor at one time and includes an indication of each of the other states (as represented by the model) to which a transition can be effected from the present state without violating any of the state transition rules specified by the model for that processor. FIG. 3 shows a representation of a finite state machine. The binary numbers inside each circle describe the state, and arrows indicate each permissible transition, including, where permitted, a transition back to the same state.
A state machine can be represented as a state table. FIG. 4 shows a representation of a state table for a hypothetical simple processor. This processor has two unit types, I and M. For FIG. 4, there are two issue units of each type. To keep the example simple and, at the same time, typical of actual processors, this hypothetical processor can receive only three instructions at a time (even though it has four units) and each group of three instructions must be ordered such that it has an M instruction in the left-most position (“slot”). This means that each group of issue units allowed by the model will be (MMI) or (MII) or a subset of these possibilities with one or two empty slots. The left column in FIG. 4, labeled “State”, lists all eleven of the possible sets of issue units, each with a state number. The second column, labeled M/0, lists the subsequent state to which the state machine transitions if it receives an instruction of issue unit type M for slot 0 (the left-most of the three slots). The third column, labeled M/1, lists the subsequent state to which the state machine transitions if it receives an instruction of issue unit type M to be placed in slot 1 (the center of the three slots). Because an I type instruction cannot go in the left-most slot, slot 0, the fourth column is labeled I/1. Similarly, the fifth column is labeled I/2.
When the state machine is in a particular state and it can receive one of two types of instructions, each of which can go in one of two slots, the four possibilities are represented by a row in the state table as shown in FIG. 4. If an x is shown in a column, this means the state machine cannot receive such an instruction. If a number is shown, this means the state machine can receive the instruction and then transition to the state represented by the number.
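A state table of the kind described for FIG. 4 can be generated mechanically from the slot rules. The following Python sketch is a minimal illustration under stated assumptions. The state numbering is invented here and will not match the figure, an empty start state is added to the eleven non-empty groups, and any empty slot is assumed to be fillable regardless of the order in which the other slots were filled.

```python
from itertools import product

EMPTY = "-"
# Slot rules taken from the text: slot 0 holds only M, slot 1 holds M or I,
# slot 2 holds only I.
ALLOWED = {0: ("M",), 1: ("M", "I"), 2: ("I",)}
COLUMNS = [("M", 0), ("M", 1), ("I", 1), ("I", 2)]   # M/0, M/1, I/1, I/2


def all_groups():
    """Every assignment of issue units to the three slots, empty group included."""
    options = [(EMPTY,) + ALLOWED[slot] for slot in range(3)]
    return list(product(*options))


def build_state_table():
    """Return {state number: (group label, {column: next state or 'x'})}."""
    # Hypothetical numbering: fewer filled slots first, so the empty group is 0.
    groups = sorted(all_groups(),
                    key=lambda g: (sum(u != EMPTY for u in g), g))
    number = {g: i for i, g in enumerate(groups)}
    table = {}
    for g in groups:
        row = {}
        for unit, slot in COLUMNS:
            column = f"{unit}/{slot}"
            if g[slot] == EMPTY:
                # Placing the unit in the empty slot yields the next state.
                nxt = g[:slot] + (unit,) + g[slot + 1:]
                row[column] = number[nxt]
            else:
                row[column] = "x"        # slot already occupied
        table[number[g]] = ("".join(g), row)
    return table


table = build_state_table()
for state, (label, row) in table.items():
    print(state, label, row)
```

The enumeration produces twelve rows: the empty start state plus the eleven non-empty groups mentioned in the text. The rows for MMI and MII contain an x in every column because no slot is left to fill.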
If a processor has multiple units of each type as shown in FIG. 2, a state table like FIG. 4 can be used without causing undue complexity because each processor unit of a type is interchangeable with each of the other units of that type. That is, because any I type issue unit can be executed in any I processor unit and any M type issue unit can be executed in any M processor unit, simply having multiple I type units or multiple M type units does not increase complexity for the state table.
However, processors have now been developed where, within a processor unit type, one of the units can process the set of the regular instructions, like all of the other units of that type, plus it can process certain specialized supplemental instructions (a superset of instructions) that can only be processed in that particular unit and not in any other unit of that type. For purposes of this document, the additional instructions are called “supplemental” instructions. The supplemental instructions are referred to as “must-shift” instructions and information which tracks whether an instruction is a supplemental instruction is referred to as “must-shift” information. The Itanium class of processors developed by Intel Corporation is an example of a processor with multiple units of each of several types where, within a unit type, one of the units can process regular instructions plus supplemental instructions while the other units of that type can only process regular instructions.
Under the approach that has heretofore been taken for use of a finite state machine model in a compiler program, to accommodate this type of processor, one increases the number of columns in the state machine for each instruction type with supplemental instructions and greatly increases the number of states. For example, assume that, of the processor units described for FIG. 4, the 0 units (I0 and M0) are the units that can process the supplemental instructions in addition to the regular instructions and the other I and M units cannot process the supplemental instructions. This means that two regular M instructions can be processed at one time (or two regular I instructions) but only one of two M instructions (or I instructions) can be a supplemental instruction.
As is known in actual processors, assume further that, whenever two instructions of a type are sent and one of them is a supplemental instruction, the processor requires that the order in a group of instructions to be executed in one cycle have the supplemental instruction left of the regular instruction. In the example of FIG. 4, this means that an M0 type instruction (issue unit) must go in the left-most slot (slot 0) of an issue group, an M type instruction can go in slot 0 or slot 1, an I0 instruction can go in slots 1 or 2, and an I type instruction can go in slots 1 or 2 but never to the left of an I0 instruction. As shown in FIG. 5, this requires seven columns in the state machine: M0/0, M/0, M/1, I0/1, I0/2, I/1, and I/2. (This is one column less than a full doubling of the number of columns in the state machine of FIG. 4 because, for this hypothetical processor, M0 instructions can go only in the left-most slot.) In addition, many more states are required. FIG. 5 has nearly three times the number of states of FIG. 4, for a total size increase of more than 4 times. For the particular hypothetical processor represented by FIG. 5, many of the states are equivalent, so the number of states can be reduced, but it is still significantly larger than the state table of FIG. 4.
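The growth described above can be reproduced with a small enumeration. The Python sketch below is illustrative only. It encodes the placement rules stated in the text (M0 only in slot 0, M in slot 0 or 1, I0 and I in slot 1 or 2, at most one supplemental instruction of each type, and no regular instruction to the left of a supplemental instruction of the same type) and counts the resulting columns and groups. FIG. 5 is not reproduced here, so the actual figure may number or count its states differently. The baseline of eleven states and four columns is taken from the description of FIG. 4.

```python
from itertools import product

EMPTY = "-"
# Slots in which each issue unit type may be placed, per the stated rules.
SLOTS = {"M0": (0,), "M": (0, 1), "I0": (1, 2), "I": (1, 2)}


def columns():
    """One column per (issue unit type, slot) pair."""
    return [f"{unit}/{slot}" for unit, slots in SLOTS.items() for slot in slots]


def valid(group):
    # Only one unit of each type accepts supplemental instructions.
    if group.count("M0") > 1 or group.count("I0") > 1:
        return False
    # A regular instruction may not sit left of a supplemental one of its type.
    for kind in ("M", "I"):
        sup = kind + "0"
        if sup in group and kind in group and group.index(kind) < group.index(sup):
            return False
    return True


def groups():
    """All valid non-empty issue groups over the three slots."""
    options = [[EMPTY] + [u for u, slots in SLOTS.items() if slot in slots]
               for slot in range(3)]
    return [g for g in product(*options)
            if valid(g) and any(u != EMPTY for u in g)]


cols = columns()
states = groups()
baseline_cells = 11 * 4                  # eleven states, four columns (FIG. 4)
new_cells = len(states) * len(cols)
print("columns:", cols)
print("non-empty groups:", len(states))
print("size ratio: %.2f" % (new_cells / baseline_cells))
```

Under these assumptions the enumeration gives seven columns and 29 non-empty groups, a table roughly 4.6 times the size of the FIG. 4 baseline, which is in line with the growth described above.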
Large state machines are undesirable for a processor model in a compiler for today's complex processors. The present invention presents a superior solution to this problem.