1. Field of the Invention
This invention relates to computers and, more particularly, to a method for scheduling operations and instruction words for a very long instruction word (VLIW) microprocessor.
2. History of the Prior Art
A computer processor must receive the commands which it executes arranged in an instruction word format which the processor is capable of recognizing and utilizing. Most computer processors receive instructions for execution which have been generated from original source code by a compiler. One of the jobs of a compiler is to arrange the various instructions into proper formats and to place them in an order in which they may be rapidly executed by the particular processor. Normally, a compiler does this job statically as a part of the programming function long before a program is ready for execution. Once the compiler has finished, the code is simply run by a processor in the ultimate form furnished by the compiler.
Placing the instructions into the proper formats utilizes a process often referred to as "packing." A packing process looks at the different commands to be executed and determines which commands fit into instruction words of the various formats depending on the functional units which are available in a particular processor and the commands which each unit is able to execute. A packing process cooperates with a "scheduler" process which selects a sequence of instruction words including commands that will execute rapidly while meeting the constraints enforced by the packing process.
The constraints and dependencies which control scheduling execution of a program depend on both the software and the hardware. If a processor includes only one arithmetic and logic unit (ALU) and one floating point unit (FPU), then no more than one integer operation and one floating point operation can be scheduled to run at once. If a particular type of operation by one of the operating units takes some number of processor cycles and the unit is not fully pipelined, then another operation cannot be handled by that unit until the unit has completed operations already begun. And if an operation commanded by the software depends on the result of one or more earlier operations, then the earlier operations must complete before the later operation can start.
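The three kinds of constraint just described (one operation per unit at a time, a unit that is not pipelined staying busy until it finishes, and a later operation waiting on earlier results) can be illustrated with a minimal sketch. The `Op` record, the `schedule` function, the unit names, and the latencies are all assumptions made for illustration; the sketch also assumes the operations arrive in an order in which every producer precedes its consumers. It is not the scheduler of any particular processor.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    unit: str                 # hypothetical unit label: "ALU" or "FPU"
    latency: int              # cycles until the result is available
    deps: list = field(default_factory=list)  # names of earlier ops this one needs

def schedule(ops):
    """Return {op name: start cycle} using a simple greedy in-order pass."""
    start, finish = {}, {}
    unit_free = {"ALU": 0, "FPU": 0}   # first cycle at which each unit is idle
    for op in ops:
        # Software dependency: every producer must have completed.
        ready = max((finish[d] for d in op.deps), default=0)
        # Hardware constraint: the single, non-pipelined unit must be idle.
        cycle = max(ready, unit_free[op.unit])
        start[op.name] = cycle
        finish[op.name] = cycle + op.latency
        unit_free[op.unit] = finish[op.name]
    return start

ops = [
    Op("a", "ALU", 1),
    Op("b", "FPU", 3),
    Op("c", "ALU", 1, ["a"]),
    Op("d", "FPU", 3, ["b", "c"]),
]
result = schedule(ops)
```

Here `a` and `b` start together in cycle 0 because they use different units, `c` waits one cycle for `a`, and `d` cannot start until cycle 3, when both the FPU is free and its inputs are complete.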
Some processors include hardware (called interlocks) which regulates those dependencies which are based upon the time required for each type of instruction to complete. Further, some processors such as reduced instruction set (RISC) processors utilize a single format for all operations. With processors which do both, scheduling requires simply arranging the packed instructions to meet the various software dependencies. Since all instructions are of the same length and the hardware takes care of timing dependencies, nothing further is required once the packing process has placed the commands into the fewest instructions possible. Scheduling for such a machine thus requires only determining a fast schedule.
However, some modern processors do not provide these features. For example, a very long instruction word (VLIW) processor includes a number of different functional units capable of processing a number of individual operations simultaneously. One such processor includes two arithmetic and logic units (ALUs), a floating point unit (FPU), and a memory unit. The individual units perform their operations in parallel by responding to individual commands, a number of which may be contained in a single instruction word. Typically, the commands include such functions as load, add, move, and store, each of which causes one of the many functional units to carry out the commanded operation.
In order to handle a number of operations at the same time in its different functional units, a VLIW processor must receive the commands in an instruction word arranged in a format which the VLIW processor is capable of recognizing and utilizing. One embodiment of a particular VLIW processor is capable of recognizing commands which appear in six different formats. Two of these formats each include four individual commands, while the remaining four formats each include two commands. In any of these formats, all commands occupy the same number of bits.
Because there are a number of different formats which are of different lengths, the effects of hardware and software constraints and dependencies are much more complicated in scheduling for a VLIW processor. This is especially true for processors which, like the exemplary VLIW processor, do not include hardware interlocks to assure that operation timing constraints for a first instruction word are met before executing succeeding commands.
With a limited number of instruction word formats, constraints on which operations may occur together, and a further requirement that instructions begin to execute only at selected intervals related to instruction word length, it is unusual for a scheduler and a packing process to be able to place commands in each available slot in all instruction words. Consequently, a scheduler for such a processor typically makes use of operations which do nothing ("no-ops") to fill the unused slots so that a processor will execute the program correctly. No-ops are also used to provide correct timing for operations with longer execution latencies. One prior art solution has been to schedule instruction words in a manner to minimize the amount of execution time (i.e., provide the smallest number of instruction words since each instruction word uses about the same amount of time) and then fill the schedule with a sufficient number of no-ops to take care of instruction formats.
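The way no-ops fill unused slots can be sketched as follows. The format table is hypothetical: it follows the six-format shape described above (four two-command formats and two four-command formats), but the slot assignments, the `br` (branch) slot, and the `pack` function are assumptions introduced only for illustration.

```python
from collections import Counter

# Hypothetical format table: each entry lists the unit that owns each slot.
FORMATS = [
    ("alu", "mem"),
    ("alu", "alu"),
    ("alu", "fpu"),
    ("mem", "fpu"),
    ("mem", "alu", "alu", "fpu"),
    ("mem", "alu", "alu", "br"),
]

def pack(group):
    """Pack commands meant to issue together into the shortest fitting format.

    group: list of (command, unit) pairs. Unused slots become "nop".
    """
    need = Counter(unit for _, unit in group)
    for fmt in sorted(FORMATS, key=len):          # try shorter formats first
        have = Counter(fmt)
        if all(have[unit] >= n for unit, n in need.items()):
            pending = list(group)
            word = []
            for slot in fmt:
                match = next((c for c in pending if c[1] == slot), None)
                if match is None:
                    word.append("nop")            # nothing to issue in this slot
                else:
                    pending.remove(match)
                    word.append(match[0])
            return word
    raise ValueError("no format holds this group")

single = pack([("add", "alu")])
triple = pack([("add", "alu"), ("fmul", "fpu"), ("ld", "mem")])
```

With this table, a lone `add` occupies a two-command word with one no-op, while the three-command group is forced into a four-command word and also carries one no-op. Either way the padding consumes storage without doing useful work.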
This has at least one deleterious consequence. The space required to store code scheduled in this manner is larger than would be needed if the no-ops were not required. Since instruction caches and, to a limited extent, memory have space limits which are rapidly reached, code which occupies more space is undesirable and executes more slowly.
Moreover, one particular VLIW processor executes programs designed for other "target processors." This VLIW processor receives its instructions in a form adapted to be executed by a target processor which typically has an entirely different instruction set than does the VLIW processor. The VLIW processor dynamically translates the stream of target instructions into instructions of its own host instruction set and stores those translated host instructions so that they may be executed without retranslation.
The translated instructions are commands representing operations that the functional units of the host VLIW processor can execute. Initially, these commands are generated in a linear sequential order and must be scheduled and packed into the long instruction words (i.e., instruction formats) recognizable by the host processor. Since the processor is dynamically translating target instructions into host instructions and executing those host instructions, the packing, scheduling, and other compiler functions take place "on-the-fly." This VLIW processor is described in detail in U.S. Pat. No. 5,832,205, Kelly et al., issued Nov. 3, 1998, and assigned to the assignee of the present invention.
It is desirable to provide an improved process for scheduling instructions for a computer processor which is capable of recognizing a plurality of different length instruction word formats.
The present invention is realized by a method for scheduling of a sequence of operations for execution by a computer processor into a plurality of instruction word formats including the steps of arranging commands into properly formatted instruction words to provide the most rapid execution of the sequence, and arranging the operations within the plurality of instruction words to occupy the least space in memory.
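The two goals named above (most rapid execution first, then least space in memory) can be read as a two-level ordering over candidate schedules. The sketch below is only an illustration of that ordering under stated assumptions: each instruction word is taken to cost one unit of execution time, space is measured in command slots including no-ops, and the `cost` and `best` helpers and the candidate schedules are hypothetical. It compares finished candidates and does not show how the method itself arranges the operations.

```python
def cost(schedule):
    """schedule: list of instruction words, each a tuple of commands."""
    cycles = len(schedule)                        # assumed: one time unit per word
    space = sum(len(word) for word in schedule)   # slots stored, no-ops included
    return (cycles, space)

def best(candidates):
    """Pick the fastest schedule, breaking ties by the smallest footprint."""
    return min(candidates, key=cost)

wide = [("ld", "add", "nop", "nop"), ("fmul", "nop")]   # 2 words, 6 slots
tight = [("ld", "add"), ("fmul", "nop")]                # 2 words, 4 slots
slow = [("ld", "nop"), ("add", "nop"), ("fmul", "nop")] # 3 words, 6 slots
chosen = best([wide, tight, slow])
```

In this example `wide` and `tight` take the same number of instruction words, so the tie is broken in favor of `tight`, which stores two fewer no-op slots.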