In an effort to improve the performance of computer processing systems, present computer processors utilize multiple functional resources for simultaneously performing multiple operations within a single processor. For purposes of the present invention, a functional resource may include any hardware resource directly available to a processor, such as registers, control elements, and, especially, arithmetic and logical functional units. For example, in U.S. Pat. No. 4,128,880, the vector processor is provided with three vector functional units and four scalar functional units. The existence of multiple functional units allows this processor to, for example, simultaneously perform two addition operations (one scalar, one vector), thereby increasing the overall performance of the processor. Multiple functional resources are most commonly associated with array or vector processors and scalar/vector processors, but may also be employed in traditional scalar processors.
Another mechanism to improve processor performance that is often associated with high performance processors having multiple functional units is pipelining. Pipelining is a processor implementation technique, typically used in vector processors, that increases the flow of instructions executing through the processor by overlapping the various stages of execution of multiple instructions that each require more than one clock cycle to complete. The work to be done by each multiple-cycle instruction is broken into small pieces such that, at any given time, many instructions are in different stages of execution in the pipeline. Pipelining techniques are also used to feed instructions to the processor, and many present high-speed computers employ instruction pipelines. An instruction pipeline increases the flow of instructions to the processor by maintaining a full queue of waiting instructions that are ready to be fed, or issued, to the processor at the very next clock cycle in which the processor can accept an instruction.
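As a rough illustration of why overlapping the stages of execution helps, the following Python sketch counts clock cycles for an idealized pipeline. It assumes, for illustration only, that every stage takes exactly one clock cycle and that no hazards occur; the function names and the five-stage figure are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Idealized pipeline (assumption: one cycle per stage, no hazards).
    The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, unpipelined_cycles(n, 5), pipelined_cycles(n, 5))
```

Under these assumptions, 100 instructions in a five-stage pipeline complete in 104 cycles rather than 500; hazards reduce this ideal rate.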
While a processor with multiple functional resources has the potential for significantly increased performance, the existence of multiple functional resources greatly increases the complexity of the execution flow of instructions in a processor. For high performance processors, the complexity of execution flow is further compounded by the use of pipelining techniques, both functional unit pipelines and instruction pipelines. For example, consider the execution flow associated with different arithmetic functions that require different amounts of time to complete, e.g., a divide instruction vs. a multiply instruction. In this situation, the different functional unit pipelines will have different latencies. When a processor has multiple functional units with different latencies, instruction B may start executing after instruction A has started and yet complete while instruction A is still executing. Generally, these differing latencies do not affect the performance of the functional unit pipelines as long as the instructions are not dependent upon one another.
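The out-of-order completion described above can be sketched as follows, using hypothetical latencies of 20 cycles for a divide and 7 cycles for a multiply (these values are assumptions, not taken from any particular processor). Instructions issue in program order, one per cycle, and are assumed to be independent, so nothing stalls.

```python
from dataclasses import dataclass

# Hypothetical latencies in clock cycles, chosen only for illustration.
LATENCY = {"DIVIDE": 20, "MULTIPLY": 7}


@dataclass
class Instr:
    name: str
    op: str


def completion_times(program):
    """Issue one instruction per cycle in program order, each to its own
    functional unit, and return {name: (issue_cycle, completion_cycle)}.
    The instructions are assumed independent, so nothing stalls."""
    times = {}
    for cycle, ins in enumerate(program):
        times[ins.name] = (cycle, cycle + LATENCY[ins.op])
    return times


program = [Instr("A", "DIVIDE"), Instr("B", "MULTIPLY")]
print(completion_times(program))  # {'A': (0, 20), 'B': (1, 8)}
```

Instruction B issues one cycle after A but completes twelve cycles before it.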
Unfortunately, situations known as hazards may prevent the next instruction in the instruction queue from starting during a particular clock cycle. Instruction dependency is a common hazard of this type. For example, an ADD instruction may need to use the result from a currently executing MULTIPLY instruction as an operand. Thus, the ADD instruction is dependent upon the MULTIPLY instruction and cannot execute until the result from the earlier instruction is available. To handle such hazards, the hardware in the processor uses an instruction pipeline interlock that stalls the instruction pipeline at the current instruction until the hazard no longer exists. In the example given here, the instruction pipeline interlock clears when the result of the MULTIPLY instruction is available to the dependent ADD instruction.
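A minimal sketch of the interlock in the MULTIPLY/ADD example follows. It assumes in-order issue of at most one instruction per cycle, hypothetical latencies, and hypothetical register names S1 through S5; an instruction is held until every source register it reads has been produced.

```python
# Hypothetical latencies in clock cycles; register names are illustrative.
LATENCY = {"MULTIPLY": 7, "ADD": 6}


def issue_with_interlock(program):
    """In-order issue, at most one instruction per cycle. An instruction
    whose source register is still being produced is held (interlocked)
    until the producing instruction's result is available.
    Returns a list of (op, issue_cycle, stall_cycles)."""
    ready = {}      # register -> cycle at which its value becomes available
    next_slot = 0   # earliest cycle at which the next instruction may issue
    schedule = []
    for op, dest, srcs in program:
        operands_ready = max((ready.get(r, 0) for r in srcs), default=0)
        issue = max(next_slot, operands_ready)
        schedule.append((op, issue, issue - next_slot))
        ready[dest] = issue + LATENCY[op]
        next_slot = issue + 1
    return schedule


program = [
    ("MULTIPLY", "S3", ("S1", "S2")),  # S3 <- S1 * S2
    ("ADD", "S5", ("S3", "S4")),       # S5 <- S3 + S4, needs S3
]
print(issue_with_interlock(program))  # [('MULTIPLY', 0, 0), ('ADD', 7, 6)]
```

The dependent ADD would otherwise issue at cycle 1; the interlock holds it for six cycles until the MULTIPLY result is available at cycle 7.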
One way to optimize the performance of a processor in the presence of such hazards is instruction scheduling. Instruction scheduling is a compiler or run-time technique that rearranges the execution order and functional resource allocation of the instructions compiled from a computer program so that the instructions execute in the fastest and most efficient order possible. While the rearranged stream of instructions is semantically equivalent to the original stream, the instruction scheduler arranges and overlaps the execution of instructions so as to reduce overall execution time. Instructions are scheduled to minimize the compounded effects that their dependencies and functional unit latencies may have on pipeline performance. The existence of multiple functional resources and the parallelism among mutually independent instructions may also allow the scheduler to hide the latency of one or more of the processor's functional units and thereby sustain pipelined instruction execution.
Presently, instruction scheduling is typically done using dynamic and/or static scheduling techniques. Some processors, like the CDC 6600 and the IBM 360/91, perform dynamic instruction scheduling, allocating functional resources at execution time by means of a technique called scoreboarding. Scoreboarding is a method of allocating register space that ensures instructions are not issued unless their required register resources are available. Advances in the architecture of high performance processors, such as the addition of multiple arithmetic and logical functional units, require scheduling techniques beyond the capability of scoreboarding. For these processors, instruction scheduling must deal not only with functional resources such as registers, but also with functional unit latencies and other timing requirements. Consequently, prior art systems may also use static instruction scheduling in an attempt to solve these problems.
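The register-reservation idea described above can be sketched with one busy bit per register. This is a simplified illustration under stated assumptions: the class, method, and register names are hypothetical, and the actual CDC 6600 scoreboard also tracked functional unit status.

```python
class RegisterScoreboard:
    """Simplified register-reservation scoreboard (hypothetical sketch).
    One busy bit per register: set when an issued instruction will write
    the register, cleared when its result is written back."""

    def __init__(self):
        self.busy = set()  # registers with a pending write

    def can_issue(self, dest, srcs):
        # Hold the instruction if an operand is still being produced or its
        # destination register already has a pending write.
        return dest not in self.busy and not any(r in self.busy for r in srcs)

    def issue(self, dest, srcs):
        """Reserve `dest` and return True if the instruction may issue now."""
        if not self.can_issue(dest, srcs):
            return False
        self.busy.add(dest)
        return True

    def write_back(self, dest):
        """Release `dest` once the result is available."""
        self.busy.discard(dest)


sb = RegisterScoreboard()
print(sb.issue("S3", ("S1", "S2")))  # True: MULTIPLY reserves S3
print(sb.issue("S5", ("S3", "S4")))  # False: ADD waits for S3
sb.write_back("S3")
print(sb.issue("S5", ("S3", "S4")))  # True: S3 is now available
```

Note that this sketch only checks register availability; it has no notion of how long each functional unit takes, which is the limitation the surrounding text attributes to scoreboarding.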
Static instruction scheduling is a software-based technique that is performed at compile time after machine instructions are generated. Typically, the compiler builds an instruction dependence graph based on the dependencies among the instructions in the instruction stream to be scheduled. Using the instruction dependence graph, the scheduler first generates a preliminary ordering of the instructions in the stream. Next, for each instruction, the compiler estimates the functional unit latency (the time normally needed for the instruction to execute) and the time needed to transfer the instruction's data from memory. Based on these two estimates, the scheduler generates a final ordering of the instructions.
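The steps above can be sketched as a simple list scheduler. In this illustration, which is an assumption rather than any particular prior art scheduler, each instruction carries a single latency estimate that folds in any memory transfer time; the dependence graph records only true (read-after-write) register dependences; priority is the latency-weighted longest path to the end of the block; and at most one instruction issues per cycle. Instruction names, registers, and latencies are hypothetical. A practical scheduler would also honor anti- and output dependences to keep the reordered stream semantically equivalent.

```python
from collections import defaultdict


def build_dependence_graph(instrs):
    """instrs: list of (name, dest, srcs, latency) in program order.
    Adds an edge p -> c when c reads a register last written by p
    (true, read-after-write dependences only)."""
    last_writer = {}
    succs, preds = defaultdict(list), defaultdict(list)
    for name, dest, srcs, _ in instrs:
        for r in srcs:
            if r in last_writer:
                succs[last_writer[r]].append(name)
                preds[name].append(last_writer[r])
        last_writer[dest] = name
    return succs, preds


def list_schedule(instrs):
    """Return [(name, start_cycle)] ordered by start cycle."""
    lat = {name: latency for name, _, _, latency in instrs}
    order = [name for name, *_ in instrs]
    succs, preds = build_dependence_graph(instrs)

    # Priority: latency-weighted longest path from the instruction to the end.
    prio = {}
    for name in reversed(order):
        prio[name] = lat[name] + max((prio[s] for s in succs[name]), default=0)

    start, cycle = {}, 0
    while len(start) < len(order):
        # Ready: unscheduled, and every predecessor's result is available.
        ready = [n for n in order if n not in start and
                 all(p in start and start[p] + lat[p] <= cycle for p in preds[n])]
        if ready:
            # Highest priority first; earlier program order breaks ties.
            best = max(ready, key=lambda n: (prio[n], -order.index(n)))
            start[best] = cycle
        cycle += 1
    return sorted(start.items(), key=lambda kv: kv[1])


# Hypothetical block: (name, dest, sources, estimated latency in cycles).
block = [
    ("I1", "S3", ("S1", "S2"), 7),   # MULTIPLY S3 <- S1 * S2
    ("I2", "S4", ("S3", "S0"), 6),   # ADD      S4 <- S3 + S0
    ("I3", "S5", (), 11),            # LOAD     S5 <- memory
    ("I4", "S6", ("S5", "S4"), 6),   # ADD      S6 <- S5 + S4
]
print(list_schedule(block))  # [('I1', 0), ('I3', 1), ('I2', 7), ('I4', 13)]
```

Here the independent LOAD is moved ahead of the dependent ADD, so the final ADD starts at cycle 13; under the same assumptions, issuing strictly in program order with interlocks would start it at cycle 19.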
In prior art instruction schedulers, such as one described by Wei-Chung Hsu in "Register Allocation and Code Scheduling for Load/Store Architectures," it is assumed that all pipeline interlocks are resolved at instruction issue time. This assumption is based on a typical model of instruction issue in which both scalar and vector instructions issue sequentially. In practice, the longer execution times of the vector instructions can clog the issue pipeline and halt the issuance of all instructions. In prior art systems, these halts adversely affect the rate of instruction execution. The architecture of the scalar/vector processor that is the subject of the previously identified co-pending application entitled SCALAR/VECTOR PROCESSOR addresses this problem by providing a vector initiation mechanism, separate from the instruction issue mechanism, that prevents a backlog of vector instructions halted by a hardware interlock.
Although prior art instruction schedulers are adequate for many pipelined processors, one of their inadequacies lies in scheduling functional resources for pipelined scalar/vector processors with multiple vector functional units. Such processors have at least two functional units that perform the same set of vector arithmetic operations. In a vector processor with multiple functional units, an instruction may execute in any one of several functional units able to perform the required arithmetic operation(s). This circumstance presents new alternatives that an instruction scheduler must analyze and factor into its scheduling decisions. As a result, there is a need for an instruction scheduling method that takes into account the information that enables the instruction scheduler to select the optimum instruction execution path from a set of alternative paths. In addition, there is a need for an instruction scheduler that also takes into account the existence of alternative models for instruction issue and initiation in a vector processor having multiple functional units.
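To illustrate the choice that multiple vector functional units introduce, the sketch below assigns each vector instruction to whichever capable unit can start it soonest. The greedy earliest-start rule, the unit names FU0 and FU1, the operation names VADD and VMUL, and the 64-cycle occupancy are all hypothetical; this is not the scheduling method of the invention, only a picture of the alternatives a scheduler must weigh.

```python
from dataclasses import dataclass


@dataclass
class FunctionalUnit:
    name: str
    ops: frozenset    # vector operations this unit can perform
    free_at: int = 0  # first cycle at which the unit is available


def assign_unit(units, op, earliest, occupancy):
    """Among the units able to perform `op`, choose the one on which the
    instruction can start soonest (ties go to the first listed unit), then
    reserve it for `occupancy` cycles. Returns (unit_name, start_cycle)."""
    capable = [u for u in units if op in u.ops]
    if not capable:
        raise ValueError(f"no functional unit performs {op}")
    best = min(capable, key=lambda u: max(u.free_at, earliest))
    start = max(best.free_at, earliest)
    best.free_at = start + occupancy
    return best.name, start


# Two hypothetical vector units implementing the same operations.
units = [FunctionalUnit("FU0", frozenset({"VADD", "VMUL"})),
         FunctionalUnit("FU1", frozenset({"VADD", "VMUL"}))]
print(assign_unit(units, "VMUL", 0, 64))  # ('FU0', 0)
print(assign_unit(units, "VADD", 1, 64))  # ('FU1', 1)
print(assign_unit(units, "VADD", 2, 64))  # ('FU0', 64)
```

The greedy rule looks only at unit availability; it ignores how the choice affects later dependent instructions, which is exactly the kind of information the text argues a scheduler for such processors must take into account.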