The central processing unit of a computer includes instruction processing circuitry for decoding and performing the function(s) specified by a computer instruction. Sophisticated instruction processing circuits often include multiple execution units that are specially adapted to process different types of instructions. Example execution units include a floating point unit, a branch unit, a memory-operation unit, and an integer arithmetic unit. Some processor architectures include multiple execution units of each type to further enhance instruction-level parallelism.
Some architectures group instructions into bundles. For example, the IA-64 architecture from Hewlett-Packard forms bundles of three instructions. Depending on the dependencies between instructions, up to all three instructions in a bundle may be dispatched in parallel to respective execution units. Where one instruction depends on another, the dependent instruction cannot be dispatched in parallel with the instruction on which it depends, and the instructions are dispatched individually.
In the IA-64 architecture, a template specifies which instructions in a bundle can be dispatched in parallel and the types of execution units to which the instructions are dispatched. The template specifies a “stop bit,” which indicates that any instructions that follow the stop bit cannot be dispatched until the instruction(s) that precede the stop bit have completed. The particular template assigned to a bundle, for example by a compiler, depends on the types of instructions in the bundle and the dependencies between those instructions.
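The role of the template can be illustrated with a minimal sketch. The template table below is hypothetical and does not reproduce the actual IA-64 encodings; the names `Template`, `TEMPLATES`, and `dispatch_groups` are introduced here for illustration only. Each template gives an execution unit type per slot (M for memory, I for integer, F for floating point) and the slots after which a stop occurs. The sketch treats a bundle in isolation and splits its three instructions into groups that may be dispatched in parallel.

```python
from typing import Dict, List, NamedTuple, Tuple


class Template(NamedTuple):
    units: Tuple[str, str, str]  # execution unit type for each of the three slots
    stops: Tuple[int, ...]       # slot indices that are followed by a stop


# Hypothetical template table (illustration only, not the real IA-64 encodings).
TEMPLATES: Dict[int, Template] = {
    0: Template(("M", "I", "I"), ()),      # no stop: all three may go in parallel
    1: Template(("M", "I", "I"), (1,)),    # stop after slot 1
    2: Template(("M", "F", "I"), (0, 1)),  # stop after slot 0 and after slot 1
}


def dispatch_groups(template_id: int,
                    instructions: List[str]) -> List[List[Tuple[str, str]]]:
    """Split one bundle into groups of (unit, instruction) pairs.

    Instructions in the same group may be dispatched in parallel; a new
    group starts after every slot that the template marks with a stop.
    """
    template = TEMPLATES[template_id]
    groups: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for slot, (unit, insn) in enumerate(zip(template.units, instructions)):
        current.append((unit, insn))
        if slot in template.stops:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

For example, under these assumed templates, `dispatch_groups(1, ["ld", "add", "sub"])` places `ld` and `add` in one group and `sub` in a second group that must wait for the first.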
The number of bits in the template limits the number of combinations of instruction types and stop bits that can be defined by templates. Where not enough bits are available to encode all the possible combinations of execution unit types and stop bit positions, the architecture must nevertheless support execution of all possible sequences of instructions. Although the templates could be implemented with additional bits to increase the number of combinations of execution unit types and stop bit positions, the resulting processor architecture would not be backwards compatible with existing software.
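The size of the shortfall can be estimated with simple arithmetic. The sketch below assumes the four execution unit types named earlier, three slots per bundle, an optional stop after each slot, and a five-bit template field; the variable names and the simple counting model are assumptions made for illustration.

```python
# Assumed parameters for the estimate.
UNIT_TYPES = ("floating point", "branch", "memory", "integer")
SLOTS_PER_BUNDLE = 3
TEMPLATE_BITS = 5  # assumed width of the template field

# One unit type may be chosen independently for each slot.
type_combinations = len(UNIT_TYPES) ** SLOTS_PER_BUNDLE

# A stop may or may not follow each slot.
stop_combinations = 2 ** SLOTS_PER_BUNDLE

total_combinations = type_combinations * stop_combinations
encodable_templates = 2 ** TEMPLATE_BITS

print(f"possible combinations: {total_combinations}")
print(f"encodable templates:   {encodable_templates}")
```

Under these assumptions there are 512 possible combinations but only 32 encodable templates, so most combinations cannot be expressed directly by a template.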
To conform an instruction sequence to the available templates, no-op instructions are sometimes inserted in the sequence. A no-op instruction occupies a slot in a bundle that would ordinarily be occupied by an instruction of the type specified for the slot by the template. No-op instructions are dispatched to the execution unit specified by the template even though they perform no useful work. These extra instructions unnecessarily occupy memory space and instruction cache capacity, which degrades system performance.
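The padding effect can be sketched with a simple greedy packer. This is not the method of any particular compiler; the template list is hypothetical, dependencies and stop bits are ignored, and the names `pack_bundle` and `pack` are introduced here for illustration. Each instruction is represented only by its execution unit type, and any slot whose type does not match the next instruction is filled with a no-op.

```python
from typing import List, Sequence, Tuple

# Hypothetical set of available templates (unit type per slot).
TEMPLATES: List[Tuple[str, str, str]] = [
    ("M", "I", "I"),
    ("M", "M", "I"),
    ("M", "F", "I"),
    ("M", "I", "B"),
]


def pack_bundle(types: Sequence[str],
                template: Tuple[str, str, str]) -> Tuple[List[str], int]:
    """Fill the template's slots in order; return (slots, instructions consumed)."""
    slots: List[str] = []
    consumed = 0
    for unit in template:
        if consumed < len(types) and types[consumed] == unit:
            slots.append(unit)  # the next instruction fits this slot
            consumed += 1
        else:
            slots.append("nop." + unit.lower())  # pad the slot with a no-op
    return slots, consumed


def pack(types: Sequence[str]) -> List[List[str]]:
    """Greedily pack a sequence of instruction types into bundles."""
    bundles: List[List[str]] = []
    pos = 0
    while pos < len(types):
        # Choose the template that consumes the most instructions.
        slots, consumed = max(
            (pack_bundle(types[pos:], t) for t in TEMPLATES),
            key=lambda result: result[1],
        )
        if consumed == 0:
            raise ValueError(f"no template accepts instruction type {types[pos]!r}")
        bundles.append(slots)
        pos += consumed
    return bundles
```

With these assumed templates, `pack(["M", "F", "M", "I"])` yields two bundles, each ending in a no-op, so two of the six slots hold instructions that do no useful work.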
A system and method that address the aforementioned problems, as well as other related problems, are therefore desirable.