There are limits on the operational speed of any computational processing unit. Once these limits are reached, further increases in computational throughput can only be obtained through some form of parallel processing in which a plurality of computational function units operate together to execute a program. In one class of multiprocessing system, each of a plurality of processors executes one instruction from the program on each machine cycle. The set of instructions to be executed at the next instruction cycle is fed to the processor bank as a single very large instruction word (VLIW). Each processor executes part of this VLIW.
A VLIW processor requires an instruction word at each cycle that controls the set of operations that are simultaneously issued on the function units. The simplest realization of a VLIW instruction unit is referred to as a “horizontal microcontroller”. A horizontal microcontroller defines a horizontal instruction word that is divided into separate fields for each function unit. Each function unit (FU) requires a potentially distinct number of bits of information in order to specify the operation to be executed on that FU. Each operation's format may be further subdivided into fields such as an operation code, register operand specification, literal operand specification, and other information necessary to specify an allowed operation on the FU. In general, each function unit is responsible for decoding and executing its current operation as located in a fixed position within the horizontal instruction register.
In its simplest form, the instruction register within a simple horizontal microcontroller is divided into separate, fixed-size operation fields. Each operation field provides the controlling operation for one of the FUs. Each of the operation fields must be of sufficient size to encode all operations executed by the corresponding FU. Since each FU “knows” where its part of the instruction word starts, the individual FUs need not be concerned with the remainder of the instruction word.
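The fixed-field layout described above can be illustrated with a minimal sketch. The field widths, the three-FU configuration, and the function names below are assumptions chosen for illustration only; they are not taken from any particular machine.

```python
# Hypothetical field widths (in bits) for three function units, leftmost first.
FIELD_WIDTHS = [8, 12, 6]
INSTRUCTION_WIDTH = sum(FIELD_WIDTHS)


def field_offsets(widths):
    """Return the fixed starting bit (counted from the left) of each operation field."""
    offsets, position = [], 0
    for width in widths:
        offsets.append(position)
        position += width
    return offsets


def extract_field(instruction, fu_index):
    """Select the bits belonging to one function unit.

    The shift amount depends only on the FU index, never on the contents of
    the instruction, which models a field that is hard-wired to its FU.
    """
    offset = field_offsets(FIELD_WIDTHS)[fu_index]
    width = FIELD_WIDTHS[fu_index]
    shift = INSTRUCTION_WIDTH - offset - width
    return (instruction >> shift) & ((1 << width) - 1)
```

Because every offset is known when the machine is designed, each FU can read its own field without inspecting any other part of the instruction word.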
This simple horizontal microcontroller has a number of advantages. First, the fetch of instructions from the instruction memory into the horizontal instruction register is direct, without requiring any shifting or multiplexing of instruction bits from the instruction memory. Each of the operation fields within the instruction register is wired to a single function unit and, again, no shifting or multiplexing is required to properly provide an operation to the corresponding function unit.
Second, instructions for horizontal microcontrollers are laid out sequentially in memory. Multiple operations within a single instruction are contiguous and are followed by the next instruction in turn.
Third, the horizontal microcontroller can be implemented in extensible VLIW configurations. Here, the instruction memory is broken into a separate instruction memory for each of the function units. Each function unit uses a separate instruction sequencer to select each operation from its instruction memory. Branches are performed by broadcasting the branch target address to all instruction sequencers. Because all operation fields are of fixed size, a branch target address can be used to uniformly index into all instruction memories in order to select the appropriate next operation for all function units. An instruction cache can be readily distributed in a similar manner.
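The distributed arrangement described above can be modeled with a short sketch. The class and function names are hypothetical, and each instruction memory is represented as a simple list with one operation per address.

```python
class FunctionUnitSequencer:
    """Hypothetical per-FU sequencer with its own private instruction memory."""

    def __init__(self, operations):
        self.memory = list(operations)  # one fixed-size operation per address
        self.pc = 0

    def fetch(self):
        """Return the operation at the current address and advance by one."""
        operation = self.memory[self.pc]
        self.pc += 1
        return operation

    def branch(self, target):
        """Redirect this sequencer to the broadcast branch target."""
        self.pc = target


def broadcast_branch(sequencers, target):
    """Send the same branch target address to every sequencer."""
    for sequencer in sequencers:
        sequencer.branch(target)


def fetch_instruction(sequencers):
    """Gather one operation from each FU's memory to form the full instruction."""
    return [sequencer.fetch() for sequencer in sequencers]
```

The same target address is valid in every memory only because each memory holds exactly one fixed-size operation per instruction; this property is lost once operations become variable in width.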
Unfortunately, horizontal microcontrollers are less than ideal. In particular, the amount of instruction memory required to represent the VLIW program is often excessive. VLIW programs frequently use NOOP operations. NOOP operations are commands that leave the corresponding function unit idle, and they may represent a substantial number of the operations in the program. In principle, a NOOP could be specified using very few instruction bits; however, the horizontal microinstruction uses fixed-size fields to maintain simplicity. The same number of bits is required to represent a NOOP on a given function unit as is required to represent the widest operation on that function unit. Wide operations often specify literals, branch target addresses, and multiple input and output operands. As wider operations are defined, even operations that specify very little information uniformly bear the high cost.
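The cost of fixed-size fields can be made concrete with a small calculation. The sketch below assumes hypothetical field widths and a schedule given as one list of flags per instruction, where True marks a useful operation and False marks a NOOP.

```python
def horizontal_code_size(field_widths, num_instructions):
    """Total bits used by a fixed-field horizontal encoding of the program."""
    return sum(field_widths) * num_instructions


def noop_overhead_bits(field_widths, schedule):
    """Bits spent encoding NOOPs, each of which occupies its FU's full field."""
    wasted = 0
    for instruction in schedule:
        for width, useful in zip(field_widths, instruction):
            if not useful:
                wasted += width
    return wasted
```

For example, with assumed field widths of 8, 12, and 6 bits and a two-instruction schedule in which only one FU is busy per cycle, 32 of the 52 bits encode nothing but NOOPs.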
Variable width VLIW formats are designed to alleviate this problem. Both the Multiflow and Cydrome VLIW processors provide capabilities to more efficiently represent NOOPs within VLIW programs. Each of these machines uses the concept of a variable width VLIW instruction to more efficiently represent the set of operations which are executed within a single cycle. From a code size viewpoint, it is attractive to allow variable width operations on each of the function units. Variable width operations allow some operations to be represented with only a few bits while other operations are represented with a substantially larger number of bits. When operations are of variable size, it is desirable that the VLIW instruction also be of variable size in order to allow the independent specification of multiple variable-sized operations to be executed within a single cycle.
Unfortunately, the use of variable width formats to compress VLIW instruction representations leads to more complex hardware. The problem of building an instruction unit for variable width VLIW instructions can be divided conceptually into two sub-problems. The first sub-problem is that of acquiring an aligned instruction word from the instruction memory. To accommodate variable instruction width, the instruction fetch unit must acquire an instruction field that is displaced by a variable amount from the origin of the previous instruction. Each newly fetched instruction must be shifted by a variable amount depending on the size of the previous instruction. Since instructions are of variable size, instructions may also span word boundaries within a fixed word size instruction memory.
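The first sub-problem can be sketched as follows. The 32-bit memory word, the most-significant-bit-first ordering, and the function names are assumptions made for illustration; the instruction widths are supplied directly rather than decoded from the instruction stream.

```python
WORD_BITS = 32  # hypothetical width of one instruction memory word


def fetch_bits(memory, bit_offset, width):
    """Return `width` bits starting `bit_offset` bits into memory (MSB first).

    The requested span may straddle a word boundary, so every word it touches
    is concatenated before the variable shift is applied.
    """
    first_word = bit_offset // WORD_BITS
    last_word = (bit_offset + width - 1) // WORD_BITS
    window = 0
    for index in range(first_word, last_word + 1):
        window = (window << WORD_BITS) | memory[index]
    window_bits = (last_word - first_word + 1) * WORD_BITS
    shift = window_bits - (bit_offset % WORD_BITS) - width
    return (window >> shift) & ((1 << width) - 1)


def fetch_sequence(memory, widths):
    """Fetch consecutive instructions; each origin depends on all earlier widths."""
    instructions, bit_offset = [], 0
    for width in widths:
        instructions.append(fetch_bits(memory, bit_offset, width))
        bit_offset += width
    return instructions
```

The shift amount here depends on the running bit offset, which is why hardware for this scheme needs a variable shifter and access to more than one memory word per fetch.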
The second sub-problem is that of identifying each of the operations within the aligned instruction and transmitting them to each of the corresponding function units. The leftmost operation is considered to be aligned, because the instruction is aligned. Each subsequent operation is identified, starting at the instruction origin, by skipping over all operations to its left. However, since each of the operations is of variable width, this requires substantial shifting of fields to correctly isolate each operation. The hardware needed to overcome these problems significantly increases the cost of such variable width instruction embodiments.
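The second sub-problem can be illustrated in the same spirit. The encoding below is purely hypothetical: each operation is assumed to begin with a 2-bit size code that selects its total width, and the aligned instruction is modeled as a string of '0' and '1' characters for readability.

```python
SIZE_CODE_BITS = 2
# Hypothetical mapping from size code to total operation width in bits,
# including the size code itself. Code 0b00 models a compact NOOP.
OPERATION_WIDTHS = {0b00: 2, 0b01: 8, 0b10: 16, 0b11: 24}


def split_operations(bits, num_function_units):
    """Isolate each FU's operation by skipping over every operation to its left."""
    operations, position = [], 0
    for _ in range(num_function_units):
        code = int(bits[position:position + SIZE_CODE_BITS], 2)
        width = OPERATION_WIDTHS[code]
        operations.append(bits[position:position + width])
        position += width  # the next origin depends on all earlier widths
    return operations
```

The loop is inherently serial: the start of the operation for a given FU is unknown until the widths of all operations to its left have been decoded, which corresponds to the chain of variable shifts described above.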
Broadly, it is the object of the present invention to provide an improved variable width VLIW processor.
It is a further object of the present invention to provide a variable width VLIW processor that requires less memory and/or less complex decoding hardware than prior art processors.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.