Computer architectures consist of a fixed data path, which is controlled by a set of control words. Each control word controls parts of the data path; these parts may comprise register addresses and operation codes for arithmetic logic units (ALUs) or other functional units. Each set of instructions generates a new set of control words, usually either by means of an instruction decoder, which translates the binary format of an instruction into the corresponding control word, or by means of a micro store, i.e. a memory that contains the control words directly. Typically, a control word represents a RISC-like operation, comprising an operation code, two operand register indices and a result register index. The operand register indices and the result register index refer to registers in a register file.
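To make the RISC-like control word concrete, the sketch below packs an operation code, two operand register indices and a result register index into one integer, and decodes it again as a simple instruction decoder would. The field widths (a 6-bit opcode and 5-bit register indices, i.e. a 32-entry register file) and the field order are assumptions chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical field widths: 6-bit opcode, 5-bit register indices (32 registers).
OPCODE_BITS = 6
REG_BITS = 5


@dataclass(frozen=True)
class ControlWord:
    opcode: int  # selects the ALU (or other functional unit) operation
    src1: int    # first operand register index
    src2: int    # second operand register index
    dst: int     # result register index

    def encode(self) -> int:
        """Pack the fields as opcode | src1 | src2 | dst, from MSB to LSB."""
        fields = ((self.opcode, OPCODE_BITS), (self.src1, REG_BITS),
                  (self.src2, REG_BITS), (self.dst, REG_BITS))
        for value, bits in fields:
            if not 0 <= value < (1 << bits):
                raise ValueError(f"field value {value} does not fit in {bits} bits")
        word = self.opcode
        word = (word << REG_BITS) | self.src1
        word = (word << REG_BITS) | self.src2
        word = (word << REG_BITS) | self.dst
        return word

    @staticmethod
    def decode(word: int) -> "ControlWord":
        """Inverse of encode(): extract the fields a simple decoder would see."""
        mask = (1 << REG_BITS) - 1
        dst = word & mask
        src2 = (word >> REG_BITS) & mask
        src1 = (word >> (2 * REG_BITS)) & mask
        opcode = (word >> (3 * REG_BITS)) & ((1 << OPCODE_BITS) - 1)
        return ControlWord(opcode, src1, src2, dst)
```

For example, `ControlWord(opcode=3, src1=1, src2=2, dst=7).encode()` yields a 21-bit word, and `decode` recovers the same four fields from it.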
In the case of a Very Long Instruction Word (VLIW) processor, multiple instructions are packaged into one long instruction, a so-called VLIW instruction. A VLIW processor uses multiple independent functional units to execute these instructions in parallel, which allows it to exploit instruction-level parallelism in programs and thus to execute more than one instruction at a time. For a software program to run on a VLIW processor, it must be translated into a sequence of VLIW instructions. The compiler attempts to minimize the time needed to execute the program by exploiting as much parallelism as possible: it combines instructions into a VLIW instruction under the constraint that the instructions assigned to a single VLIW instruction can be executed in parallel, and under data dependency constraints. Instructions can be encoded in two different ways, depending on whether the VLIW processor is data stationary or time stationary. In a data stationary VLIW processor, all information related to a given pipeline of operations to be performed on a given data item is encoded in a single VLIW instruction. In a time stationary VLIW processor, the information related to a pipeline of operations to be performed on a given data item is spread over multiple VLIW instructions, thereby exposing the processor pipeline in the program.
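The packing step performed by the compiler can be illustrated with a deliberately simple, in-order greedy sketch. It is not a real compiler scheduler (production compilers reorder operations, for example with list scheduling); it only shows the two constraints named above: a width limit on each VLIW instruction and data dependencies between operations. The `Op` representation, the register names and the assumption that all operations in one VLIW instruction read their operands before any result is written are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Op(NamedTuple):
    name: str
    dst: str               # result register
    srcs: Tuple[str, ...]  # operand registers


def bundle(ops: List[Op], width: int) -> List[List[Op]]:
    """Greedy, in-order packing of operations into VLIW instructions.

    A new VLIW instruction is started when the current one already holds
    `width` operations, or when the next operation reads (RAW) or
    overwrites (WAW) a register written by an operation already in the
    current instruction.
    """
    bundles: List[List[Op]] = []
    current: List[Op] = []
    written: Set[str] = set()
    for op in ops:
        conflict = op.dst in written or any(s in written for s in op.srcs)
        if current and (len(current) == width or conflict):
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dst)
    if current:
        bundles.append(current)
    return bundles
```

With a two-wide machine, `add r1` and `mul r4` share one VLIW instruction, while a `sub` that consumes `r1` and `r4` must wait for the next one.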
In practical applications, all functional units are rarely active at the same time. Therefore, some VLIW processors provide fewer instructions in each VLIW instruction than would be needed to control all functional units together. Each instruction is directed to a selected functional unit that has to be active, for example by means of multiplexers. In this way, instruction memory size can be saved while performance is hardly compromised. In this architecture, instructions are directed to different functional units in different clock cycles. The corresponding control words are issued to a respective issue slot of the VLIW issue register. Each issue slot is associated with a group of functional units, and a particular control word is directed to a specific one of the functional units in the group associated with its issue slot.
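The routing of control words from issue slots to functional units can be sketched as follows. The machine configuration (two issue slots serving five functional units) and the unit names are hypothetical; the check that the target unit belongs to the slot's group stands in for the multiplexer that selects one unit of the group.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical machine: 2 issue slots, each associated with a group of units.
ISSUE_SLOT_GROUPS: Dict[int, List[str]] = {
    0: ["alu0", "mul"],
    1: ["alu1", "load_store", "branch"],
}


def route(issue_register: List[Optional[Tuple[str, int]]]) -> Dict[str, int]:
    """Direct each control word in the issue register to one functional unit.

    Each entry is (target unit, control word), or None for an empty slot.
    The target must belong to the group associated with that issue slot.
    Units that receive no control word stay idle in this clock cycle.
    """
    active: Dict[str, int] = {}
    for slot, entry in enumerate(issue_register):
        if entry is None:
            continue
        unit, control_word = entry
        if unit not in ISSUE_SLOT_GROUPS[slot]:
            raise ValueError(f"{unit} is not reachable from issue slot {slot}")
        active[unit] = control_word
    return active
```

Here a VLIW instruction needs only two instruction fields instead of five; the cost is that two units of the same group (for example `alu0` and `mul`) cannot be active in the same cycle.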
Encoding parallel instructions in a VLIW instruction leads to a severe increase in code size. Large code size increases program memory cost, both in terms of required memory size and in terms of required memory bandwidth. Modern VLIW processors therefore take various measures to reduce code size. One important example is the compact representation of no-operation (NOP) instructions in a data stationary VLIW processor: the NOPs can, for example, be encoded by single bits in a special header attached to the front of the VLIW instruction, resulting in a compressed VLIW instruction. Even so, instruction bits may still be wasted in each instruction of a VLIW instruction, because some instructions can be encoded more compactly than others. Such differences in encoding efficiency arise, for instance, because some operations require more operands or produce more results than others, or because certain operations require very large immediate operands while others require small immediate operands or none at all. These differences are especially pronounced when a VLIW processor is tuned to a specific application domain to increase its efficiency.
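The header-based NOP compression described above can be sketched as a bit mask with one bit per issue slot, followed by only the real operations. The representation of a VLIW instruction as a list of integers, with `None` for a NOP, and the 32-bit operation width used in the size calculation are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def compress(instr: List[Optional[int]]) -> Tuple[int, List[int]]:
    """Encode each NOP as a single header bit instead of a full operation field.

    Bit i of the header is 1 if slot i holds a real operation, 0 for a NOP.
    Only the real operations are stored after the header.
    """
    header = 0
    payload: List[int] = []
    for slot, op in enumerate(instr):
        if op is not None:
            header |= 1 << slot
            payload.append(op)
    return header, payload


def decompress(header: int, payload: List[int], num_slots: int) -> List[Optional[int]]:
    """Re-expand a compressed instruction, inserting NOPs where a header bit is 0."""
    ops = iter(payload)
    return [next(ops) if (header >> slot) & 1 else None for slot in range(num_slots)]


def encoded_bits(num_slots: int, num_ops: int, op_bits: int) -> Tuple[int, int]:
    """(uncompressed, compressed) size in bits for fixed op_bits-wide operation fields."""
    return num_slots * op_bits, num_slots + num_ops * op_bits
```

For a five-slot instruction holding two 32-bit operations, `encoded_bits(5, 2, 32)` gives 160 bits uncompressed versus 5 + 64 = 69 bits compressed. Note that each remaining operation field is still 32 bits wide, which is where the wasted bits discussed above remain.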
Powerful custom operations can be obtained by allowing operations in the instruction set that consume more than two operands and/or produce more than one result. In tuned yet flexible processors, these complex operations usually coexist with basic operations that consume just two operands and produce just one result. An efficient instruction encoding must therefore be found that yields compact code without a large negative impact on performance, power and area caused by possibly more complicated decoding hardware. EP 1.113.356 describes a VLIW processor with a fixed control word width, in which every instruction is encoded using the same number of bits. The processor comprises a plurality of execution units and a register file. Decoded instructions are provided to the execution units, and data are transferred to and from the register file.
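The cost of a fixed instruction width when basic and complex operations coexist can be estimated with a small calculation. The sketch assumes hypothetical field widths (6-bit opcode, 5-bit register indices) and a hypothetical instruction set with a basic `add`, a complex custom operation `cmac` with four operands and two results, and a load-immediate `ldi` with a 24-bit immediate; every operation is padded to the widest format, as in a fixed-width encoding.

```python
from typing import Dict, NamedTuple

OPCODE_BITS = 6  # hypothetical opcode width
REG_BITS = 5     # hypothetical register index width (32-entry register file)


class OpFormat(NamedTuple):
    operands: int
    results: int
    immediate_bits: int = 0

    def bits(self) -> int:
        """Bits needed to encode this operation on its own."""
        return OPCODE_BITS + REG_BITS * (self.operands + self.results) + self.immediate_bits


# Hypothetical instruction set mixing basic and complex operations.
FORMATS: Dict[str, OpFormat] = {
    "add": OpFormat(operands=2, results=1),                      # 6 + 15 = 21 bits
    "cmac": OpFormat(operands=4, results=2),                     # 6 + 30 = 36 bits
    "ldi": OpFormat(operands=0, results=1, immediate_bits=24),   # 6 + 5 + 24 = 35 bits
}


def fixed_width_waste(formats: Dict[str, OpFormat]) -> Dict[str, int]:
    """Unused bits per operation when every operation uses the widest format."""
    width = max(f.bits() for f in formats.values())
    return {name: width - f.bits() for name, f in formats.items()}
```

Under these assumptions the fixed width is 36 bits, so every basic `add` wastes 15 of its 36 bits, even though the complex operation itself is encoded without waste.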
A disadvantage of this prior art processor is that instructions with differing requirements in terms of the number of instruction bits they need cannot be encoded efficiently in a single VLIW instruction.