Computer architectures consist of a fixed data path, which is controlled by a set of control words. Each control word controls parts of the data path, and its fields may comprise register addresses and operation codes for arithmetic logic units (ALUs) or other functional units. Each set of instructions generates a new set of control words, usually by means of an instruction decoder, which translates the binary format of the instruction into the corresponding control word, or by means of a micro store, i.e. a memory which contains the control words directly. Typically, a control word represents a RISC-like operation, comprising an operation code, two operand register indices and a result register index. The operand register indices and the result register index refer to registers in a register file.
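As a minimal illustration of a RISC-like control word with an operation code, two operand register indices and a result register index, the Python sketch below packs these fields into one integer and unpacks them again, which is roughly what an instruction decoder does in hardware. The field widths (6-bit opcode, 5-bit register indices) and the names `ControlWord`, `encode` and `decode` are hypothetical; the text does not prescribe any particular encoding.

```python
from dataclasses import dataclass

# Hypothetical field widths chosen for illustration; not specified by the text.
OPCODE_BITS = 6
REG_BITS = 5  # assumes a 32-entry register file


@dataclass(frozen=True)
class ControlWord:
    opcode: int
    src_a: int  # first operand register index
    src_b: int  # second operand register index
    dst: int    # result register index


def encode(cw: ControlWord) -> int:
    """Pack the fields into one integer, opcode in the most significant bits."""
    for value, bits in ((cw.opcode, OPCODE_BITS), (cw.src_a, REG_BITS),
                        (cw.src_b, REG_BITS), (cw.dst, REG_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    word = cw.opcode
    for reg in (cw.src_a, cw.src_b, cw.dst):
        word = (word << REG_BITS) | reg
    return word


def decode(word: int) -> ControlWord:
    """Inverse of encode(): extract each field with shifts and masks."""
    mask = (1 << REG_BITS) - 1
    dst = word & mask
    src_b = (word >> REG_BITS) & mask
    src_a = (word >> (2 * REG_BITS)) & mask
    opcode = word >> (3 * REG_BITS)
    return ControlWord(opcode, src_a, src_b, dst)
```

With these assumed widths a control word occupies 6 + 3 × 5 = 21 bits, and `decode(encode(cw)) == cw` holds for every valid `cw`.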
In the case of a Very Long Instruction Word (VLIW) processor, multiple instructions are packaged into one long instruction, a so-called VLIW instruction. A VLIW processor uses multiple, independent functional units to execute these multiple instructions in parallel. This allows the processor to exploit instruction-level parallelism in programs and thus to execute more than one instruction at a time. In order for a software program to run on a VLIW processor, it must be translated into a set of VLIW instructions. The compiler attempts to minimize the time needed to execute the program by optimizing parallelism. The compiler combines instructions into a VLIW instruction under the constraint that the instructions assigned to a single VLIW instruction can be executed in parallel, and under data dependency constraints. Instructions can be encoded in two different ways, depending on whether the VLIW processor is data stationary or time stationary. In a data stationary VLIW processor, all information related to a given pipeline of operations to be performed on a given data item is encoded in a single VLIW instruction. In a time stationary VLIW processor, the information related to a pipeline of operations to be performed on a given data item is spread over multiple VLIW instructions, thereby exposing the processor pipeline in the program.
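To make the compiler's packing constraints concrete, the Python sketch below greedily places each operation in the earliest VLIW instruction that follows all of its producers and still has a free slot. This is a simple list-scheduling heuristic, not the method of any particular compiler. It assumes single-cycle latencies, true (read-after-write) dependencies only, and operations listed in a valid topological order; the function name `pack_bundles` is hypothetical.

```python
from typing import Dict, List, Sequence


def pack_bundles(ops: Sequence[str], deps: Dict[str, List[str]],
                 width: int) -> List[List[str]]:
    """Greedy packing of operations into VLIW instructions (bundles).

    Assumptions: single-cycle latency, RAW dependencies only (each result
    register written once), and `ops` already in topological order.
    """
    if width < 1:
        raise ValueError("a VLIW instruction needs at least one slot")
    bundle_of: Dict[str, int] = {}
    bundles: List[List[str]] = []
    for op in ops:
        # Must be issued strictly after every bundle holding one of its producers.
        earliest = 1 + max((bundle_of[p] for p in deps.get(op, [])), default=-1)
        b = earliest
        # Skip bundles whose slots are already all occupied.
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(op)
        bundle_of[op] = b
    return bundles


# Example: c needs a and b, d needs a, e needs c and d; two slots per instruction.
print(pack_bundles(["a", "b", "c", "d", "e"],
                   {"c": ["a", "b"], "d": ["a"], "e": ["c", "d"]}, width=2))
# prints [['a', 'b'], ['c', 'd'], ['e']]
```

The last instruction contains only one operation, so its second slot is empty; such unused slots are the source of the NOPs whose encoding cost is discussed further on.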
In practical applications, the functional units will rarely all be active at the same time. Therefore, in some VLIW processors, each VLIW instruction provides fewer instructions than would be needed for all the functional units together. Each instruction is directed to a selected functional unit that has to be active, for example by using multiplexers. In this way it is possible to save on instruction memory size while hardly compromising performance. In this architecture, instructions are directed to different functional units in different clock cycles. The corresponding control words are issued to a respective issue slot of the VLIW issue register. Each issue slot is associated with a group of functional units. A particular control word is directed to a specific one among the functional units of the group that is associated with the particular issue slot.
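The routing of a control word from an issue slot to one functional unit of its group can be modelled as a lookup, as in the Python sketch below. The slot-to-group configuration, the functional unit names (`alu0`, `mul0`, and so on) and the function `route` are all hypothetical; the sketch only mirrors the role the text assigns to the multiplexers.

```python
from typing import Dict, Set

# Hypothetical configuration: two issue slots, each serving a group of
# functional units. Four functional units are driven by only two slots.
SLOT_GROUPS: Dict[int, Dict[str, Set[str]]] = {
    0: {"alu0": {"add", "sub"}, "mul0": {"mul"}},
    1: {"alu1": {"add", "sub"}, "ldst0": {"load", "store"}},
}


def route(slot: int, opcode: str) -> str:
    """Model the per-slot multiplexer: select the functional unit within the
    slot's group that implements the opcode."""
    for unit, supported in SLOT_GROUPS[slot].items():
        if opcode in supported:
            return unit
    # The compiler must never place an operation in a slot that cannot run it.
    raise ValueError(f"issue slot {slot} has no functional unit for {opcode!r}")


print(route(0, "mul"))   # prints mul0
print(route(1, "load"))  # prints ldst0
```

Here each VLIW instruction carries two control words instead of four, at the cost that, for example, a multiplication and a second multiplication cannot be issued in the same cycle.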
The encoding of parallel instructions in a VLIW instruction leads to a severe increase in code size. A large code size increases program memory cost, both in terms of the required memory size and in terms of the required memory bandwidth. In modern VLIW processors, different measures are taken to reduce the code size. One important example is the compact representation of no-operation (NOP) operations in a data stationary VLIW processor: the NOP operations are encoded by single bits in a special header attached to the front of the VLIW instruction, resulting in a compressed VLIW instruction.
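A minimal Python sketch of the header-based NOP compression described above follows. It assumes one header bit per issue slot, with a set bit meaning "this slot carries an operation" and a cleared bit meaning NOP; the bit polarity and the names `compress` and `decompress` are assumptions, not taken from any specific processor.

```python
from typing import List, Optional, Tuple


def compress(slots: List[Optional[int]]) -> Tuple[int, List[int]]:
    """Drop NOPs (None) from the payload; mark real operations with a set header bit."""
    header = 0
    payload: List[int] = []
    for i, op in enumerate(slots):
        if op is not None:
            header |= 1 << i
            payload.append(op)
    return header, payload


def decompress(header: int, payload: List[int],
               num_slots: int) -> List[Optional[int]]:
    """Rebuild the full VLIW instruction, reinserting a NOP wherever the header bit is 0."""
    ops = iter(payload)
    return [next(ops) if (header >> i) & 1 else None for i in range(num_slots)]


instr = [0x11, None, None, 0x22]   # 4 slots, two of them NOPs
header, payload = compress(instr)
print(bin(header), payload)        # prints 0b1001 [17, 34]
```

Assuming 32-bit operations, this instruction shrinks from 4 × 32 = 128 bits to a 4-bit header plus 2 × 32 = 64 bits of payload, i.e. 68 bits.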
Instruction bits may still be wasted in each instruction of a VLIW instruction, because some instructions can be encoded more compactly than others. Differences in encoding efficiency arise, for instance, because certain operations require very large immediate values as operands, whereas others require no immediate values or only small ones. Instructions requiring very large immediate values are commonly used to initialize register values. Especially in processors with a large datapath width, typically larger than 16 bits, it can be very expensive to initialize registers using a single instruction. Encoding the immediate value alone already requires as many bits as the datapath width, and additional bits are required to encode the operation code and the register index. If other instructions that require fewer bits must also be encoded for the same issue slot, the instruction encoding for this issue slot becomes very inefficient. This is the case, for instance, in VLIW architectures with a fixed control word width, because combining a fixed control word width with a varying instruction width makes the decoding process less efficient.
U.S. Pat. No. 5,745,722 describes an apparatus for executing a program which contains immediate data, and a program conversion method for generating instructions which the apparatus can carry out. The program conversion method encodes immediate data when a program is converted into a desired program format, thereby reducing the size of the instruction code. The program conversion method is mainly used by a compiler. When the resulting program is executed, instructions, including instructions having immediate data, are sequentially fetched and decoded, and an execution section carries out each fetched instruction.
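The cost of wide immediates in a fixed-width issue slot can be quantified with simple arithmetic. The Python sketch below uses hypothetical field widths (32-bit datapath, 6-bit opcode, 5-bit register index) to compare a register-initialization instruction with a full 32-bit immediate against a three-register ALU instruction sharing the same slot; the helper `instr_bits` is likewise hypothetical.

```python
# Hypothetical field widths, chosen for illustration only.
DATAPATH_BITS = 32
OPCODE_BITS = 6
REG_BITS = 5


def instr_bits(num_regs: int, imm_bits: int = 0) -> int:
    """Bits needed for an opcode, `num_regs` register indices and an immediate."""
    return OPCODE_BITS + num_regs * REG_BITS + imm_bits


load_imm = instr_bits(num_regs=1, imm_bits=DATAPATH_BITS)  # 6 + 5 + 32 = 43
alu_op = instr_bits(num_regs=3)                            # 6 + 15 = 21

# With a fixed control word width, the slot must fit the widest instruction.
slot_width = max(load_imm, alu_op)
wasted = slot_width - alu_op

print(load_imm, alu_op, slot_width, wasted)  # prints 43 21 43 22
```

Under these assumptions, every ALU instruction issued in this slot leaves 22 of its 43 bits unused, so roughly half of the slot's instruction memory is spent on padding.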
When the instruction decoder detects that the instruction code contains immediate data, the immediate data is transmitted to a data decoder. When the immediate data is encoded, the data decoder decodes the data according to a given rule, thereby generating decoded immediate data. The decoded immediate data is then transmitted to an execution unit to be processed. When the supplied immediate data is not encoded, the data decoder sends the data intact to the execution unit.
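The data decoder step described above can be sketched in Python as follows. The decoding rule of the cited patent is not stated here, so the sketch substitutes a hypothetical rule, sign extension of a short immediate to the datapath width, purely to show where the extra decoding step sits. The `Instruction` fields `imm_encoded` and `imm_bits` and the function `data_decoder` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str
    immediate: int
    imm_encoded: bool  # hypothetical flag: immediate is stored in encoded form
    imm_bits: int      # width of the immediate field as stored


def data_decoder(instr: Instruction, datapath_bits: int = 32) -> int:
    """Return the immediate value handed to the execution unit.

    The decoding rule used here (sign extension) is an assumption standing in
    for the unspecified "given rule".
    """
    if not instr.imm_encoded:
        return instr.immediate  # passed intact to the execution unit
    sign = 1 << (instr.imm_bits - 1)
    value = (instr.immediate ^ sign) - sign     # sign-extend the short field
    return value & ((1 << datapath_bits) - 1)   # two's complement at datapath width


print(hex(data_decoder(Instruction("addi", 0xFF, True, 8))))   # prints 0xffffffff
print(data_decoder(Instruction("addi", 0x05, True, 8)))        # prints 5
```

Whatever the actual rule, every encoded immediate passes through this additional decoding step before it reaches the execution unit.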
It is a disadvantage of the prior art processing apparatus that decoding an instruction requires an additional step for decoding the encoded immediate data.