Processors can be general-purpose processors or application-specific instruction-set processors. They can be used for manipulating different types of information, including sound, images and video. In the case of application-specific instruction-set processors, the processor architecture and instruction set are customized, which significantly reduces the system's cost and power dissipation. Processor architectures usually consist of a fixed data path, which is controlled by a set of control words. Each control word controls parts of the data path, and these parts may comprise register addresses and operation codes for arithmetic logic units (ALUs) or other functional units. Each set of instructions generates a new set of control words, usually by means of an instruction decoder, which translates the binary format of the instruction into the corresponding control word, or by means of a micro store, i.e. a memory which contains the control words directly. Typically, a control word represents a RISC-like operation, comprising an operation code, two operand register indices and a result register index. The operand register indices and the result register index refer to registers in a register file.
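By way of illustration only, the translation of a binary instruction into a RISC-like control word can be sketched as follows. The 16-bit instruction layout, the opcode values and the names ControlWord, decode and execute are assumptions made for this sketch and are not taken from any particular processor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ControlWord:
    opcode: int  # operation code for the ALU
    src_a: int   # first operand register index
    src_b: int   # second operand register index
    dst: int     # result register index

def decode(instruction: int) -> ControlWord:
    """Translate a binary instruction into a control word.

    Assumed (hypothetical) layout of the 16-bit instruction:
    bits 15..12 opcode, 11..8 src_a, 7..4 src_b, 3..0 dst.
    """
    return ControlWord(
        opcode=(instruction >> 12) & 0xF,
        src_a=(instruction >> 8) & 0xF,
        src_b=(instruction >> 4) & 0xF,
        dst=instruction & 0xF,
    )

# Hypothetical opcode assignment for a small ALU with 32-bit results.
ALU_OPS = {
    0x1: lambda a, b: (a + b) & 0xFFFFFFFF,  # add
    0x2: lambda a, b: (a - b) & 0xFFFFFFFF,  # subtract
    0x3: lambda a, b: a & b,                 # bitwise and
}

def execute(cw: ControlWord, register_file: list) -> None:
    """Read both operands from the register file and write the result back."""
    result = ALU_OPS[cw.opcode](register_file[cw.src_a], register_file[cw.src_b])
    register_file[cw.dst] = result

# Usage: "add r4 = r2 + r3" under the assumed encoding.
register_file = [0] * 16
register_file[2], register_file[3] = 5, 7
execute(decode(0x1234), register_file)
```

In this sketch the decode function plays the role of the instruction decoder; a micro store would instead hold ready-made ControlWord entries and look them up directly.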
In the case of a Very Long Instruction Word (VLIW) processor, multiple instructions are packaged into one long instruction, a so-called VLIW instruction. A VLIW processor uses multiple, independent functional units to execute these multiple instructions in parallel. The processor allows exploiting instruction-level parallelism in programs and thus executing more than one instruction at a time. Due to this form of concurrent processing, the performance of the processor is increased. In order for a software program to run on a VLIW processor, it must be translated into a set of VLIW instructions. The compiler attempts to minimize the time needed to execute the program by optimizing parallelism. The compiler combines instructions into a VLIW instruction under the constraint that the instructions assigned to a single VLIW instruction can be executed in parallel and under data dependency constraints. In case no meaningful processing can take place in certain clock cycles for one or more functional units, a so-called no-operation (NOP) instruction is encoded in the VLIW instruction for that particular functional unit. In order to reduce the code size, and thus save costs in terms of required memory size and required memory bandwidth, a compact representation of NOP instructions in a data-stationary VLIW processor may be used, e.g. the NOP operations are encoded by single bits in a special header attached to the front of the VLIW instruction, resulting in a compressed VLIW instruction.
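The header-based NOP encoding mentioned above can be sketched as follows. This is a minimal illustration under assumed conventions, not the encoding of any specific processor: one header bit per issue slot, with a set bit meaning that an operation is present and a cleared bit meaning NOP. The names NOP, compress and decompress are hypothetical.

```python
NOP = None  # marker for an issue slot that does no useful work

def compress(slots):
    """Pack a VLIW instruction into (header, operations).

    Bit i of the header is 1 if slot i holds a real operation and 0 if it
    holds a NOP; only the real operations are stored after the header.
    """
    header = 0
    operations = []
    for i, op in enumerate(slots):
        if op is not NOP:
            header |= 1 << i
            operations.append(op)
    return header, operations

def decompress(header, operations, n_slots):
    """Rebuild the full-width VLIW instruction, re-inserting the NOPs."""
    remaining = iter(operations)
    return [
        next(remaining) if (header >> i) & 1 else NOP
        for i in range(n_slots)
    ]

# Usage: a four-slot instruction in which only slots 0 and 3 are busy.
slots = ["add r4,r2,r3", NOP, NOP, "mul r7,r5,r6"]
header, operations = compress(slots)
restored = decompress(header, operations, n_slots=len(slots))
```

With this convention a mostly idle instruction shrinks to its header plus the few operations actually issued, which is where the saving in memory size and bandwidth comes from.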
To control the operations in the data pipeline of a processor, two different mechanisms are commonly used in computer architecture: data-stationary and time-stationary encoding, as disclosed in “Embedded software in real-time signal processing systems: design technologies”, G. Goossens, J. van Praet, D. Lanneer, W. Geurts, A. Kifli, C. Liem and P. Paulin, Proceedings of the IEEE, vol. 85, no. 3, March 1997. In the case of data-stationary encoding, every instruction that is part of the processor's instruction set controls a complete sequence of operations that have to be executed on a specific data item as it traverses the data pipeline. Once the instruction has been fetched from program memory and decoded, the processor controller hardware will make sure that the composing operations are executed in the correct machine cycle. In the case of time-stationary encoding, every instruction that is part of the processor's instruction set controls a complete set of operations that have to be executed in a single machine cycle. These operations may be applied to several different data items traversing the data pipeline. In this case, it is the responsibility of the programmer or compiler to set up and maintain the data pipeline. The resulting pipeline schedule is fully visible in the machine code program. Time-stationary encoding is often used in application-specific processors, since it saves the overhead of hardware necessary for delaying the control information present in the instructions, at the expense of larger code size.
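The difference between the two encodings can be made concrete with a small sketch. It assumes a hypothetical three-stage pipeline (read, execute, write) and one data-stationary instruction issued per cycle; the names STAGES and to_time_stationary are illustrative only. The function performs, in software, the delaying of control information that the controller hardware performs in a data-stationary processor, and returns the per-cycle words a programmer or compiler would have to write in the time-stationary case.

```python
STAGES = ("read", "execute", "write")  # assumed pipeline stages

def to_time_stationary(program):
    """Convert a data-stationary program into a time-stationary schedule.

    program: list of dicts, one per data item, mapping each stage name to
    the control information for that stage.
    Returns one dict per machine cycle, mapping each stage name to the
    control it receives in that cycle (None when the stage is idle).
    """
    n_cycles = len(program) + len(STAGES) - 1
    schedule = []
    for cycle in range(n_cycles):
        word = {}
        for delay, stage in enumerate(STAGES):
            # The instruction issued `delay` cycles ago is now in this stage.
            issued = cycle - delay
            in_range = 0 <= issued < len(program)
            word[stage] = program[issued][stage] if in_range else None
        schedule.append(word)
    return schedule

# Usage: two data-stationary instructions become four time-stationary words.
program = [
    {"read": "r2,r3", "execute": "add", "write": "r4"},
    {"read": "r5,r6", "execute": "mul", "write": "r7"},
]
schedule = to_time_stationary(program)
```

In this example the pipeline fill and drain cycles appear explicitly in the schedule, which illustrates both why the pipeline is fully visible in time-stationary machine code and why the code size tends to be larger.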
EP1.113.356 describes a VLIW processor having a plurality of functional units and a register file. Decoded instructions are provided to the functional units; input data are provided from the register file, and result data are written back to the register file.
It is a disadvantage of the prior art processor that, in case it is determined at run-time that result data are invalid, i.e. the result data do not have to be written back to the register file, a communication path from a functional unit to the register file still has to be enabled, since it is not known statically, i.e. at compile time, whether these result data will be valid or not.