Programmable processors are used to transform input data into output data based on program information encoded in instructions. The values of the resulting output data depend on the input data, the program information, and the momentary state of the processor. In traditional processors this state consists of, for example, temporary data values stored in registers, as well as so-called flags. These flags are normally used, for example, to set specific rounding modes during computation, to influence the semantics of certain operations, or to change the program flow. Flags are normally stored in a special flags register, in which flags are rewritten after every instruction that is capable of changing one or more flags. It is usually not possible to have multiple values of the same flag alive inside the processor at the same time.
The ongoing demand for higher computing performance has led to several solutions in which some form of concurrent processing, i.e. parallelism, has been introduced into the processor architecture. Two main concepts have been adopted: the multithreading concept, in which several threads of a program are executed in parallel, and the Very Long Instruction Word (VLIW) concept. In the case of a VLIW processor, multiple instructions are packaged into one long instruction, a so-called VLIW instruction. A VLIW processor uses multiple, independent execution units to execute these multiple instructions in parallel. The processor thereby exploits instruction-level parallelism in programs and executes more than one instruction at a time. Due to this form of concurrent processing, the performance of the processor is increased. In order for a software program to run on a VLIW processor, it must be translated into a set of VLIW instructions. The compiler attempts to minimize the time needed to execute the program by maximizing parallelism. The compiler combines instructions into a VLIW instruction under the constraint that the instructions assigned to a single VLIW instruction can be executed in parallel and under data dependency constraints. The encoding of parallel instructions in a VLIW instruction leads to a severe increase in code size. Large code size leads to an increase in program memory cost, both in terms of required memory size and in terms of required memory bandwidth. In modern VLIW processors different measures are taken to reduce the code size. One important example is the compact representation of no-operation (NOP) operations in a data-stationary VLIW processor, i.e. the NOP operations are encoded by single bits in a special header attached to the front of the VLIW instruction, resulting in a compressed VLIW instruction.
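The header-based NOP compression described above can be illustrated with a minimal sketch. It is not taken from any particular processor: the slot count, the use of `None` as a NOP marker, the bit convention (bit i set means issue slot i holds a NOP), and the function names `compress` and `decompress` are all assumptions made for illustration.

```python
NOP = None  # hypothetical marker for an empty issue slot


def compress(slots):
    """Compress one VLIW instruction.

    Returns (header, ops). Bit i of header is 1 when slot i is a NOP
    (assumed convention); ops holds only the non-NOP operations in slot order.
    """
    header = 0
    ops = []
    for i, op in enumerate(slots):
        if op is NOP:
            header |= 1 << i  # a NOP costs one header bit instead of a full slot
        else:
            ops.append(op)
    return header, ops


def decompress(header, ops, num_slots):
    """Rebuild the full-width VLIW instruction from its compressed form."""
    remaining = iter(ops)
    return [NOP if (header >> i) & 1 else next(remaining)
            for i in range(num_slots)]


# Usage: a 4-slot instruction with two empty slots.
header, ops = compress(["add", NOP, NOP, "mul"])
restored = decompress(header, ops, 4)
```

In this sketch the two NOP slots shrink to two header bits, and only the two real operations are stored after the header.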
To control the operations in the data pipeline of a processor, two different mechanisms are commonly used in computer architecture: data-stationary and time-stationary encoding, as disclosed in “Embedded software in real-time signal processing systems: design technologies”, G. Goossens, J. van Praet, D. Lanneer, W. Geurts, A. Kifli, C. Liem and P. Paulin, Proceedings of the IEEE, vol. 85, no. 3, March 1997. In the case of data-stationary encoding, every instruction that is part of the processor's instruction set controls a complete sequence of operations that have to be executed on a specific data item as it traverses the data pipeline. Once the instruction has been fetched from program memory and decoded, the processor controller hardware makes sure that the constituent operations are executed in the correct machine cycle. In the case of time-stationary encoding, every instruction that is part of the processor's instruction set controls a complete set of operations that have to be executed in a single machine cycle. These operations may be applied to several different data items traversing the data pipeline. In this case it is the responsibility of the programmer or compiler to set up and maintain the data pipeline. The resulting pipeline schedule is fully visible in the machine code program. Time-stationary encoding is often used in application-specific processors, since it saves the overhead of the hardware necessary for delaying the control information present in the instructions, at the expense of larger code size. In the case of a data-stationary processor, the conditional execution of operations can be implemented without the use of jump operations. For a conventional time-stationary processor, however, the conditional execution of operations is not possible without the use of jump operations. In a previous application (EP filing no. 03101038.2 [attorney's docket: PHNL030384EPP]), a time-stationary processor is disclosed which allows the conditional execution of operations without the use of jump operations.
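The difference between the two encodings can be made concrete with a small sketch. It assumes a hypothetical three-stage pipeline (`read`, `execute`, `write`) in which one data-stationary instruction is issued per cycle; the stage names and the function `to_time_stationary` are illustrative assumptions, and hazards and stalls are not modeled.

```python
STAGES = ("read", "execute", "write")  # hypothetical 3-stage pipeline


def to_time_stationary(data_stationary_program):
    """Expand a data-stationary program into a time-stationary schedule.

    Each data-stationary instruction controls all stages for one data item.
    Each resulting time-stationary instruction word lists, per stage, which
    data item's operation is active in that single machine cycle.
    """
    n = len(data_stationary_program)
    schedule = []
    for cycle in range(n + len(STAGES) - 1):
        word = {}
        for depth, stage in enumerate(STAGES):
            issued = cycle - depth  # instruction that reaches this stage now
            if 0 <= issued < n:
                word[stage] = data_stationary_program[issued]
        schedule.append(word)
    return schedule


# Usage: two data-stationary instructions become four time-stationary words.
schedule = to_time_stationary(["i0", "i1"])
```

In the data-stationary form the hardware would have to delay the control information of `i0` until its `write` stage. In the time-stationary form that delay appears explicitly in the program, which is why the schedule is visible in the machine code and why the code grows by the pipeline fill and drain words.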
A disadvantage of flags, and of the way they are stored and updated, is that they cause so-called side effects in the processor, that is, behavior that is not explicitly visible in the program. Side effects cause a kind of implicit behavior in which the same operation in different parts of the program can exhibit different semantics, depending on operations that have taken place earlier. Programs could be made more efficient if the updating of flags could be better controlled by the program. For example, if a branch has to take place on the zero outcome of a subtraction, a branch using the zero flag as a condition could be used. In that case, however, no operation changing the zero flag may be scheduled between the subtract operation and the branch operation. Since many operations usually update the flags, the subtract operation must often be scheduled just before the branch operation. These kinds of constraints severely limit the scheduling freedom in programs, ruling out potentially more efficient schedules. In general, flags make it much harder to create powerful compilers for high-level languages, such as the C programming language. In parallel processors especially, such as VLIW processors, flags pose an additional problem: if multiple operations can be executed in parallel, it is unclear which operation should be allowed to update the flags register. Ideally, compiler-friendly VLIW processors exhibit only a minimum number of side effects. By removing the traditional concept of flags, many such side effects can be eliminated. For example, special rounding modes or other special operation semantics can be implemented by using special opcodes, e.g. a special addc instruction for an addition with carry, in which the carry is taken as a third data input next to the data inputs of a normal add instruction. In this way, flags are treated as data. However, a remaining problem is the implementation of branching that is normally handled by using flags, for example, taking the zero flag to decide on a branch-on-equal.
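The idea of treating a flag as data can be sketched as follows. The text above only states that addc takes the carry as a third data input; the 32-bit word width, the return of the carry-out as an ordinary second result, and the helper `add64` are assumptions made for illustration.

```python
WORD_BITS = 32                  # assumed word width
MASK = (1 << WORD_BITS) - 1


def addc(a, b, carry_in):
    """Add with carry, where the carry is an explicit data operand.

    Returns (sum, carry_out); the carry-out is an ordinary value
    (assumption), not a bit written to a hidden flags register.
    """
    total = a + b + carry_in
    return total & MASK, total >> WORD_BITS


def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit addition built from two 32-bit addc operations."""
    lo, carry = addc(a_lo, b_lo, 0)
    # The carry is held in a normal variable, so unrelated operations
    # could be scheduled here without destroying it.
    hi, _ = addc(a_hi, b_hi, carry)
    return lo, hi


# Usage: 0x00000000_FFFFFFFF + 1 carries into the upper word.
lo, hi = add64(0xFFFFFFFF, 0, 1, 0)
```

Because the carry lives in a register-like value rather than in a shared flags register, several carries can be alive at once and the scheduling constraint between producer and consumer disappears.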
It is an object of the invention to enable the use of branching and looping in processors, especially in parallel processors, without the use of flags.