As is well known to those skilled in the art, VLIW type processors are processors derived from RISC (reduced instruction set computer) processors which differ from conventional DSPs in that they comprise several parallel-mounted execution units. Each execution unit is the equivalent of a RISC processor core and executes instruction codes in reduced format, generally 16-bit codes, by exploiting the resources offered by a bank of registers. Since each execution unit is capable of carrying out an instruction code simultaneously with the other execution units, a VLIW processor is capable of executing in parallel a large instruction comprising several RISC-equivalent codes.
To give a better understanding, FIG. 1 presents a schematic view of the standard structure of a VLIW type processor 10 whose essential elements are shown in block form. The processor 10 comprises a program memory PMEM, a data memory DMEM, an instruction register IR positioned at the output of the memory PMEM, an instruction decoder IDEC positioned at the output of the register IR, a bank of registers REGBANK designed to execute the RISC type instruction codes, execution units EU0 to EU3, as well as a circuit BMC forming the interface between the execution units EU0-EU3 and the inputs/outputs of the data memory DMEM. The execution units, which herein are four units EU0, EU1, EU2, EU3, are parallel-connected to simultaneously process four instruction codes that are read simultaneously in the memory PMEM, together forming a large instruction. The nature of the execution units may vary as a function of the application for which the processor is designed. The execution units comprise for example an ALU (arithmetic and logic unit), a MAC (multiplication/addition) unit, a CU (control unit managing the program counter PC and the connections), and a CO-PRO (coprocessor) unit to perform certain computations specific to the application.
A processor of this kind is thus capable of executing large instructions, each of which herein includes at most four codes. At each new clock cycle H, the program counter PC of the processor is increased by an increment n which is equal to 1, except in the case of a jump or a call, and the instruction registers IR0-IR3 receive four new codes simultaneously and in parallel. These four new codes are to be executed by the units EU0-EU3.
The architecture of a processor 10 of this kind thus differs from a conventional RISC processor by its parallelism, which can be found at all stages in the processing of the instructions. However, the possibilities offered by this parallelism are rarely exploited, and the compiled programs stored in the program memory PMEM generally comprise a large number of no-operation or NOP codes. Indeed, the conversion of a program written in a high-level language, for example C or C++, into a sequence of RISC type codes combined in bundles is done automatically by a compilation program that knows the structure of the processor and tries to form bundles of the largest possible size (with a maximum of four codes in the exemplary processor being described) to exploit the parallelism of the processor. This optimization is done by taking account of the conflicts between the codes, the availability of the execution units and the data dependence during the pipeline execution of the codes. Thus, for example, two codes designed to be executed by the same execution unit cannot be executed in parallel in the same bundle. Equally, a code using an operand that is the result of an operation that is the object of another code cannot be executed so long as the code on which it depends has not itself been executed.
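The two constraints just described (one code per execution unit in a bundle, and no code in the same bundle as a code whose result it uses) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Code` record, the unit and register names, and the greedy in-order packing policy. An actual compiler would also reorder codes and model the pipeline, which this sketch does not attempt.

```python
from dataclasses import dataclass

WIDTH = 4  # at most four codes per bundle in the exemplary processor


@dataclass(frozen=True)
class Code:
    name: str         # label such as "c0" (hypothetical)
    unit: str         # execution unit required, e.g. "ALU" (hypothetical assignment)
    dest: str         # register written by the code
    srcs: tuple = ()  # registers read by the code


def bundle(codes, width=WIDTH):
    """Greedy in-order packing: close the current bundle on any conflict."""
    bundles, current = [], []
    for code in codes:
        unit_busy = any(c.unit == code.unit for c in current)
        depends = any(c.dest in code.srcs for c in current)
        if current and (len(current) == width or unit_busy or depends):
            bundles.append(current)
            current = []
        current.append(code)
    if current:
        bundles.append(current)
    return bundles


# Hypothetical four-code program used only to exercise the two rules.
PROGRAM = [
    Code("c0", "ALU", "r1", ("r0",)),
    Code("c1", "MAC", "r2", ("r1",)),  # reads r1 written by c0: new bundle
    Code("c2", "ALU", "r3", ("r0",)),  # free unit, no dependence: joins c1
    Code("c3", "ALU", "r4", ("r0",)),  # ALU already used by c2: new bundle
]

RESULT = [[c.name for c in b] for b in bundle(PROGRAM)]
print(RESULT)
```

Even in this small case only one of the three bundles holds more than one code, which is the situation that leads to the NOP padding discussed in this section.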
For example, let us consider the following program sequence:

/p c0  /p c1  p c2  /p c3  p c4  p c5  p c6  /p c7  p c8  p c9

which comprises instruction codes c0 to c9, each preceded by a parallelism bit /p or p. The instruction codes, known to those skilled in the art as syllables, are put together in bundles to form large instructions. The separation of the instructions (bundles) within a program is done by the parallelism bits assigned to each of the codes. The two possible values /p or p of a parallelism bit, for example 0 and 1, tell whether or not the code belongs to a new instruction. More particularly, a code preceded by a parallelism bit p (for example 1) belongs to the same instruction as the previous code, while a code preceded by a parallelism bit /p (for example 0) belongs by convention to a new bundle.
In the program sequence mentioned above, the parallelism bits are thus used to distinguish four large instructions INST1 to INST4:

    INST1 = c0
    INST2 = c1 c2
    INST3 = c3 c4 c5 c6
    INST4 = c7 c8 c9
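The separation rule can be sketched in a few lines. The sequence used here is the one implied by the instructions INST1 to INST4 listed above, with /p encoded as 0 and p as 1 as in the example values given in the text; the function name `split_bundles` and the list-of-pairs representation are assumptions made for illustration.

```python
def split_bundles(sequence):
    """Group (p_bit, code) pairs into bundles.

    p_bit 0 (/p) starts a new bundle; p_bit 1 (p) extends the previous one.
    """
    bundles = []
    for p_bit, code in sequence:
        if p_bit == 0 or not bundles:
            bundles.append([code])
        else:
            bundles[-1].append(code)
    return bundles


SEQUENCE = [
    (0, "c0"),
    (0, "c1"), (1, "c2"),
    (0, "c3"), (1, "c4"), (1, "c5"), (1, "c6"),
    (0, "c7"), (1, "c8"), (1, "c9"),
]

BUNDLES = split_bundles(SEQUENCE)
for number, codes in enumerate(BUNDLES, start=1):
    print(f"INST{number} =", " ".join(codes))
```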
So that they can be executed by the processor 10, these instructions INST1 to INST4 are recorded in the program memory PMEM of the processor as shown in FIG. 1 and described in the following Table 1:
TABLE 1

/p c0    NOP     NOP     NOP
/p c1    p c2    NOP     NOP
/p c3    p c4    p c5    p c6
/p c7    p c8    p c9    NOP
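The storage scheme of Table 1 amounts to padding every bundle with NOP codes up to the fixed width of four codes, so that each large instruction occupies one full row of the program memory. A minimal sketch, assuming the bundles INST1 to INST4 defined earlier and a hypothetical helper named `to_memory_rows`:

```python
WIDTH = 4    # one slot per execution unit EU0 to EU3
NOP = "NOP"


def to_memory_rows(bundles, width=WIDTH):
    """Pad each bundle with NOP codes so that every row has `width` slots."""
    rows = []
    for codes in bundles:
        if len(codes) > width:
            raise ValueError(f"bundle {codes} exceeds {width} codes")
        rows.append(list(codes) + [NOP] * (width - len(codes)))
    return rows


INSTRUCTIONS = [
    ["c0"],
    ["c1", "c2"],
    ["c3", "c4", "c5", "c6"],
    ["c7", "c8", "c9"],
]

ROWS = to_memory_rows(INSTRUCTIONS)
for row in ROWS:
    print(" ".join(f"{slot:<4}" for slot in row))
```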
Consequently, the compilers for standard VLIW processors generate a large number of no-operation codes to keep certain execution units inactive while others execute codes. NOP codes may thus take up 20% to 70% of the program memory space, depending on the efficiency of the compiler and on how well the program to be compiled matches the resources offered by the processor. This overhead, which is considerable in relation to the instruction codes actually needed to execute the programs, increases the surface area of the memory PMEM for a given application. It therefore causes a drop in performance (a larger memory is slower), additional power consumption (the word lines and bit lines of the memory array are longer) and higher costs (in terms of silicon surface area).
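As a rough illustration of this overhead, the share of program memory occupied by NOP codes can be computed from the bundle sizes alone. The sketch below applies this to the four instructions INST1 to INST4 of the example (1, 2, 4 and 3 codes respectively); the function name `nop_fraction` is hypothetical, and the figure obtained describes only this small example, not any measured program.

```python
def nop_fraction(bundle_sizes, width=4):
    """Fraction of memory slots holding NOP codes when every bundle is
    padded to `width` codes."""
    if not bundle_sizes:
        raise ValueError("at least one bundle is required")
    if any(size < 1 or size > width for size in bundle_sizes):
        raise ValueError("each bundle must hold between 1 and width codes")
    total_slots = width * len(bundle_sizes)
    useful_slots = sum(bundle_sizes)
    return (total_slots - useful_slots) / total_slots


SIZES = [1, 2, 4, 3]  # INST1 to INST4
FRACTION = nop_fraction(SIZES)
print(f"{FRACTION:.1%} of the slots hold NOP codes")
```

For the example, 6 of the 16 slots are NOP codes, that is 37.5%, which lies within the 20% to 70% range mentioned above.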