Such processors include a CPU having multiple processing units for executing multiple instructions in parallel. VLIW instruction bundles may include multiple elementary instructions targeting the different processing units of the CPU. Thus an instruction bundle for such a processor may reach a length typically between 64 and 128 bits, or even 256 bits or more.
In some VLIW processors, the distribution of elementary instructions of an instruction bundle between the CPU processing units is always performed in the same order. FIG. 1 shows an input circuit INC1 of processing units PU1, PU2, PU3, PU4 of such a processor. The circuit INC1 includes registers R01-R04 and R11-R14 designed for receiving elementary instructions of an instruction bundle. The registers R0j, R1j are connected to the processing unit PUj, where j=1, 2, 3 and 4. VLIW instruction bundles are then divided into lanes, each lane being assigned to a processing unit PUj. The compiler is then configured to assign to a particular lane, in each instruction bundle it generates, only elementary instructions that can be processed by the processing unit corresponding to that lane. The instruction bundles generated by the compiler then systematically present the following structure: P1-P2-P3-P4, Pj being an elementary instruction exclusively executable by the processing unit PUj (j=1, 2, 3, or 4). If the program to be executed by the processor includes consecutive elementary instructions requiring the same processing unit, it may be necessary to allocate these elementary instructions to different VLIW instruction bundles. If the other lanes of these bundles cannot receive elementary instructions requiring other processing units, these lanes usually receive elementary NOP instructions that do not trigger any processing. Moreover, in the case of elementary instructions of variable size, the compiler completes the lanes receiving elementary instructions smaller than the maximum size with one or more elementary NOP instructions. The term “syllable” designates a fixed-size word that composes an elementary instruction of variable size. In the example of FIG. 1, the elementary instructions include one or two syllables of the same size.
Thus the registers R01-R04 are configured for receiving the first syllable of the elementary instructions of an instruction bundle, and the registers R11-R14 are configured for receiving the possible second syllable of the elementary instructions. For each elementary instruction having no second syllable, the register R1j paired with the register R0j receiving the first syllable of the elementary instruction is padded with a single syllable NOP instruction.
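The fixed-lane padding described above for FIG. 1 can be illustrated with a short sketch. It is only a model under stated assumptions: the function names, the string "NOP" used as a syllable placeholder and the dictionary-based bundle description are hypothetical and do not come from the circuit INC1 itself.

```python
NOP = "NOP"          # hypothetical placeholder for a syllable that triggers no processing
NUM_UNITS = 4        # processing units PU1..PU4
MAX_SYLLABLES = 2    # an elementary instruction has one or two syllables


def encode_fixed_lane_bundle(instrs):
    """Fill the registers R01-R04 and R11-R14 for one bundle.

    `instrs` maps a unit index j (1..4) to the list of syllables of the
    elementary instruction for PUj; a missing key means the lane is empty.
    Returns (r0, r1), where r0[j-1] models R0j and r1[j-1] models R1j.
    """
    r0, r1 = [], []
    for unit in range(1, NUM_UNITS + 1):
        syllables = list(instrs.get(unit, []))
        if len(syllables) > MAX_SYLLABLES:
            raise ValueError(f"instruction for PU{unit} has too many syllables")
        # Pad an empty lane, or a missing second syllable, with NOP.
        syllables += [NOP] * (MAX_SYLLABLES - len(syllables))
        r0.append(syllables[0])
        r1.append(syllables[1])
    return r0, r1


def count_nops(r0, r1):
    """Number of register slots of the bundle occupied by NOP syllables."""
    return sum(s == NOP for s in r0 + r1)
```

In this model, a bundle carrying a one-syllable instruction for PU1 and a two-syllable instruction for PU3 occupies three useful syllables, and the five remaining register slots hold NOP syllables.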
It follows that a substantial portion of the program memory may be occupied unnecessarily by NOP instructions. It also follows that a substantial portion of the data flow on the processor instruction bus is taken up by the NOP instructions carried in such instruction bundles. These drawbacks lead to poor use of CPU processing resources and to increased power consumption.
Some processors are designed to not constrain allocations of VLIW bundle instructions to processing units. With such a configuration, it is not necessary to insert NOP instructions in the instruction bundles. To this end, each instruction input of each processing unit comprises a multiplexer whose inputs are connected to input registers receiving the elementary instructions of the VLIW bundle. FIG. 2 shows an input circuit INC2 of processing units PU1-PU4, in the case where the elementary instructions in the VLIW instruction bundles have up to two syllables. The input circuit INC2 includes registers R01-R04 and R11-R14, and two multiplexers for each processing unit PU1-PU4, namely one of the first multiplexers MX01-MX04 and one of the second multiplexers MX11-MX14. The VLIW bundles generated by the compiler of such a processor include at most four elementary instructions, each having one or two syllables of the same size. Each first multiplexer MX01-MX04 provides the first syllable of an elementary instruction to the processing unit PU1-PU4 to which it is connected, and each second multiplexer MX11-MX14 provides the second syllable of an elementary instruction to the processing unit PU1-PU4 to which it is connected. Since the register R01 cannot receive a second elementary instruction syllable, it is only connected to the first multiplexers MX01-MX04. Therefore, each first multiplexer MX01-MX04 has eight inputs, and each second multiplexer MX11-MX14 has seven inputs, for four processing units PU1-PU4. The multiplexers MX01-MX04 and MX11-MX14 may therefore be controlled by three-bit control words. This results in interconnection circuitry between the registers R01-R04 and R11-R14 and the multiplexers MX01-MX04 and MX11-MX14, and in a multiplexer control circuit, both having a relatively high complexity (60 multiplexer inputs). In addition, the propagation time of a signal in a multiplexer increases with the number of inputs of the multiplexer.
The presence of the multiplexers MX01 to MX04, in particular, may require processing of the instruction bundle to be delayed by one clock cycle.
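The input counts given above for the circuit INC2 can be checked with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as stated above, that each first multiplexer sees all eight registers while each second multiplexer sees every register except R01.

```python
NUM_UNITS = 4
FIRST_REGS = [f"R0{j}" for j in range(1, NUM_UNITS + 1)]   # R01..R04
SECOND_REGS = [f"R1{j}" for j in range(1, NUM_UNITS + 1)]  # R11..R14


def inc2_fan_in():
    """Return (inputs per first multiplexer, inputs per second multiplexer)."""
    first_mux_sources = FIRST_REGS + SECOND_REGS
    # R01 can never hold a second syllable, so it feeds the first multiplexers only.
    second_mux_sources = [r for r in first_mux_sources if r != "R01"]
    return len(first_mux_sources), len(second_mux_sources)


def select_bits(n_inputs):
    """Width of the control word needed to select one of n_inputs sources."""
    return (n_inputs - 1).bit_length()


def inc2_total_inputs():
    """Total number of multiplexer inputs over the four processing units."""
    first, second = inc2_fan_in()
    return NUM_UNITS * (first + second)
```

Under these assumptions, inc2_fan_in() yields eight and seven inputs, select_bits gives a three-bit control word for both multiplexer sizes, and inc2_total_inputs() yields the 60 inputs mentioned above.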
It is possible to significantly reduce the number of multiplexer inputs by ordering the elementary instructions in the VLIW instruction bundle according to ranks assigned to the processing units, and by padding the elementary instructions that have fewer than the maximum number of syllables with NOP instructions, so that all the elementary instructions in the instruction bundle, padded where necessary, have the same length, i.e. the same number of syllables.
Thus, FIG. 3 shows an input circuit INC3 of processing units PU1-PU4. The input circuit INC3 includes registers R01-R04 and R11-R14, and multiplexers MX22-MX24 and MX32-MX34. The registers R01 and R11 are connected directly to the first and second inputs of the processing unit PU1. The multiplexer MX22 is connected by its inputs to the registers R01 and R02, and by its output to the first input of the processing unit PU2. The multiplexer MX32 is connected by its inputs to the registers R11 and R12, and by its output to the second input of the processing unit PU2. The multiplexer MX23 is connected by its inputs to the registers R01, R02 and R03, and by its output to the first input of the processing unit PU3. The multiplexer MX33 is connected by its inputs to the registers R11, R12 and R13, and by its output to the second input of the processing unit PU3. The multiplexer MX24 is connected by its inputs to the registers R01-R04, and by its output to the first input of the processing unit PU4. The multiplexer MX34 is connected by its inputs to the registers R11-R14, and by its output to the second input of the processing unit PU4. This solution limits the total number of inputs of the multiplexers MX22-MX24 and MX32-MX34 to 18. However, this solution only partially improves the use of processor resources and of program memory, since NOP instructions are still added in the instruction bundle to the elementary instructions not having the maximum number of syllables.
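The routing implied by FIG. 3 can be modeled in a few lines. This is a sketch under assumptions, not a description of the actual control circuit: it supposes that the elementary instructions present in a bundle are stored in consecutive register pairs (R01/R11, R02/R12, and so on) in increasing rank of their processing unit, and the function names are hypothetical.

```python
NUM_UNITS = 4


def inc3_slot_selects(active_units):
    """Map each active unit j to the register pair (slot) holding its instruction.

    Slot k stands for the pair R0k/R1k. Instructions are assumed to be stored
    in increasing unit rank, so the instruction for PUj always lies in a slot
    k <= j, which is why the multiplexers of PUj only need j inputs each.
    """
    selects = {}
    for slot, unit in enumerate(sorted(active_units), start=1):
        if not 1 <= unit <= NUM_UNITS:
            raise ValueError(f"unknown processing unit PU{unit}")
        selects[unit] = slot
    return selects


def inc3_total_inputs():
    """Total inputs of the multiplexers feeding PU2..PU4 (PU1 is wired directly)."""
    # Each unit j >= 2 has one first-syllable and one second-syllable
    # multiplexer, each with j inputs.
    return sum(2 * unit for unit in range(2, NUM_UNITS + 1))
```

For a bundle holding instructions for PU2 and PU4 only, inc3_slot_selects({2, 4}) places them in the first and second register pairs respectively, and inc3_total_inputs() reproduces the figure of 18 inputs.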
The need to insert NOP instructions in the VLIW instruction bundles can be avoided by maintaining an order of elementary instructions in the instruction bundle corresponding to the order assigned to the processing units. Thus, FIG. 4 shows an input circuit INC4 of processing units PU1-PU4. The input circuit INC4 includes registers R01-R04 and R11-R14, and multiplexers MX42-MX44 and MX52-MX54. The registers R01 and R11 are connected directly to the first and second inputs of the processing unit PU1. The multiplexer MX42 is connected by its inputs to the registers R01, R11 and R02, and by its output to the first input of the processing unit PU2. The multiplexer MX52 is connected by its inputs to the registers R11, R02 and R12, and by its output to the second input of the processing unit PU2. The multiplexer MX43 is connected by its inputs to the registers R01, R11, R02, R12 and R03, and by its output to the first input of the processing unit PU3. The multiplexer MX53 is connected by its inputs to the registers R11, R02, R12, R03 and R13, and by its output to the second input of the processing unit PU3. The multiplexer MX44 is connected by its inputs to the registers R01-R04 and R11-R13, and by its output to the first input of the processing unit PU4. The multiplexer MX54 is connected by its inputs to the registers R02-R04 and R11-R14, and by its output to the second input of the processing unit PU4. This solution limits the total number of inputs of the multiplexers MX42-MX44 and MX52-MX54 to 30, without having to insert NOP instructions in the VLIW instruction bundles. However, this solution also requires multiplexers with a large number of inputs. In particular, the multiplexer MX44 may require the processing of instruction bundles to be delayed by one clock cycle.
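The multiplexer connections of FIG. 4 can be derived by enumeration. The sketch below rests on an assumption inferred from the connections listed above rather than stated explicitly: the syllables of a bundle are stored contiguously, without padding, in the register order R01, R11, R02, R12, R03, R13, R04, R14. The function names are hypothetical.

```python
from itertools import product

# Assumed storage order of consecutive bundle syllables (not stated in the text).
REGISTERS = ["R01", "R11", "R02", "R12", "R03", "R13", "R04", "R14"]
NUM_UNITS = 4


def inc4_sources(unit):
    """Registers that may hold the first and second syllable for PU<unit>.

    Each lower-ranked unit contributes 0 syllables (no instruction), 1 or 2
    syllables ahead of the instruction for PU<unit>; every combination is
    enumerated to find all possible start positions.
    """
    first_positions, second_positions = set(), set()
    for lengths in product((0, 1, 2), repeat=unit - 1):
        start = sum(lengths)
        first_positions.add(start)
        second_positions.add(start + 1)
    first = [REGISTERS[p] for p in sorted(first_positions)]
    second = [REGISTERS[p] for p in sorted(second_positions)]
    return first, second


def inc4_total_inputs():
    """Total multiplexer inputs for PU2..PU4 (PU1 is wired directly)."""
    return sum(len(f) + len(s)
               for f, s in (inc4_sources(u) for u in range(2, NUM_UNITS + 1)))
```

Under this assumption, inc4_sources(3) returns the five registers R01, R11, R02, R12 and R03 for the first syllable, matching the inputs listed for the multiplexer MX43, and inc4_total_inputs() gives the total of 30 inputs.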
It is desirable to simplify the input circuit of the processing units of a VLIW processor without penalizing the use of CPU processing resources and the use of program memory, and without increasing power consumption. It is also desirable to improve the efficiency of this input circuit for transmitting the elementary instructions of an instruction bundle to the various processing units.