Field of the Invention
The present invention relates to a processor with a very long instruction word (VLIW) architecture (VLIW processor).
Description of the Background Art
Various processor techniques have been developed to perform efficient arithmetic processing of large-volume data, such as image data.
For example, Patent Literature 1 (Japanese Unexamined Patent Application Publication No. 2003-216943) describes an image processor for processing graphics. The image processor includes a load-store unit, multiple operation units, and a switching channel arranged between the operation units to allow an operation result from one operation unit to enter another operation unit.
Processors known in the art mainly perform operations in 8 bits or 16 bits in image processing and image recognition. The recent trend toward more sophisticated and more complex image processing and image recognition has increased the use of operations in 32 bits in processors that perform image processing, image recognition, and other processing.
This raises the need for processors (VLIW processors) that can perform operations in 32 bits in addition to operations in 8 bits or 16 bits.
For example, a processor capable of performing operations in 32 bits in addition to 16 bits with the technique described in Patent Literature 1 may have the configuration shown in FIG. 13.
FIG. 13 is a schematic block diagram of a processor 900 capable of performing operations in 32 bits with a technique known in the art.
As shown in FIG. 13, the processor 900 includes an instruction control unit 91, a switch channel 92, an instruction execution unit 93, an instruction memory M91, and a data memory M92. The switch channel 92 transmits data to the instruction execution unit 93 through data paths Di90, Di91, Di92, Di93, Di94, Di95, Di96, and Di97, receives data from the instruction execution unit 93 through data paths Do91, Do92, Do93, Do94, and Do95, and receives a control signal Ctl91 from the instruction control unit 91. The instruction execution unit 93 receives a control signal Ctl92 from the instruction control unit 91.
The instruction control unit 91 fetches an instruction from the instruction memory M91 (instruction fetching) and decodes the instruction (instruction decoding). The instruction control unit 91 controls the switch channel 92 and the instruction execution unit 93 in accordance with the result of the instruction decoding.
To execute a plurality of instructions in parallel in one cycle (one clock cycle), the instruction execution unit 93 includes a plurality of instruction slots that can perform operations in parallel in one cycle. As shown in FIG. 13, the instruction execution unit 93 includes three slots, which are a first slot 931, a second slot 932, and a third slot 933.
The first slot 931 includes a load-store unit, which loads or stores data from or into the data memory M92.
The second slot 932 includes an adder unit that performs 32-bit operations (unit indicated by Add32 in FIG. 13), an arithmetic logic unit (ALU) that performs 16-bit operations (unit indicated by Logic16 in FIG. 13), and an arithmetic shifting unit that performs 32-bit operations (unit indicated by Shift32 in FIG. 13).
The third slot 933 includes an adder unit that performs 16-bit operations (unit indicated by Add16 in FIG. 13), an ALU that performs 16-bit operations (unit indicated by Logic16 in FIG. 13), and a multiplier unit that performs 16-bit operations (unit indicated by Mul16 in FIG. 13).
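The slot arrangement described above can be illustrated with a minimal sketch. The unit labels are those quoted from FIG. 13 in the text; the label "LoadStore", the Bundle type, and the is_issuable function are hypothetical names introduced only for illustration and do not appear in Patent Literature 1 or in FIG. 13.

```python
from dataclasses import dataclass
from typing import Optional

# Units per slot, as listed in the text for the processor 900.
# "LoadStore" is an assumed label; the text only says "load-store unit".
SLOT_UNITS = {
    1: {"LoadStore"},
    2: {"Add32", "Logic16", "Shift32"},
    3: {"Add16", "Logic16", "Mul16"},
}


@dataclass(frozen=True)
class Bundle:
    """Hypothetical instruction bundle: one unit name per slot, or None for no operation."""
    slot1: Optional[str]
    slot2: Optional[str]
    slot3: Optional[str]


def is_issuable(bundle: Bundle) -> bool:
    """Return True if every non-empty slot names a unit that the slot actually contains."""
    ops = {1: bundle.slot1, 2: bundle.slot2, 3: bundle.slot3}
    return all(op is None or op in SLOT_UNITS[n] for n, op in ops.items())
```

Under these assumptions, a bundle such as Bundle("LoadStore", "Add32", "Mul16") can be executed in one cycle, whereas a bundle that places Mul16 in the second slot cannot, because the multiplier unit exists only in the third slot.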
The instruction memory M91 stores instructions and other information used for operations performed by the processor 900.
The data memory M92 is a storage unit that can store data used for operations performed by the processor 900.
As shown in FIG. 13, the processor 900 includes the units that perform 32-bit operations in the second slot 932. The processor 900 thus transmits two sets of 32-bit data from the switch channel 92 to the second slot 932. For example, the adder unit Add32 needs two 32-bit data sets when performing an addition operation of 32-bit data. The processor 900 uses four paths for transferring 16-bit data (data paths Di92 to Di95) between the switch channel 92 and the second slot 932 as shown in FIG. 13. In other words, the processor 900 needs data paths corresponding to 64 bits between the switch channel 92 and the second slot 932.
When a 32-bit operation is performed in the second slot 932, the resultant output will be 32-bit data. Transmitting this output from the second slot 932 to the switch channel 92 needs data paths corresponding to 32 bits. In FIG. 13, data paths Do92 and Do93, each of which can transfer 16-bit data, are used to transmit 32-bit data from the second slot 932 to the switch channel 92.
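The data path requirement for a 32-bit addition in the second slot 932 can be sketched as follows. The sketch assumes that each 32-bit operand is divided into an upper half and a lower half, that the halves are assigned to the data paths in the order shown, and that the sum wraps around at 32 bits; none of these details is stated for the processor 900, and the function names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def split32(value):
    """Split a 32-bit value into (upper, lower) 16-bit halves."""
    value &= MASK32
    return (value >> 16) & MASK16, value & MASK16


def join32(upper, lower):
    """Rebuild a 32-bit value from two 16-bit halves."""
    return ((upper & MASK16) << 16) | (lower & MASK16)


def add32_over_16bit_paths(a, b):
    """Model one 32-bit addition carried over 16-bit data paths."""
    # Input side: two 32-bit operands occupy four 16-bit paths (64 bits).
    # The mapping of halves to Di92..Di95 is an assumption.
    di92, di93 = split32(a)
    di94, di95 = split32(b)
    # Inside the slot: the 32-bit adder works on the reassembled operands.
    # Wraparound on overflow is an assumption.
    total = (join32(di92, di93) + join32(di94, di95)) & MASK32
    # Output side: the 32-bit result occupies two 16-bit paths (32 bits).
    do92, do93 = split32(total)
    return do92, do93
```

In this model, a single 32-bit addition occupies four 16-bit input paths and two 16-bit output paths, which corresponds to the 64 bits of input and 32 bits of output described above.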
The processor 900 includes the multiplier unit Mul16 in the third slot 933. The multiplier unit Mul16 performs multiplication of 16-bit data and outputs 32-bit data. The processor 900 thus needs data paths corresponding to 32 bits to transmit the output result from the third slot 933 to the switch channel 92. In FIG. 13, data paths Do94 and Do95, each of which can transfer 16-bit data, are used to transmit 32-bit data from the third slot 933 to the switch channel 92.
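The reason a 16-bit multiplication needs a 32-bit output can be shown with a short sketch. The sketch assumes unsigned operands and assumes that the upper half of the product is carried on one data path and the lower half on the other; the text does not specify signedness or which half travels on Do94 or Do95, and the function name mul16 is hypothetical.

```python
def mul16(a, b):
    """Multiply two unsigned 16-bit values and return the product as (upper, lower) 16-bit halves."""
    product = (a & 0xFFFF) * (b & 0xFFFF)  # the product can occupy up to 32 bits
    upper = (product >> 16) & 0xFFFF       # assumed to travel on one 16-bit output path
    lower = product & 0xFFFF               # assumed to travel on the other 16-bit output path
    return upper, lower
```

The largest unsigned case, 0xFFFF multiplied by 0xFFFF, gives 0xFFFE0001, which does not fit in 16 bits, so two 16-bit output paths are needed for the result.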
The processor 900 using the technique known in the art to perform 32-bit operations would need more input and output ports between the switch channel 92 and the instruction execution unit 93. This increases the circuit scale. Although the switch channel 92 may be replaced by a general-purpose register file, this configuration also needs more input and output ports provided between the general-purpose register file and the instruction execution unit 93. This also increases the circuit scale.
In response to the above problems, it is an object of the present invention to provide a VLIW processor that efficiently performs processing including extended-bit operations, such as those of instructions commonly used in image processing, image recognition, and other processing, while preventing an increase in the circuit scale.