1. Field of the Invention
This invention relates to a processor for processing a plurality of instructions in parallel, and in particular to a processor for processing an instruction set of a plurality of instructions packed into a single code.
2. Description of the Background Art
In recent years, with the spread of portable terminal devices, digital signal processing that handles large amounts of data such as voice and images at high speed has become increasingly important. A DSP (digital signal processor) is typically used as a semiconductor device dedicated to such digital signal processing. However, where the amount of data to be processed is enormous, it is difficult to improve performance dramatically even with a dedicated DSP. Assuming, for example, that ten thousand sets of data are to be arithmetically processed, at least ten thousand cycles are required even if the operation on each set of data can be executed in a single machine cycle. In other words, each set of data may be processed at high speed, but because the data are processed serially, the time required for processing increases in proportion to the amount of data.
Where the amount of data to be processed is large, the processing performance can be improved by parallel operation. Specifically, a plurality of operation units are provided and operated simultaneously to process a plurality of sets of data at the same time. Where the same operation is performed on a plurality of sets of data, the method called SIMD (single instruction stream, multiple data streams) can be employed to reduce the area of the operation units while maintaining high parallel performance. Specifically, while a plurality of data processors are provided, a single control unit for interpreting instructions and controlling the process is shared among them, so that high performance can be achieved with a small area.
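The cycle counts discussed above can be illustrated with a minimal sketch. It assumes an idealized model in which every operation takes exactly one machine cycle and there are no memory or control stalls. The function names and the lane count of 8 are hypothetical and chosen only for illustration; they do not describe any particular device.

```python
import math

def serial_cycles(num_data_sets, cycles_per_op=1):
    """Cycles needed when a single operation unit handles the data sets one by one."""
    return num_data_sets * cycles_per_op

def simd_cycles(num_data_sets, lanes, cycles_per_op=1):
    """Cycles needed when `lanes` data processors share one control unit.

    Idealized assumption: each instruction is decoded once and applied to
    up to `lanes` data sets in the same cycle.
    """
    return math.ceil(num_data_sets / lanes) * cycles_per_op

def simd_apply(op, data, lanes):
    """Apply the same operation `op` to `data`, `lanes` elements per step.

    Returns the results and the number of steps (instruction issues) used.
    """
    results = []
    steps = 0
    for start in range(0, len(data), lanes):
        # One instruction, decoded once, drives every lane in this group.
        results.extend(op(x) for x in data[start:start + lanes])
        steps += 1
    return results, steps

n = 10_000
print(serial_cycles(n))         # 10000
print(simd_cycles(n, lanes=8))  # 1250
```

Under these assumptions, the serial count grows in direct proportion to the amount of data, while the SIMD count is divided by the number of lanes because the instruction is interpreted only once per group of data sets.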
Document 1 (D. A. Patterson and J. L. Hennessy, “Computer Organization and Design”, Nikkei Business Publications) describes a method of decreasing the length of an instruction code to reduce the size of the instruction memory.
Document 2 (Akira Nakamori, “Introduction to Microprocessor Architecture”, CQ Publishing), on the other hand, describes a method in which a plurality of slots are formed in one instruction format for parallel execution by VLIW (very long instruction word) in order to increase the number of instructions that can be executed per cycle.
However, reducing the size of the instruction memory as described in Document 1 and increasing the number of instructions that can be executed per cycle as described in Document 2 are in a so-called tradeoff relationship.
Specifically, the method described in Document 1 makes it possible to reduce the size of the instruction memory, but because a plurality of instructions are processed in series, a large number of cycles are required to execute them. Taking an example where the instruction code length is 16 bits, the instruction length is short but four cycles are required to execute four instructions.
The method described in Document 2, on the other hand, allows more instructions to be executed per cycle, but the instruction length increases and so does the size of the instruction memory. Where four slots of 16 bits each are provided, for example, four instructions can be executed simultaneously in a single cycle, at the cost of an instruction code length extended to 64 bits.
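The tradeoff between the two encodings can be made concrete with a small sketch. The program is modeled as a list of groups, where each entry is the number of instructions that could be issued together in one step. Two points are assumptions of this sketch and are not stated in the text above: the short-code processor issues exactly one instruction per cycle, and a VLIW word always occupies all of its slots, so unused slots are padded with no-operations that still consume instruction memory. The function names are hypothetical.

```python
def compact_cost(groups, code_bits=16):
    """Cost of executing every instruction in series with short codes."""
    n = sum(groups)
    return {"cycles": n, "memory_bits": n * code_bits, "code_bits": code_bits}

def vliw_cost(groups, slots=4, slot_bits=16):
    """Cost of packing each group into one fixed-width VLIW word.

    Assumption: unused slots are filled with no-ops and still take up
    instruction memory.
    """
    if any(g < 1 or g > slots for g in groups):
        raise ValueError("each group must hold between 1 and `slots` instructions")
    word_bits = slots * slot_bits
    return {
        "cycles": len(groups),
        "memory_bits": len(groups) * word_bits,
        "code_bits": word_bits,
    }

# Four instructions that can all be issued together.
print(compact_cost([4]))  # {'cycles': 4, 'memory_bits': 64, 'code_bits': 16}
print(vliw_cost([4]))     # {'cycles': 1, 'memory_bits': 64, 'code_bits': 64}

# A less regular program in which some slots stay empty.
print(compact_cost([4, 1, 2, 1]))  # {'cycles': 8, 'memory_bits': 128, 'code_bits': 16}
print(vliw_cost([4, 1, 2, 1]))     # {'cycles': 4, 'memory_bits': 256, 'code_bits': 64}
```

In the first case the sketch reproduces the figures given above: four cycles with a 16-bit code versus one cycle with a 64-bit code. In the second case, under the padding assumption, the VLIW form halves the cycle count but doubles the instruction memory used.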