1. Field of the Invention
The present invention relates to a processor that divides data to be operated on into a plurality of parts and sequentially executes operations on the plurality of parts, and in particular, to a processor capable of inputting/outputting data during execution of operations.
2. Description of the Background Art
In recent years, digital signal processing in which a large amount of data such as audio data or image data is processed at high speed has become increasingly important with the spread of portable terminal equipment. Generally, for such digital signal processing, a DSP (Digital Signal Processor) is often used as a dedicated semiconductor device. However, when the amount of data to be processed is very large, it is difficult to dramatically increase the performance even with the use of such a dedicated DSP. For example, when the data to be operated on consists of 10,000 sets of data, at least 10,000 cycles are required for the operation even when the operation for each data set is executable in one machine cycle. In other words, although the processing for each data set is executed at a high speed, the processing time increases in proportion to the amount of data because the data processing is executed serially.
When the amount of data to be processed is large, processing performance may be improved by carrying out parallel operation. That is, it is possible to simultaneously execute a plurality of data processes by preparing a plurality of operating units and operating them simultaneously. In such a situation, when the same operation is carried out on the plurality of sets of data, it is possible to reduce the area of the operating units while keeping high parallelism by employing a method called SIMD (Single Instruction stream-Multiple Data stream). In other words, when a plurality of data processors are provided, high performance can be achieved with a small area by sharing a controller that interprets instructions and controls the processing.
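The effect of sharing one controller among several operating units can be illustrated with a minimal sketch. The function names, the two example instructions, and the choice of 1,024 operating units are hypothetical and serve only to illustrate the idea; they are not taken from any cited document.

```python
import math

def cycles_needed(num_data_sets, num_units, cycles_per_op=1):
    """Cycles to process all data sets when num_units operate in lockstep."""
    return math.ceil(num_data_sets / num_units) * cycles_per_op

def simd_execute(instruction, lanes):
    """Apply one instruction to every data element (SIMD model).

    The instruction is decoded once, as a shared controller would do,
    and the decoded operation is then applied to each lane.
    """
    operations = {
        "add1": lambda x: x + 1,    # hypothetical example instruction
        "double": lambda x: x * 2,  # hypothetical example instruction
    }
    op = operations[instruction]    # single decode step
    return [op(x) for x in lanes]   # same operation on all data

serial_cycles = cycles_needed(10_000, num_units=1)       # 10,000 cycles
parallel_cycles = cycles_needed(10_000, num_units=1024)  # 10 cycles
doubled = simd_execute("double", [1, 2, 3])
```

With a single operating unit, the 10,000 data sets of the earlier example need 10,000 cycles, whereas 1,024 units operating in lockstep need only 10 cycles under this idealized model, which ignores data input/output.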
When the amount of data to be processed is large, performance per area is improved when addition, which is a basic arithmetic operation, is performed bit-serially. For example, consider thirty-two 1-bit adders (hereinafter, referred to as A) and one 32-bit adder (hereinafter, referred to as B). A and B both require 32 cycles to perform 32 additions. However, they differ in the length of the serial operation executed in one cycle. For example, when B is realized by a series connection of thirty-two 1-bit adders, the operation time of B is 32 times longer than that of A although the areas of A and B are the same. On the other hand, when B is realized by a high-speed operating unit such as a carry look-ahead adder, the operation time of B is shorter than that of A; however, the area of B is larger than that of A. Therefore, the performance per area of A is better than that of B.
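The behavior of configuration A can be sketched as follows, assuming each 1-bit adder holds its own carry between cycles and that all lanes process the same bit position in a given cycle. The function names are hypothetical, and the sketch models cycle counts only, not circuit delay or area.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum bit, carry out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def bit_serial_simd_add(xs, ys, width=32):
    """Add xs[i] + ys[i] for every lane, one bit position per cycle.

    Each lane models one 1-bit adder with a stored carry.
    Results wrap modulo 2**width. Returns (sums, cycles used).
    """
    carries = [0] * len(xs)
    sums = [0] * len(xs)
    cycles = 0
    for bit in range(width):            # one cycle per bit position
        for lane in range(len(xs)):     # all lanes work in the same cycle
            a = (xs[lane] >> bit) & 1
            b = (ys[lane] >> bit) & 1
            s, carries[lane] = full_adder(a, b, carries[lane])
            sums[lane] |= s << bit
        cycles += 1
    return sums, cycles

xs = list(range(32))
ys = [1000 * i + 7 for i in range(32)]
sums, cycles = bit_serial_simd_add(xs, ys)
```

In this model, thirty-two lanes complete thirty-two additions in 32 cycles, and each cycle involves only a single full-adder delay per lane.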
In the case of a multiplier, when 2 bits are processed at a time using a second-order (radix-4) Booth's algorithm, it is possible to reduce the number of partial product additions to half of that required when 1 bit is processed at a time.
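The halving of partial product additions can be illustrated with a minimal sketch of 2-bit (radix-4) Booth recoding. It assumes a signed two's complement multiplier with an even bit width, and the function names are hypothetical. It is a software model of the arithmetic, not a description of any particular multiplier circuit.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a signed `width`-bit multiplier into width/2 digits in {-2..2}."""
    assert width % 2 == 0, "width is assumed to be even"
    pattern = multiplier & ((1 << width) - 1)  # two's complement bit pattern
    digits = []
    prev = 0                                   # bit to the right of bit 0
    for i in range(0, width, 2):               # examine 2 bits per step
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(multiplicand, multiplier, width=8):
    """Return (product, number of partial products added)."""
    digits = booth_radix4_digits(multiplier, width)
    product = 0
    for k, digit in enumerate(digits):
        product += (digit * multiplicand) * (4 ** k)  # one partial product
    return product, len(digits)

product, partial_products = booth_multiply(23, -3, width=8)
```

For an 8-bit multiplier, the recoding yields four digits, so four partial products are added instead of the eight needed when one bit is examined per step.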
In this manner, when numerous additions and multiplications are performed, it is possible to improve performance per area by the SIMD method based on 1-bit or 2-bit serial operation. Further, this method may be used in various applications because the data width of the data to be processed is not fixed. Examples of techniques related to the above include the inventions disclosed in Japanese Patent Laying-Open Nos. 2003-203225 (hereinafter, referred to as Document 1) and 2001-076125 (hereinafter, referred to as Document 2).
In a data converting device disclosed in Document 1, a data processor is capable of simultaneously processing a predetermined number of pieces of data in a parallel manner. A buffer memory is capable of storing data which is sequentially inputted, and of simultaneously inputting/outputting a predetermined number of pieces of data in units of bit width suited for the process from/to the data processor in a parallel manner. A control information memory stores buffer memory control information data containing information specifying an LUT (Look Up Table) which is a different function from the data buffer function, and used region information. A buffer controller adaptively determines a data transfer line between the buffer memory and the data processor by assigning a data buffer function region and an LUT function region to the buffer memory based on the control information data, so that each function region is in a form suited for parallel input/output to/from the data processor.
In an image processing device disclosed in Document 2, a transfer controller is provided in an image processing processor. In the transfer controller, additional or updated image processing procedures and data for image processing are transferred by a process controller during an idle cycle in which a processor array part does not execute image processing, and are temporarily stored in a host buffer. The additional or updated image processing procedures and the data for image processing are then transferred from the host buffer to a program RAM or a data RAM.
A processor according to the SIMD method executes an operation on operation data stored in a data memory. Therefore, it is necessary to input the data to be used in the operation into the data memory from the outside before the operation, and to output the data of an operation result from the data memory to the outside after the operation. The processor is not able to carry out operations during the input/output of data to/from the data memory. Accordingly, there is a problem that the total processing time is extended.
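The extension of the total processing time can be expressed with a simple cycle-count model, assuming input, operation, and output are strictly sequential for each block of data. The function names and the cycle counts used below are hypothetical illustrations, not measured values.

```python
def non_overlapped_time(input_cycles, operation_cycles, output_cycles,
                        num_blocks=1):
    """Total cycles when input, operation, and output never overlap."""
    return num_blocks * (input_cycles + operation_cycles + output_cycles)

def operating_unit_utilization(input_cycles, operation_cycles, output_cycles):
    """Fraction of the total time in which the operating units are busy."""
    total = input_cycles + operation_cycles + output_cycles
    return operation_cycles / total

# Hypothetical figures: 200 cycles in, 1,000 cycles of operation, 200 cycles out
total_cycles = non_overlapped_time(200, 1000, 200, num_blocks=10)
utilization = operating_unit_utilization(250, 500, 250)
```

Under these assumed figures, ten blocks take 14,000 cycles although only 10,000 of them are spent on operations, and in the second case the operating units are busy for only half of the total time.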
Since the data converting device disclosed in Document 1 accumulates data in a buffer that is not used for operation, and transfers the data to a buffer for use in operation in a parallel manner as necessary, it is impossible to reduce the processing time of the processor.
Further, since the image processing device disclosed in Document 2 inputs/outputs data during an idle cycle of processing, it is impossible to reduce the processing time when the number of idle cycles is small.
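The dependence on the number of idle cycles can be sketched with a simple model, assuming that data input/output can be hidden only within idle cycles and that any remaining input/output adds directly to the processing time. The function name and the cycle counts are hypothetical and are not taken from Document 2.

```python
def time_with_idle_cycle_io(operation_cycles, idle_cycles, io_cycles):
    """Total cycles when I/O can proceed only during idle cycles.

    operation_cycles is assumed to include idle_cycles. I/O that does
    not fit into the idle cycles extends the total time.
    """
    hidden = min(idle_cycles, io_cycles)          # I/O absorbed by idle time
    return operation_cycles + (io_cycles - hidden)

many_idle = time_with_idle_cycle_io(1000, idle_cycles=400, io_cycles=400)
few_idle = time_with_idle_cycle_io(1000, idle_cycles=50, io_cycles=400)
```

In this model, 400 idle cycles fully hide 400 cycles of input/output, so the total remains 1,000 cycles, whereas with only 50 idle cycles the total grows to 1,350 cycles.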