1. Field of the Invention
The present invention relates to a processing device, and particularly relates to a processing device which processes a large quantity of data.
2. Related Background Art
When a data stream such as audio data or video data is reproduced, media streaming is performed. Media streaming is characterized by performing a small, limited number of operations on a large quantity of data, rather than repeatedly performing many operations on a single piece of data.
When such data streaming is performed by a processor, (1) data loading, (2) operation, and (3) pointer increment are repeated. In a dedicated processor such as a DSP, a dedicated instruction set for this processing is provided, and this processing can be performed by one instruction. However, if this processing is performed by a processing device such as a general-purpose RISC processor, three or more instructions are needed. For example, a program to find a total sum of stream data by the general-purpose RISC processor is shown as follows:
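int totalsum(int *streamData, int dataNum)
{
    int i;
    _R1 = 0;                  /* total sum */
    _R2 = streamData;         /* pointer into the array */
    for (i = 0; i < dataNum; i++) {
        _R3 = *(_R2);         /* (1) data loading */
        _R1 = _R1 + _R3;      /* (2) operation */
        _R2 = _R2 + 1;        /* (3) pointer increment */
    }
    return (_R1);
}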
The program exemplified here is the function totalsum( ), which finds the total sum of the array streamData. dataNum represents the number of data items in streamData. Further, R1, R2, and R3 represent registers (written _R1, _R2, and _R3 in the program). More specifically, R1 is the register in which the total sum is stored, R2 is the register in which a pointer indicating the position in the array streamData is stored, and R3 is the register in which data loaded from streamData is stored.
In this program, the register R1, in which the total sum is to be stored, is first reset to zero. Then, the value of streamData (namely, the pointer to the head of the array) is stored in the register R2.
Next, as can be seen from the instructions in the loop formed by the for statement, (1) the element of streamData at the position designated by _R2 is loaded into the register R3 by _R3=*(_R2). Then, (2) the loaded data in the register R3 is added to the current data in the register R1 by _R1=_R1+_R3. Subsequently, (3) the value of the register R2, in which the pointer is stored, is incremented by one by _R2=_R2+1, so that it designates the next element.
The above processing from (1) to (3) is repeated as long as the condition of the for statement, i<dataNum, is satisfied. More specifically, the processing from (1) to (3) is repeated dataNum times.
As can be seen from this example, if the data streaming is performed by the general-purpose RISC processor, three instructions of (1) data loading, (2) operation, and (3) pointer increment are repeated.
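The repetition described above can be illustrated by a compilable variant of the example program. The following is only a sketch under stated assumptions: the structure StreamStats and the function totalsum_counted( ) are hypothetical names introduced here for illustration, ordinary C variables stand in for the registers R1 to R3, and the counters model the number of source-level steps rather than the instruction count of any particular RISC processor.

```c
#include <stddef.h>

/* Hypothetical helper type: the sum plus a count of each per-element step. */
typedef struct {
    int sum;            /* plays the role of register R1 */
    size_t loads;       /* number of (1) data loading steps */
    size_t adds;        /* number of (2) operation steps */
    size_t increments;  /* number of (3) pointer increment steps */
} StreamStats;

/* Sums dataNum elements and counts how often each of the three steps runs. */
StreamStats totalsum_counted(const int *streamData, size_t dataNum)
{
    StreamStats s = {0, 0, 0, 0};
    const int *ptr = streamData;      /* plays the role of register R2 */

    for (size_t i = 0; i < dataNum; i++) {
        int value = *ptr;             /* (1) data loading, role of R3 */
        s.loads++;
        s.sum += value;               /* (2) operation */
        s.adds++;
        ptr++;                        /* (3) pointer increment */
        s.increments++;
    }
    return s;
}
```

For an array of dataNum elements, each of the three counters ends at dataNum, so the loop body accounts for 3 × dataNum steps in total, which is the overhead the passage above attributes to a general-purpose RISC processor.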
To reduce the number of such repeated instructions, it is conceivable to provide the general-purpose RISC processor with a dedicated instruction set like that of the DSP. However, if such a complicated instruction set is implemented, there arises a problem that the circuit scale of the RISC processor increases.
Such a problem may arise not only in data streaming but also in any data processing in which the same processing is repeatedly performed on a large quantity of data.
Further, to solve a similar problem, in U.S. Pat. No. 5,155,816, a floating-point load instruction PFload is provided separately from the normal load instruction, and to reduce a mismatch between the supply of data by the PFload instruction and data processing, data obtained by the PFload instruction are stored in a FIFO buffer and sequentially outputted. Further, in U.S. Pat. No. 6,282,631, the FIFO buffer is mapped in a memory space, and this FIFO buffer is used to sequentially read a bit stream to be decoded. In both of these documents, the FIFO buffer is used to hide the latency of memory access, but either the latency improving effect is not necessarily sufficient, or an increase in circuit scale is inevitable.
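The general idea of placing a FIFO buffer between the supply of data and its processing can be modeled in software as follows. This is a minimal sketch and does not reproduce the circuitry of either cited document: the type Fifo, the functions fifo_push( ), fifo_pop( ), and totalsum_fifo( ), and the depth FIFO_DEPTH are all hypothetical names and values assumed here for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 4   /* assumed depth, chosen only for illustration */

/* Hypothetical ring-buffer model of a FIFO between loading and processing. */
typedef struct {
    int slots[FIFO_DEPTH];
    size_t head;    /* index of the oldest entry */
    size_t count;   /* number of valid entries */
} Fifo;

/* Stores one value; returns false if the FIFO is full. */
static bool fifo_push(Fifo *f, int value)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->slots[(f->head + f->count) % FIFO_DEPTH] = value;
    f->count++;
    return true;
}

/* Removes the oldest value into *out; returns false if the FIFO is empty. */
static bool fifo_pop(Fifo *f, int *out)
{
    if (f->count == 0)
        return false;
    *out = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Sums the stream while the loading side runs ahead of the consuming side. */
int totalsum_fifo(const int *streamData, size_t dataNum)
{
    Fifo f = {{0}, 0, 0};
    size_t fetched = 0;
    int sum = 0;
    int value;

    while (fetched < dataNum || f.count > 0) {
        /* Supply side: load ahead while data remain and the FIFO has room. */
        while (fetched < dataNum && fifo_push(&f, streamData[fetched]))
            fetched++;
        /* Processing side: consume one buffered value in arrival order. */
        if (fifo_pop(&f, &value))
            sum += value;
    }
    return sum;
}
```

In this model the supply side may run up to FIFO_DEPTH elements ahead of the processing side, which is how such a buffer can absorb a mismatch between the two; a deeper buffer tolerates a longer delay but corresponds to more storage.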