The present invention relates to arithmetic units which perform multimedia signal processing at higher speed, and to an image processing apparatus using the arithmetic unit.
Prior art program-controlled processors (arithmetic units) mount vector instructions, thereby obtaining higher performance. A prior art arithmetic unit shown in FIG. 14 comprises a program control circuit 1401 which decodes a vector instruction and outputs a first start signal and a second start signal, a first address generator 1402 which outputs a first address in accordance with the first start signal, a first data memory 1403 which outputs first data on the basis of the first address, a pipeline operation circuit 1404 which executes a pipeline operation on the basis of the first data, a second address generator 1405 which outputs a second address in accordance with the second start signal, and a second data memory 1406 which contains a result of the operation by the pipeline operation circuit 1404 on the basis of the second address.
As shown in FIG. 14, in this arithmetic unit, when the vector instruction is decoded by the program control circuit 1401, the first start signal is output by the program control circuit 1401, and the generation of N addresses is started by the first address generator 1402 in accordance with the first start signal. The first data memory 1403 which receives the N addresses supplies N pieces of data to the pipeline operation circuit 1404. The pipeline operation circuit 1404 receives the supplied N pieces of data and executes the pipeline operation processing.
In addition, the program control circuit 1401 outputs the second start signal at the timing when the first processed data are output from the pipeline operation circuit 1404, and the second address generator 1405 outputs N addresses to the second data memory 1406 in accordance with the second start signal. Accordingly, the operation results which are output by the pipeline operation circuit 1404 are successively stored in the second data memory 1406.
Then, when the output of the N pieces of data is finished, the first address generator 1402 and the second address generator 1405 output a first end signal and a second end signal to the program control circuit 1401, respectively, thereby terminating the vector instruction.
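The prior art flow of FIG. 14 can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions: the function name `run_vector_instruction`, the `events` dictionary, and the zero-latency memory reads are hypothetical and do not appear in the original description. It shows why the program control circuit must know the number of pipeline stages in order to time the second start signal.

```python
def run_vector_instruction(src, op, pipeline_stages):
    """Model one vector instruction over N = len(src) elements.

    Assumptions (not from the source): memory access takes no extra cycles,
    and `op` stands in for whatever the pipeline operation circuit computes.
    """
    n = len(src)
    dst = [None] * n                  # second data memory
    pipe = [None] * pipeline_stages   # pipeline registers
    events = {}                       # cycle at which each signal occurs
    read_addr = 0                     # first address generator
    write_addr = 0                    # second address generator
    cycle = 0
    while write_addr < n:
        if cycle == 0:
            events["first_start"] = cycle
        # The first address generator issues one read address per cycle.
        if read_addr < n:
            incoming = op(src[read_addr])
            read_addr += 1
            if read_addr == n:
                events["first_end"] = cycle
        else:
            incoming = None
        pipe.insert(0, incoming)
        outgoing = pipe.pop()
        # The second start signal is timed to the first processed result,
        # which requires knowing pipeline_stages in advance.
        if cycle >= pipeline_stages:
            events.setdefault("second_start", cycle)
            dst[write_addr] = outgoing
            write_addr += 1
            if write_addr == n:
                events["second_end"] = cycle
        cycle += 1
    return dst, events


results, signal_cycles = run_vector_instruction([1, 2, 3, 4], lambda x: x * 2, 3)
```

In this model the cycle of the second start signal equals the pipeline depth, so a different pipeline operation circuit implies a different timing design in the program control circuit.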
In applications requiring very high operation performance, such as real-time image processing, general pipeline operation circuits sometimes do not have sufficiently high operation performance. In this case, the operation performance is increased by a hybrid structure in which specific high-load operations are performed by dedicated pipeline operation circuits (such as a DCT (Discrete Cosine Transform) operation circuit) and other processing is performed by the general arithmetic circuits, thereby ensuring real-time processing. However, the required dedicated pipeline operation circuits vary with the contents to be processed. Therefore, the program control circuit has timing designs which are inherent in the respective dedicated pipeline operation circuits. In other words, the timing designs are specific to the respective applications. Considering the coming age of IP (Intellectual Property), it is a serious problem that the program control circuit, which is the most complex part of the processor, must be changed according to purpose.
The present invention is made in view of this problem, and it provides an arithmetic unit having a structure which is divided into a general arithmetic circuit and a dedicated arithmetic circuit to prevent the change in the dedicated arithmetic circuit for each purpose from affecting the general arithmetic circuit, whereby the unit can be applied to various applications, and an image processing apparatus using the arithmetic unit.
An arithmetic unit according to one embodiment of the present invention has a general arithmetic circuit and a dedicated arithmetic circuit, the general arithmetic circuit mounts plural vector instructions and executes a pipeline operation on the basis of the vector instructions together with the dedicated arithmetic circuit. In the arithmetic unit, the general arithmetic circuit outputs: a dedicated pipeline operation circuit selection signal notifying a content of arithmetic in the dedicated arithmetic circuit; plural operation results of the general arithmetic circuit; and a general arithmetic circuit output data enable signal notifying an output timing of the plural operation results, to the dedicated arithmetic circuit. The general arithmetic circuit receives: plural dedicated operation results of the dedicated arithmetic circuit; and a dedicated arithmetic circuit output data enable signal for recognizing an output timing of the plural dedicated operation results and a termination timing of the output data, from the dedicated arithmetic circuit.
The dedicated arithmetic circuit comprises: plural dedicated pipeline operation circuits each outputting a signal notifying a number of pipeline stages and executing a pipeline operation for the plural operation results of the general arithmetic circuit; a data selection circuit for arbitrarily selecting dedicated operation results which are output by one of the plural dedicated pipeline operation circuits, from dedicated operation results which are respectively output by the plural dedicated pipeline operation circuits, in accordance with the dedicated pipeline operation circuit selection signal of the general arithmetic circuit, and outputting the arbitrarily selected dedicated operation results as the plural dedicated operation results to the general arithmetic circuit; and a control circuit for receiving the signals each notifying the number of pipeline stages, each of which signals is output by each of the plural dedicated pipeline operation circuits, and the dedicated pipeline operation circuit selection signal and the general arithmetic circuit output data enable signal of the general arithmetic circuit, and outputting the dedicated arithmetic circuit output data enable signal to the general arithmetic circuit.
According to the above-described structure, the arithmetic unit can mount an arbitrary dedicated pipeline operation circuit which is suitable for each purpose without changing the program control circuit, regardless of the structure of the general arithmetic circuit. Consequently, the arithmetic unit which can be applied to the various applications can be realized.
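The roles of the data selection circuit and the control circuit described above can be sketched in software. The following is an illustrative model only; the class names `DedicatedPipeline` and `DedicatedArithmeticCircuit`, the helper `stream`, and the use of `None` for a don't-care input are assumptions made for this sketch. Each dedicated pipeline reports its own number of stages, and the control circuit produces the output data enable signal by delaying the incoming enable signal by the stage count of the selected pipeline.

```python
from collections import deque


class DedicatedPipeline:
    """One dedicated pipeline operation circuit (hypothetical model)."""

    def __init__(self, op, stages):
        self.op = op
        self.stages = stages                    # "number of pipeline stages" signal
        self._regs = deque([None] * stages)

    def clock(self, value):
        self._regs.appendleft(None if value is None else self.op(value))
        return self._regs.pop()


class DedicatedArithmeticCircuit:
    def __init__(self, pipelines):
        self.pipelines = pipelines
        depth = max(p.stages for p in pipelines) + 1
        # History of the general circuit's enable signal; index k holds the
        # value seen k cycles ago.
        self._enable_history = deque([False] * depth, maxlen=depth)

    def clock(self, select, data_in, enable_in):
        # Every dedicated pipeline receives the same input data.
        outputs = [p.clock(data_in if enable_in else None) for p in self.pipelines]
        # Control circuit: delay the enable by the selected stage count.
        self._enable_history.appendleft(enable_in)
        enable_out = self._enable_history[self.pipelines[select].stages]
        # Data selection circuit: forward only the selected pipeline's output.
        return outputs[select], enable_out


def stream(circuit, select, values, total_cycles):
    """Drive the circuit and collect (cycle, data) pairs flagged as valid."""
    valid = []
    for cycle in range(total_cycles):
        enable_in = cycle < len(values)
        data_in = values[cycle] if enable_in else None
        data_out, enable_out = circuit.clock(select, data_in, enable_in)
        if enable_out:
            valid.append((cycle, data_out))
    return valid


def make_circuit():
    return DedicatedArithmeticCircuit([
        DedicatedPipeline(lambda x: 2 * x, stages=2),
        DedicatedPipeline(lambda x: -x, stages=4),
    ])


shallow = stream(make_circuit(), 0, [1, 2, 3], 10)
deep = stream(make_circuit(), 1, [1, 2, 3], 10)
```

The caller of `stream` never inspects a stage count; it relies only on the returned enable flag, which mirrors how the general arithmetic circuit can stay unchanged when a dedicated pipeline is swapped.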
An arithmetic unit according to another embodiment of the present invention has a general arithmetic circuit and a dedicated arithmetic circuit, the general arithmetic circuit mounts plural vector instructions and executes a pipeline operation on the basis of the vector instructions together with the dedicated arithmetic circuit. The general arithmetic circuit comprises: a program control circuit for outputting a first start signal, a second start signal, a first operation circuit selection signal, a second operation circuit selection signal, a dedicated pipeline operation circuit selection signal and a general arithmetic circuit output data enable signal, and receiving a dedicated arithmetic circuit output data enable signal; a first address generator for continuously outputting M first addresses on the basis of the first start signal from the program control circuit; a first data memory for outputting M pieces of first data on the basis of the first addresses from the first address generator; a first pipeline operation circuit for executing a pipeline operation for the first data from the first data memory and successively outputting M first operation results, in accordance with the first operation circuit selection signal from the program control circuit; a second pipeline operation circuit for executing a pipeline operation for second operation results from the dedicated arithmetic circuit and successively outputting M third operation results, in accordance with the second operation circuit selection signal from the program control circuit; a second address generator for continuously outputting M second addresses on the basis of the second start signal from the program control circuit; and a second data memory containing the M third operation results from the second pipeline operation circuit on the basis of the second addresses from the second address generator. 
The dedicated arithmetic circuit comprises: N dedicated pipeline operation circuits each outputting a signal notifying a number of pipeline stages, and executing a pipeline operation for the first operation results from the first pipeline operation circuit in the general arithmetic circuit; a data selection circuit for selecting n-th dedicated operation results from dedicated operation results which are respectively output by the plural dedicated pipeline operation circuits, in accordance with the dedicated pipeline operation circuit selection signal from the program control circuit in the general arithmetic circuit, and outputting the n-th dedicated operation results to the second pipeline operation circuit in the general arithmetic circuit as the second operation results; and a control circuit for receiving the signals each notifying the number of pipeline stages, each of which signals is output by each of the plural dedicated pipeline operation circuits, and the dedicated pipeline operation circuit selection signal and the general arithmetic circuit output data enable signal from the program control circuit in the general arithmetic circuit, and outputting the dedicated arithmetic circuit output data enable signal to the program control circuit in the general arithmetic circuit.
According to the above-described structure, the arithmetic unit is divided into the general arithmetic circuit and the dedicated arithmetic circuit. The dedicated arithmetic circuit output data enable signal, which is information inherent in the dedicated arithmetic circuit and required for the timing control by the program control circuit in the general arithmetic circuit, is notified from the dedicated arithmetic circuit to the general arithmetic circuit so as to prevent the change in the dedicated arithmetic circuit for each purpose from affecting the pipeline operation in the general arithmetic circuit. The program control circuit in the general arithmetic circuit controls the output timing of the pipeline operation circuit on the basis of the dedicated arithmetic circuit output data enable signal as the notified information. That is, the program control circuit in the general arithmetic circuit decodes the vector instruction, then asserts the first start signal, and after the assertion of the first start signal, detects the output timing of the first one of the first operation results from the first pipeline operation circuit on the basis of the number of pipeline stages of the first pipeline operation circuit. Simultaneously, the program control circuit asserts the general arithmetic circuit output data enable signal, and negates the first start signal M cycles after the assertion of the first start signal. After the negation of the first start signal, the program control circuit detects the output timing of the M-th one of the first operation results from the first pipeline operation circuit on the basis of the number of pipeline stages of the first pipeline operation circuit, and simultaneously negates the general arithmetic circuit output data enable signal.
The control circuit in the dedicated arithmetic circuit detects the output timing of the first one of n-th dedicated operation results from an n-th dedicated pipeline operation circuit on the basis of an n-th signal notifying the number of pipeline stages, which signal is selected in accordance with the dedicated pipeline operation circuit selection signal, after the general arithmetic circuit output data enable signal is asserted. Simultaneously, the control circuit asserts the dedicated arithmetic circuit output data enable signal, detects the output timing of the M-th one of the n-th dedicated operation results from the n-th dedicated pipeline operation circuit on the basis of the n-th signal notifying the number of pipeline stages which is selected in accordance with the dedicated pipeline operation circuit selection signal after the general arithmetic circuit output data enable signal is negated, and simultaneously negates the dedicated arithmetic circuit output data enable signal. Then, after the dedicated arithmetic circuit output data enable signal is asserted, the program control circuit detects the output timing of the first one of the third operation results from the second pipeline operation circuit on the basis of the number of pipeline stages of the second pipeline operation circuit. Simultaneously, the program control circuit asserts the second start signal, then after the dedicated arithmetic circuit output data enable signal is negated, detects the output timing of the M-th one of the third operation results from the second pipeline operation circuit on the basis of the number of pipeline stages of the second pipeline operation circuit, and simultaneously negates the second start signal. Therefore, the arithmetic unit of the present invention can mount an arbitrary dedicated pipeline operation circuit which is suitable for each purpose, without changing the program control circuit.
Consequently, the arithmetic unit which can be applied to the various applications can be realized.
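The assert and negate timings described above form a chain of fixed delays, which can be written out as a short calculation. This is a sketch under assumptions: the function `handshake_timeline`, its parameter names, the zero-latency memory access, and the representation of each signal as an (assert cycle, negate cycle) pair are all introduced here for illustration and are not part of the original text.

```python
def handshake_timeline(m, p1, dedicated_stages, select, p2, t0=0):
    """Return (assert_cycle, negate_cycle) for each control signal.

    m                 number of elements processed by the vector instruction
    p1, p2            stage counts of the first and second pipeline operation
                      circuits, known to the program control circuit
    dedicated_stages  stage counts reported by the dedicated pipeline circuits
    select            index chosen by the dedicated pipeline selection signal
    Assumption: memory reads add no extra cycles.
    """
    dn = dedicated_stages[select]
    first_start = (t0, t0 + m)
    # Program control circuit: delay by its own first pipeline depth.
    general_enable = tuple(c + p1 for c in first_start)
    # Control circuit in the dedicated arithmetic circuit: delay by the
    # stage count reported by the selected dedicated pipeline.
    dedicated_enable = tuple(c + dn for c in general_enable)
    # Program control circuit: delay by its own second pipeline depth.
    second_start = tuple(c + p2 for c in dedicated_enable)
    return {
        "first_start": first_start,
        "general_enable": general_enable,
        "dedicated_enable": dedicated_enable,
        "second_start": second_start,
    }


with_short_pipeline = handshake_timeline(8, 3, [5, 9], 0, 1)
with_long_pipeline = handshake_timeline(8, 3, [5, 9], 1, 1)
```

Only the `dedicated_enable` step depends on the dedicated pipeline depth, and that step belongs to the dedicated arithmetic circuit. The two steps owned by the program control circuit use `p1` and `p2` alone, which is the sense in which the program control circuit need not change.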
According to another embodiment of the present invention, the first pipeline operation circuit in the general arithmetic circuit comprises: a first register for receiving the first data from the first data memory and outputting second data, on the basis of the first operation circuit selection signal from the program control circuit; a second register for outputting third data which are previously stored; a multiplier for receiving the second data from the first register and the third data from the second register, and outputting a result obtained by multiplying the second and third data as fourth data; a third register for receiving the fourth data from the multiplier, and outputting fifth data; a fourth register for outputting sixth data which are previously stored; an arithmetic operation unit for receiving the fifth data from the third register and the sixth data from the fourth register, and outputting a result of arithmetic of the fifth and sixth data as seventh data; and a fifth register for receiving the seventh data from the arithmetic operation unit, and outputting the first operation results as outputs of the first pipeline operation circuit. The second pipeline operation circuit in the general arithmetic circuit comprises: a sixth register for receiving the second operation results from the dedicated arithmetic circuit and outputting the third operation results as outputs of the second pipeline operation circuit, on the basis of the second operation circuit selection signal from the program control circuit. A specific one of the dedicated pipeline operation circuits in the dedicated arithmetic circuit comprises: an IDCT (Inversion Discrete Cosine Transform) operation unit for receiving the first operation results from the first pipeline operation circuit, subjecting the results to one-dimensional inversion discrete cosine transform, and outputting the dedicated operation results as outputs of the dedicated pipeline operation circuit.
According to the above-described structure, the first pipeline operation circuit in the general arithmetic circuit performs the inverse quantization operation and the dedicated pipeline operation circuit in the dedicated arithmetic circuit performs the inversion DCT operation. Therefore, the inverse quantization and the inversion DCT operation can be continuously performed by the pipeline operation.
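The register arrangement of the first pipeline operation circuit in this embodiment can be modeled as three clocked stages (first register, third register, fifth register) with two preloaded operand registers. The sketch below is an assumption-laden illustration: the class name `FirstPipeline`, the `clock` method, the use of `None` as an empty slot, and the plain addition passed as the arithmetic operation are all hypothetical choices, since the text here only says "a result of arithmetic".

```python
class FirstPipeline:
    """Hypothetical model of the multiply-then-arithmetic pipeline."""

    def __init__(self, second_register, fourth_register, arithmetic):
        self.second_register = second_register   # third data, stored in advance
        self.fourth_register = fourth_register   # sixth data, stored in advance
        self.arithmetic = arithmetic             # stand-in for the arithmetic operation unit
        self.first_register = None
        self.third_register = None
        self.fifth_register = None

    def clock(self, first_data):
        """Advance one clock edge and return the fifth register's new value."""
        # Update from the output side backwards so that each register
        # captures the value its upstream neighbour held before this edge.
        if self.third_register is None:
            self.fifth_register = None
        else:
            self.fifth_register = self.arithmetic(self.third_register,
                                                  self.fourth_register)
        if self.first_register is None:
            self.third_register = None
        else:
            self.third_register = self.first_register * self.second_register
        self.first_register = first_data
        return self.fifth_register


pipeline = FirstPipeline(second_register=4, fourth_register=1,
                         arithmetic=lambda a, b: a + b)
trace = [pipeline.clock(v) for v in [1, 2, 3, None, None]]
```

In this model the first result appears on the third clock edge and one result follows on each subsequent edge, which is the stage count the program control circuit would use when timing the general arithmetic circuit output data enable signal.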
According to another embodiment of the present invention, the first pipeline operation circuit in the general arithmetic circuit comprises: a first register for receiving the first data from the first data memory and outputting the first operation results as outputs of the first pipeline operation circuit, on the basis of the first operation circuit selection signal from the program control circuit. The second pipeline operation circuit in the general arithmetic circuit comprises: a second register for receiving the second operation results from the dedicated arithmetic circuit and outputting second data, on the basis of the second operation circuit selection signal from the program control circuit; a third register for outputting third data which are previously stored; an arithmetic operation unit for receiving the second data from the second register and the third data from the third register, and outputting a result of arithmetic of the second and third data as fourth data; a fourth register for receiving the fourth data from the arithmetic operation unit and outputting fifth data; a fifth register for outputting sixth data which are previously stored; a multiplier for receiving the fifth data from the fourth register and the sixth data from the fifth register, and outputting a result which is obtained by multiplying the fifth and sixth data as seventh data; and a sixth register for receiving the seventh data from the multiplier, and outputting the third operation results as outputs of the second pipeline operation circuit. 
A specific one of the dedicated pipeline operation circuits in the dedicated arithmetic circuit comprises: a DCT (Discrete Cosine Transform) operation unit for receiving the first operation results from the first pipeline operation circuit in the general arithmetic circuit, subjecting the results to one-dimensional discrete cosine transform, and outputting the second dedicated operation results as outputs of the dedicated pipeline operation circuit.
According to the above-described structure, the second pipeline operation circuit in the general arithmetic circuit performs the quantization operation and the dedicated pipeline operation circuit in the dedicated arithmetic circuit performs the DCT operation. Therefore, the DCT operation and the quantization operation can be continuously performed by the pipeline operation.
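The data flow of this embodiment (one-dimensional DCT in the dedicated circuit, then an arithmetic stage and a multiplier stage in the second pipeline operation circuit) can be sketched functionally. The sketch rests on assumptions not stated in the text: the orthonormal DCT-II scaling, the use of subtraction as the arithmetic stage, and quantization by multiplying with the reciprocal of a step size. The function names `dct_1d` and `quantize` are hypothetical.

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT-II (scaling is an assumption)."""
    n = len(samples)
    coeffs = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        total = sum(s * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    for x, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def quantize(coeffs, offset, reciprocal_step):
    """Arithmetic stage (subtraction assumed), then the multiplier stage."""
    return [(c - offset) * reciprocal_step for c in coeffs]


# A constant input row has only a DC component after the transform.
row = [4.0] * 8
quantized = quantize(dct_1d(row), offset=0.0, reciprocal_step=0.25)
```

A multiplier can stand in for a divider here because the reciprocal of the step size can be stored in the fifth register in advance; that storage choice is an interpretation for this sketch, not something the text states.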
According to another embodiment of the present invention, the arithmetic operation unit comprises: an adder for receiving a first input and a second input, and outputting a result which is obtained by adding the first and second inputs; a subtracter for receiving the first input and the second input, and outputting a result which is obtained by subtracting the second input from the first input; and an output selector for receiving the addition result of the adder, the subtraction result of the subtracter and “0”, and outputting data which are selected from the addition result, the subtraction result and “0”, the output selector selecting and outputting the addition result of the adder when the first input is a positive number, selecting and outputting “0” when the first input is “0”, and selecting and outputting the subtraction result of the subtracter in other cases.
According to the above-described structure, the first pipeline operation circuit in the general arithmetic circuit performs the inverse quantization operation and the dedicated pipeline operation circuit in the dedicated arithmetic circuit performs the inversion DCT operation. Therefore, the inverse quantization and the inversion DCT operation can be continuously performed by the pipeline operation.
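The selection rule of the arithmetic operation unit (adder, subtracter and output selector) translates directly into a few lines of code. The function name `arithmetic_operation_unit` and the example operands are hypothetical. The example assumes, for illustration only, that the multiplier upstream has produced 2 × step × level and that the second input holds the step.

```python
def arithmetic_operation_unit(first_input, second_input):
    """Select among sum, difference and 0 by the sign of the first input."""
    addition = first_input + second_input       # adder output
    subtraction = first_input - second_input    # subtracter output
    if first_input > 0:
        return addition
    if first_input == 0:
        return 0
    return subtraction


# Illustrative operands (assumed, not from the source): step = 5.
step = 5
levels = [3, 0, -3]
reconstructed = [arithmetic_operation_unit(2 * step * level, step)
                 for level in levels]
```

With these operands the unit moves a non-zero value away from zero by the second input and leaves zero unchanged, so positive and negative levels are reconstructed symmetrically.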
An image processing apparatus according to another embodiment of the present invention mounts a plurality of the arithmetic units, and the image processing apparatus comprises: a first arithmetic unit having a DCT operation circuit for receiving the first operation results, subjecting the first operation results to one-dimensional discrete cosine transform, and outputting first dedicated operation results, as a first dedicated pipeline operation circuit, and an IDCT operation circuit for receiving the first operation results, subjecting the first operation results to one-dimensional inversion discrete cosine transform, and outputting second dedicated operation results, as a second dedicated pipeline operation circuit; a second arithmetic unit having a half-pel operation circuit for receiving the first operation results, subjecting the first operation results to a half-pel operation, and outputting first dedicated operation results, as a first dedicated pipeline operation circuit, and a post-noise reduction filter operation circuit for receiving the first operation results, subjecting the first operation results to a post-noise reduction filter, and outputting second dedicated operation results, as a second dedicated pipeline operation circuit; a host interface for sending/receiving data to/from a host microcomputer; a video interface for receiving image data from an image A/D converter, subjecting the image data to pre-scaling and outputting CIF (Common Intermediate Format) data or QCIF (Quarter Common Intermediate Format) data, or receiving CIF data or QCIF data, subjecting the CIF data or QCIF data to post-scaling and outputting the data to an image D/A converter; a DMA (Direct Memory Access) control circuit for controlling input/output of data from the host microcomputer via the host interface, input/output of data from a first data memory or a second data memory in the first arithmetic unit, input/output of data from a first data memory or a second data memory in the second arithmetic unit, and input/output of the CIF data or QCIF data from the video interface, to/from a bulk memory; and a common memory having a function of transferring data between the first arithmetic unit and the second arithmetic unit.
According to the above-described structure, the second pipeline operation circuit in the general arithmetic circuit performs the quantization operation and the dedicated pipeline operation circuit in the dedicated arithmetic circuit performs the DCT operation. Therefore, the DCT operation and the quantization operation can be continuously performed by the pipeline operation. In addition, the image processing apparatus mounts a plurality of the arithmetic units including the general arithmetic circuit and the dedicated arithmetic circuit. The first dedicated arithmetic circuit comprises the DCT operation circuit and the IDCT operation circuit, and the second dedicated arithmetic circuit comprises the post-noise reduction filter operation circuit and the half-pel operation circuit. Therefore, the image processing apparatus of the present invention functions as an encoder apparatus when only encoder operations are performed, functions as a decoder apparatus when only decoder operations are performed, and functions as a codec apparatus when the encoder operations and decoder operations are performed in a time-shared manner.