1. Field of the Invention
The present invention relates to an operation circuit such as an exclusive-OR circuit and an addition (subtraction) circuit, and more particularly to an operation circuit suitable for a motion vector detection device and the like.
2. Description of the Background Art
A motion vector detection device is one example of a device that uses many operation circuits. It is an essential component of a compression device for dynamic image data using motion compensation, and typical algorithms using motion compensation include MPEG (Moving Picture Experts Group). As shown in FIG. 15, the motion vector detection device searches the stored pictures (search window data SW) for a picture resembling the current picture (template data TB) and outputs its position as a motion vector MV. The performance of the motion vector detection device has a great effect on that of the compression device for dynamic image data. One of the key factors for high image quality is the ability to detect the motion vector MV over a wider search range (a wider search window).
FIG. 16 is a block diagram showing a constitution of a motion vector detection device using a block matching technique for detecting an optimum motion vector. A cache memory 40 stores the search window data SW and a cache memory 50 stores the template data TB. A systolic array 10 calculates the absolute differences between a specified area of the search window data SW and the template data TB in units of a predetermined area of pixels (8 pixels × 8 pixels in FIG. 15). An addition circuit 20 generates the sum of the absolute differences. A minimum-value detection circuit 30 stores the sum of the absolute differences. A vector generation unit 70 generates a motion vector for the specified area of the search window data SW and the template data TB and stores it. By repeating this operation while changing the specified area of the search window data SW, a plurality of sums of the absolute differences are stored in the minimum-value detection circuit 30 and a plurality of motion vectors are stored in the vector generation unit 70. The minimum-value detection circuit 30 then selects, from the motion vectors stored in the vector generation unit 70, the one corresponding to the minimum of the stored sums of the absolute differences, and outputs it as an optimum motion vector MV.
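The block matching operation described above can be illustrated in software. The following is a minimal sketch only, not the circuit of FIG. 16: the function names `sad` and `find_motion_vector` are hypothetical, the search is assumed to be exhaustive, and the motion vector is assumed to be expressed as the offset of the matched area from the top-left corner of the search window.

```python
def sad(template, window, dy, dx):
    """Sum of absolute differences between the template and the
    window area whose top-left corner is at offset (dy, dx)."""
    return sum(
        abs(template[y][x] - window[y + dy][x + dx])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def find_motion_vector(template, window):
    """Try every candidate area in the window (assumed exhaustive search)
    and return ((dy, dx), sad) for the area with the minimum sum."""
    th, tw = len(template), len(template[0])
    wh, ww = len(window), len(window[0])
    best_vector, best_sad = None, None
    for dy in range(wh - th + 1):
        for dx in range(ww - tw + 1):
            s = sad(template, window, dy, dx)
            # Keep the candidate with the smallest sum seen so far,
            # mirroring the role of the minimum-value detection circuit.
            if best_sad is None or s < best_sad:
                best_vector, best_sad = (dy, dx), s
    return best_vector, best_sad
```

In this sketch the inner sum plays the role of the systolic array 10 and the addition circuit 20, and the running minimum plays the role of the minimum-value detection circuit 30.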
FIG. 17 is a block diagram showing a part which obtains the sum of the absolute differences and detects the minimum value. Reference numbers in FIG. 17 correspond to those of FIG. 16. As shown in FIG. 17, the systolic array 10 consists of sixty-four (8 × 8 pixels) processor elements PE for calculating the absolute difference and eight side registers SR. The processor elements PE are connected to one another in a linear systolic array. Each processor element PE corresponds to one pixel in the template data TB. The template data TB are loaded into the systolic array 10, and each pixel is stored in the register of the corresponding processor element PE. The search window data SW, on the other hand, are stored in the eight side registers SR, and eight pixels are shifted (loaded) from the eight side registers SR into eight of the processor elements PE to be stored therein. Each processor element PE performs an arithmetic operation to obtain the absolute difference between one pixel of the search window data SW and one pixel of the template data TB. This part repeats the above loading, storing and arithmetic operation while changing the specified area of the search window data SW.
Thus, the processor elements PE have both a data loading (shifting) function and a data storing function.
FIGS. 18 to 20 illustrate processor elements PE. Each processor element PE internally consists of a data shift unit TG for loading the template data TB and the search window data SW in synchronization with a clock CK, a register R for temporarily storing data, and at least one of an exclusive-OR circuit EOR and an addition circuit ADD. Although the processor element PE may have both the exclusive-OR circuit EOR and the addition circuit ADD as shown in FIG. 20, the following discussion takes as an example a processor element PE having either the exclusive-OR circuit EOR or the addition circuit ADD, as shown in FIG. 18 or 19, since each of these circuits is in many cases used in practice as a minimum function unit.
The exclusive-OR circuit EOR and the addition circuit ADD in the background art have the circuit configurations of FIGS. 21 and 22, respectively. In the addition circuit, a difference-operation function can be implemented by supplying an inverted input to one of its inputs. In FIGS. 21 and 22, the data shift unit TG is omitted, as it is in the circuits discussed below.
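The way an addition circuit can perform a difference operation when one input is inverted may be sketched as follows. This is an illustrative model only and does not reproduce the circuit of FIG. 22: the names `add_bits`, `subtract` and `abs_diff` are hypothetical, an 8-bit width is assumed, and the carry input is assumed to be set to 1 so that the inverted input forms a two's complement. The absolute-value step is likewise an assumption added to relate the sketch to the absolute difference computed by the processor element PE.

```python
def add_bits(a, b, carry_in=0, width=8):
    """Ripple-carry addition built from per-bit XOR, AND and OR.
    Returns (sum truncated to `width` bits, carry out)."""
    result = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # sum bit: exclusive-OR of the inputs
        carry = (x & y) | (carry & (x ^ y))  # carry into the next bit
        result |= s << i
    return result, carry


def subtract(a, b, width=8):
    """a - b computed as a + (inverted b) + 1 (assumed two's complement).
    A carry out of 1 means no borrow occurred, i.e. a >= b."""
    mask = (1 << width) - 1
    return add_bits(a, ~b & mask, carry_in=1, width=width)


def abs_diff(a, b, width=8):
    """Absolute difference of two unsigned `width`-bit values."""
    mask = (1 << width) - 1
    diff, carry = subtract(a, b, width)
    if carry:
        return diff
    # Negative result: negate it by inverting and adding 1 again.
    negated, _ = add_bits(~diff & mask, 0, carry_in=1, width=width)
    return negated
```

For example, `subtract(5, 3)` adds 5, the inverted value 252 and the carry input 1, giving the 8-bit result 2 with a carry out of 1.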
The background-art exclusive-OR circuit EOR of FIGS. 21 and 22 is provided independently of the register R.
An operation circuit in the background art, such as the exclusive-OR circuit EOR or the addition circuit ADD, has a problem of large circuit scale in terms of hardware (H/W) and function. When the background-art exclusive-OR circuit or addition circuit ADD is applied to the processor element PE, the systolic array 10 becomes huge because the number of processor elements PE required is determined by the number of pixels in the template data TB and the degree of quantization (the number of required bits per pixel). For example, if the template data TB has 8 × 8 pixels with 8 bits per pixel, 512 processor elements PE are needed.
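The count given in the example above follows from multiplying the number of pixels by the number of bits per pixel. A short sketch, with the hypothetical function name `processor_element_count` and assuming one processor element PE per bit of each pixel as in that example:

```python
def processor_element_count(rows, cols, bits_per_pixel):
    """Number of processor elements, assuming one element per bit of each pixel."""
    return rows * cols * bits_per_pixel


# 8 x 8 pixels with 8 bits per pixel, as in the example above.
count = processor_element_count(8, 8, 8)
```

Doubling either template dimension or the number of bits per pixel doubles the count, which is why the scale of each individual operation circuit matters.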