The present invention relates to a technique for compressing and encoding digital video, and more particularly to a block matching arithmetic device for use in motion vector search.
The video compression coding technique greatly compresses an enormous amount of information by exploiting the high temporal correlation and spatial correlation of video signals, together with the characteristics of human vision. It can achieve a very high compression ratio by combining the inter-frame predictive coding technique, the conversion coding technique, and the variable-length coding technique. Here, the word "frame" means one screen of the video.
The inter-frame predictive coding technique exploits the temporal correlation of video: the current frame is predicted from a reference frame, and a predictive error signal is then transmitted. Improved inter-frame predictive coding techniques include an inter-motion-compensation-frame predictive coding scheme, which takes the motion of a time-varying image into account; an inter-field predictive coding scheme, in which inter-frame prediction is replaced with inter-field prediction; and an interpolation predictive coding scheme, which interpolates between a past reference frame and a future reference frame. There is further an adaptive predictive coding scheme, which adaptively selects among the plural predictive coding schemes described above.
The conversion coding technique compresses information by linearly converting plural signals. It is commonly applied, in the spatial (horizontal or vertical) direction, to the predictive error signal of the adaptive predictive coding scheme, because redundancy in the spatial direction is pronounced in an image signal. Like the adaptive predictive coding scheme, the conversion coding technique includes an adaptive conversion coding scheme that adaptively selects among plural conversion schemes such as frame conversion coding, field conversion coding, and conversion coding only in the horizontal direction.
The variable-length coding technique compresses information by exploiting the skew of the probability distribution of signal levels. It is generally applied to the motion vectors in the adaptive predictive coding scheme and to the conversion coefficients in the adaptive conversion coding scheme.
ITU-T H.261 and ISO/IEC 11172 (MPEG-1), which are international standards for the compression coding of time-varying images, adopt as the inter-frame predictive coding scheme an inter-motion-compensation-frame predictive coding scheme using motion vectors. In this scheme, the current frame 151, as shown in FIG. 15, is divided into blocks (current blocks) 152 formed of, for example, (N (lines) × M (columns)) pixels (generally, 16 × 16). The motion vector search area 154 for searching for motion vectors on the reference frame 153 is set to (2V (lines) × 2H (columns)) pixels. When the center point of a current block is set to, for example, the coordinates (0, 0), the horizontal coordinate ranges from -H to H-1 while the vertical coordinate ranges from -V to V-1. The motion vector is then obtained by searching, within the search range, for the reference block 155 that gives the smallest difference from the current block.
For example, when the image is still, the difference becomes zero if the reference block at the same position as the current block, that is, the block with the coordinates (0, 0), is used as the predictive block. If the reference block shifted from the candidate block at the same position by h pixels rightward and v pixels downward has the smallest distortion, the block with the coordinates (h, v) is used as the predictive block, and the motion vector MV(h, v) is transmitted.
The difference computation will be described here. When the pixel on the upper left of the current block 152 is represented as a(0, 0) and the pixel on the reference frame corresponding to the coordinates a(0, 0) of the current block is represented as b(0, 0), the pixel on the upper left of the reference block 155 for which the motion vector becomes MV(h, v) is represented as b(h, v). In this case, the distortion D(h, v) between the current block 152 and the reference block 155 is represented by the following formula (1):
$$D(h,v)=\sum_{m}\sum_{n}\left\| b(h+m,\,v+n)-a(m,n)\right\| \qquad (1)$$
where the sums run over all pixels (m, n) of the block and $\|\cdot\|$ is the norm used for computing the distortion. An absolute difference or a squared error is generally used as the norm. The absolute difference is used more frequently because it is simpler and more efficient to compute.
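As an illustration of formula (1) with the absolute-difference norm, the following C sketch computes D(h, v) for one candidate block. The function name `block_distortion`, the row-major frame layout, the `stride` parameter, and the choice of m as the horizontal index and n as the vertical index are assumptions made for this example; they are not taken from the specification.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between the current block and the
 * reference block displaced by (h, v).
 * Assumed layout: 8-bit pixels stored line by line, `stride` bytes per line.
 * `cur` points at pixel a(0,0) and `ref` at pixel b(0,0). */
uint32_t block_distortion(const uint8_t *cur, const uint8_t *ref,
                          int stride, int block_w, int block_h,
                          int h, int v)
{
    uint32_t d = 0;
    for (int n = 0; n < block_h; n++) {        /* vertical index   */
        for (int m = 0; m < block_w; m++) {    /* horizontal index */
            int a = cur[n * stride + m];
            int b = ref[(v + n) * stride + (h + m)];
            d += (uint32_t)abs(b - a);
        }
    }
    return d;
}
```

The caller is assumed to guarantee that the displaced block stays inside the reference frame; the sketch performs no bounds checking.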
As described above, the method of comparing a current frame and a reference frame in block units is called a block matching method. When a motion vector is searched for in pixel units, (2V × 2H) reference blocks per current block exist in the motion vector search area 154. When a motion vector is searched for every current block in a current frame, the block matching arithmetic operation requires an enormous amount of computation. Conventionally, the device disclosed, for example, in JP-A-No. 340538/1996 is known as a high-speed, high-performance motion vector search device. This device is dedicated to the motion vector search operation and includes plural processor elements, which search for a motion vector by performing block matching operations in parallel on plural candidate blocks. However, such a device requires a great number of processor elements and is therefore bulky and expensive.
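The exhaustive search over the (2V × 2H) candidates described above can be sketched in C as follows. The names `MotionVector`, `sad_block`, and `full_search`, the row-major frame layout, and the tie-breaking rule (the first minimum found is kept) are assumptions made for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int h, v;             /* horizontal and vertical displacement */
    uint32_t distortion;  /* distortion of the best candidate     */
} MotionVector;

/* Sum of absolute differences between two blocks of bw x bh pixels. */
static uint32_t sad_block(const uint8_t *cur, const uint8_t *ref,
                          int stride, int bw, int bh)
{
    uint32_t d = 0;
    for (int n = 0; n < bh; n++)
        for (int m = 0; m < bw; m++)
            d += (uint32_t)abs((int)ref[n * stride + m] - (int)cur[n * stride + m]);
    return d;
}

/* Tests every candidate with h in [-H, H-1] and v in [-V, V-1].
 * `ref_origin` points at the reference pixel co-located with cur[0].
 * The caller must ensure the whole search area lies inside the frame. */
MotionVector full_search(const uint8_t *cur, const uint8_t *ref_origin,
                         int stride, int bw, int bh, int H, int V)
{
    MotionVector best = { 0, 0, UINT32_MAX };
    for (int v = -V; v <= V - 1; v++) {
        for (int h = -H; h <= H - 1; h++) {
            const uint8_t *cand = ref_origin + v * stride + h;
            uint32_t d = sad_block(cur, cand, stride, bw, bh);
            if (d < best.distortion) {
                best.h = h;
                best.v = v;
                best.distortion = d;
            }
        }
    }
    return best;
}
```

Each call evaluates 2V × 2H candidates, and each candidate costs one absolute difference per block pixel, which is why the amount of computation grows so quickly with the search range.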
Recently, with the advance of semiconductor technology, high-speed, high-end microprocessors with a 64-bit architecture have come onto the market. Examples include the UltraSparc I and II by Sun Microsystems, the Alpha 21164 by DEC, the R10000 by MIPS, the PA-8000 by Hewlett-Packard, and the PowerPC 620 by Motorola. In some cases, a 64-bit arithmetic unit such as an ALU built into a microprocessor can be divided into eight 8-bit units, four 16-bit units, or two 32-bit units so that plural data are processed in parallel by a single instruction. Such a SIMD (Single Instruction Stream, Multiple Data Stream) instruction improves the data processing capability by a factor of eight, four, or two, depending on the data size. In some microprocessors, instructions that execute an arithmetic operation specific to a processing algorithm are added to speed up multimedia processing such as graphics processing or image coding.
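The idea of dividing a 64-bit unit into eight independent 8-bit units can be imitated in portable C. The following sketch is only an illustration of lane-wise operation on an ordinary 64-bit integer; it does not use any real SIMD instruction, and the function name `add_8x8` is hypothetical.

```c
#include <stdint.h>

/* Adds eight 8-bit lanes packed into 64-bit words.
 * Each lane wraps modulo 256 and no carry crosses a lane boundary. */
uint64_t add_8x8(uint64_t x, uint64_t y)
{
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL; /* low 7 bits of each lane */
    const uint64_t high = 0x8080808080808080ULL; /* top bit of each lane    */

    /* Add the low 7 bits of every lane; a carry can reach only the
     * top bit of the same lane, never the neighbouring lane. */
    uint64_t sum_low = (x & low7) + (y & low7);

    /* Fold in the top bits of both operands without generating a carry. */
    return sum_low ^ ((x ^ y) & high);
}
```

With a plain 64-bit addition, the overflow of one lane would spill into the next; the masking above is what keeps the eight results independent.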
Some UltraSparc processors by Sun Microsystems use an instruction set specially designed for multimedia processing called VIS (Visual Instruction Set) (IEEE Micro, Vol. 16, No. 4, pp. 10-20, August 1996). The instruction set contains a pdist instruction used for motion vector search according to the block matching method. As shown in FIG. 16, the 64-bit register r1 stores eight 8-bit data a0 to a7, while the 64-bit register r2 stores eight 8-bit data b0 to b7. The pdist instruction computes the sum of the absolute values of the differences between the eight 8-bit data of the register r1 and the eight 8-bit data of the register r2 at the corresponding positions, and accumulates that sum with the data of the register r3. Since image data is formed of 8 bits, a block matching arithmetic operation for eight pixels can be performed by one pdist instruction in one cycle. When the block size of the block matching operation is 16 × 16, the error of a single motion vector candidate can be computed by executing the pdist instruction 32 times.
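The behaviour described for the pdist instruction can be emulated in plain C to make the arithmetic concrete. This is a sketch of the described operation, not the instruction itself or a compiler intrinsic; the names `pdist_emul` and `sad_16x16` and the packing of each block into 32 words are assumptions of this example.

```c
#include <stdint.h>

/* Emulation of the described operation: the sum of |a_i - b_i| over the
 * eight 8-bit values packed in r1 and r2 is added to the accumulator. */
uint64_t pdist_emul(uint64_t r1, uint64_t r2, uint64_t acc)
{
    for (int i = 0; i < 8; i++) {
        int a = (int)((r1 >> (8 * i)) & 0xFF);
        int b = (int)((r2 >> (8 * i)) & 0xFF);
        acc += (uint64_t)(a > b ? a - b : b - a);
    }
    return acc;
}

/* A 16 x 16 block holds 256 pixels, i.e. 32 words of eight pixels,
 * so one candidate needs 32 accumulation steps. */
uint64_t sad_16x16(const uint64_t cur_words[32], const uint64_t ref_words[32])
{
    uint64_t acc = 0;
    for (int i = 0; i < 32; i++)
        acc = pdist_emul(cur_words[i], ref_words[i], acc);
    return acc;
}
```

The loop in `sad_16x16` mirrors the count given in the text: 16 lines of 16 pixels, two 8-pixel words per line, 32 steps in total.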
However, when the microprocessor reads word data from or writes word data to the data memory, only addresses aligned to the number of bytes in one word can be used. For example, when one word is formed of 64 bits, word data can be read or written only at addresses whose lower three bits are 000, that is, at addresses spaced 8 bytes apart such as 0000h, 0008h, 0010h, 0018h, . . .
FIG. 17 illustrates a conventional block matching arithmetic device that executes a block matching arithmetic operation using a microprocessor. The block matching arithmetic device consists of a microprocessor 171 and a memory unit 172.
The microprocessor 171 is formed of a register file 173 and an arithmetic unit 174 that executes a data arithmetic operation. Here, it is assumed that the block size is 16 × 16.
When the upper left pixel of the reference block is b(0, 0), the sixteen data b(0, 0) to b(0, 15) for one line of the block correspond to two pieces of word data, that is, b(0, 0) to b(0, 7) and b(0, 8) to b(0, 15). Those data can be used immediately as source data of the pdist instruction that is executed in the arithmetic unit 174.
However, when the upper left pixel of the reference block is b(0, 1), the sixteen data b(0, 1) to b(0, 16) for one line of the block span three pieces of word data, that is, the words holding b(0, 0) to b(0, 7), b(0, 8) to b(0, 15), and b(0, 16) to b(0, 23). Hence, the sixteen data cannot be used as source data of the pdist instruction unless the two word data b(0, 1) to b(0, 8) and b(0, 9) to b(0, 16) are first created by reading the three word data out of the memory unit 172 and then shifting their positions in the arithmetic unit 174. The disadvantage is that the block matching arithmetic operation cannot be executed efficiently because this data alignment procedure is needed.
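The two cases above follow directly from the address arithmetic of 8-byte words. The following C sketch counts how many aligned words a run of pixels touches; the helper names `word_base` and `words_spanned` are hypothetical and are introduced only for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define WORD_BYTES 8u

/* Address of the aligned word containing addr (lower three bits cleared). */
uintptr_t word_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(WORD_BYTES - 1);
}

/* Number of aligned 8-byte words touched by `len` bytes starting at
 * `addr`. `len` must be greater than zero. */
size_t words_spanned(uintptr_t addr, size_t len)
{
    uintptr_t first = word_base(addr);
    uintptr_t last  = word_base(addr + len - 1);
    return (size_t)((last - first) / WORD_BYTES) + 1;
}
```

For a 16-pixel line, `words_spanned` gives 2 when the line starts on a word boundary and 3 for any other starting offset, which matches the two cases described above.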
For the purpose of data alignment, the VIS instruction set of the UltraSparc provides the alignaddr instruction for address alignment and the faligndata instruction for data alignment. Aligned word data is created by aligning the data address determined by the position of the reference block, loading two successive word data, and then aligning the loaded data. The pdist instruction is executed after this procedure has been applied to both the reference block and the current block.
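The two-step procedure (align the address, then extract eight bytes from two successive words) can be sketched in portable C. The functions below are hypothetical stand-ins written for this illustration; they are not the VIS instructions or their intrinsics, and the `AlignState` structure is an assumed substitute for the alignment offset kept by the processor. Byte arrays are used so that the example does not depend on byte order.

```c
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for the byte offset remembered between the two steps. */
typedef struct {
    unsigned offset;   /* 0..7 */
} AlignState;

/* Step 1: return the aligned address at or below p and record the
 * byte offset of p within that word. */
const uint8_t *align_addr_emul(const uint8_t *p, AlignState *st)
{
    st->offset = (unsigned)((uintptr_t)p & 7u);
    return p - st->offset;
}

/* Step 2: from two successive words w0 and w1 (in memory order), take
 * the eight bytes that start st->offset bytes into w0. */
void align_data_emul(const uint8_t w0[8], const uint8_t w1[8],
                     const AlignState *st, uint8_t out[8])
{
    uint8_t both[16];
    memcpy(both, w0, 8);
    memcpy(both + 8, w1, 8);
    memcpy(out, both + st->offset, 8);
}
```

For a line that starts one byte past a word boundary, applying `align_data_emul` to words 0 and 1 yields the bytes at positions 1 to 8, and applying it to words 1 and 2 yields the bytes at positions 9 to 16, so three loaded words produce the two aligned operands.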
FIG. 18 shows an example of a program, written in C-language format, that executes a block matching arithmetic operation for a block of 16 × 16 size. In order to create the two input data for the pdist instruction, eight instructions in total are executed: for each of the two inputs, one alignaddr instruction, one faligndata instruction, and two load instructions (represented by substitution in the program). As a result, nine instructions in total are required to execute one pdist instruction, which is a disadvantage because the efficiency of the block matching arithmetic operation cannot be improved.