1. Field of the Invention
The present invention relates to a motion compensation adder which is used to decode compressed moving pictures.
2. Description of the Related Art
Multimedia applications for information equipment, typified by the personal computer, are becoming increasingly widespread, and such equipment is now being provided with new functions for handling voice (speech), audio, still pictures and moving pictures in addition to the existing function of handling only character information. Each type of such so-called multimedia data (voice, audio, still pictures, moving pictures, etc.) has an extremely large data amount. The data are therefore generally compressed to roughly one part in several tens of the original data amount by a compression technique suited to the characteristics of each type of data, then stored in an external storage device or transmitted through a communication line, and then decoded on the multimedia information equipment.
For example, moving pictures are generally compressed and decompressed in conformity with the so-called MPEG-1 (Moving Picture Experts Group) video standard (ISO/IEC JTC1 CD 11172, Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media up to 1.5 Mbit/s; Part 2: Coding of Moving Picture Information). Decoding and displaying in real time data compressed according to the MPEG-1 video standard requires operation processing at a rate of several million instructions per second. Therefore, a custom LSI designed for MPEG-1 video decompression or a special signal processor for video processing has hitherto been used. However, the debut of new microprocessor architectures typified by RISC (Reduced Instruction Set Computer) has drastically enhanced the performance of general-purpose microprocessors, and finer design rules and higher processing speeds in LSI process technology have made it easy to integrate signal processing hardware such as a multiply-accumulator (sum-of-products unit). These developments have promoted a tendency to perform the MPEG-1 video decompression by software on the general-purpose microprocessor already installed in a device, so that a dedicated LSI or video signal processor can be omitted, which reduces the price of the multimedia equipment.
At present, several processors are publicly known which aim to perform video signal processing by software on a general-purpose microprocessor. This specification targets the 32-bit microprocessor V830 of NEC, which is disclosed in "Nikkei Electronics", No. 635 (May 8, 1995), pp. 111-121, and in "IEEE MICRO Magazine", Vol. 15, No. 6 (December 1995), pp. 20-29, and describes, by using the 32-bit microprocessor V830, a system for increasing the speed of the motion compensation processing, which needs the largest processing amount in the MPEG-1 video decompression.
The architecture of the 32-bit microprocessor V830, the principle of the motion compensation processing, and the conventional motion compensation processing will be described in this order.
First, the architecture and the instruction set of the 32-bit microprocessor V830 of NEC will be briefly described as an example of a microprocessor having enhanced signal processing functions.
FIG. 7 is a block diagram showing a system for performing motion compensation by using the V830 microprocessor. This system includes a microprocessor 11 for performing operation processing, and a main memory 10 for storing a program 20 and data 21. The microprocessor 11 includes a register file 12 of thirty-two (32) bit length into which data 21 on the main memory 10 are loaded, and an execution unit 13 for performing operations on the data in the register file 12. The execution unit 13 includes an arithmetic logic unit 28 for performing arithmetic operations such as addition and subtraction and logical operations such as logical sum (OR), logical product (AND), exclusive OR, etc., a bit shifter 29 for performing bit shift operations, and a multiply-accumulator 30 for performing multiply and multiply-accumulate operations.
FIG. 8 shows a part of the instruction set of the microprocessor V830, divided into load/store instructions, arithmetic and logical instructions, and shift instructions. The instructions shown in FIG. 8 will be described.
With the ld.b (Load Byte) instruction, the value obtained by sign-extending the 16-bit immediate value imm16 to 32 bits is added to the 32-bit data of the register reg1 to generate a 32-bit address, and 1 byte (8 bits) of data is read out from the position on the main memory which is indicated by the address thus generated, sign-extended to 32 bits and then stored in the register reg2.
With the ld.h (Load Halfword) instruction, the value obtained by sign-extending the 16-bit immediate value imm16 to 32 bits is added to the 32-bit data of the register reg1 to generate a 32-bit address, and 1 half word (16 bits) of data is read out from the position on the main memory which is indicated by the address thus generated, sign-extended to 32 bits and then stored in the register reg2.
With the ld.w (Load Word) instruction, the value obtained by sign-extending the 16-bit immediate value imm16 to 32 bits is added to the 32-bit data of the register reg1 to generate a 32-bit address, and 1 word (32 bits) of data is read out from the position on the main memory which is indicated by the address thus generated and then stored in the register reg2.
With the st.b (Store Byte) instruction, the value obtained by sign-extending the 16-bit immediate value imm16 to 32 bits is added to the 32-bit data of the register reg1 to generate a 32-bit address, and the least significant byte (8 bits) of the register reg2 is stored at the position on the main memory which is indicated by the address thus generated.
With the st.w (Store Word) instruction, the value obtained by sign-extending the 16-bit immediate value imm16 to 32 bits is added to the 32-bit data of the register reg1 to generate a 32-bit address, and the 1 word (32 bits) of data held by the register reg2 is stored at the position on the main memory which is indicated by the address thus generated.
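The address generation and sign extension shared by the load and store instructions above can be illustrated by the following minimal Python sketch. It is a behavioral model only, not V830 code; the helper names (sign_extend, effective_address, ld_b, st_b) and the dictionary used as a sparse memory are assumptions made for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def effective_address(reg1, imm16):
    """reg1 plus the sign-extended imm16, wrapped to a 32-bit address."""
    return (reg1 + sign_extend(imm16, 16)) & 0xFFFFFFFF


def ld_b(memory, reg1, imm16):
    """Model of ld.b: read one byte and sign-extend it (signed view of reg2)."""
    return sign_extend(memory[effective_address(reg1, imm16)], 8)


def st_b(memory, reg1, imm16, reg2):
    """Model of st.b: store the least significant byte of reg2."""
    memory[effective_address(reg1, imm16)] = reg2 & 0xFF


memory = {}                          # sparse byte-addressed memory (illustrative)
st_b(memory, 0x1004, 0xFFFC, 0x1F0)  # imm16 0xFFFC is -4, so the address is 0x1000
loaded = ld_b(memory, 0x1000, 0)     # stored byte 0xF0 is read back as -16
```

In this model a byte whose most significant bit is set is read back as a negative value, which is the behavior that matters later when unsigned pixel values are loaded.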
With add (Addition) instruction, the word (32-bit) data which are held by the register reg2 are added with the word data which are held by the register reg1, and then the addition result is stored in the register reg2.
With the addi (Add Immediate) instruction, the value obtained by sign-extending the 16-bit immediate value imm16 to 32 bits is added to the word data held by the register reg1, and the addition result is stored in the register reg2.
With the andi (AND Immediate) instruction, the bitwise logical product (AND) of the value obtained by sign-extending the 16-bit immediate value to 32 bits and the word data held in the register reg2 is taken, and the result is stored in the register reg2.
With the mac (Multiply and Accumulate) instruction, the product of the word data in the register reg1 and the word data in the register reg2 is added to the word data held in the register reg1, and the addition result is then subjected to clipping processing to 32-bit length and stored in the register reg2. The clipping processing replaces the addition result with 0x7fffffff if the addition result is larger than 0x7fffffff, and with 0x80000000 if the addition result is smaller than 0x80000000 (the most negative signed 32-bit value), thereby reducing the error when the addition result overflows because it cannot be expressed in signed 32-bit format. Here, 0x denotes hexadecimal notation.
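The clipping (saturation) applied to the multiply-accumulate result can be sketched in Python as follows. The names acc, a and b are hypothetical stand-ins for the accumulated value and the two multiplicands; the sketch does not model which registers hold them.

```python
INT32_MAX = 0x7FFFFFFF    # largest signed 32-bit value
INT32_MIN = -0x80000000   # bit pattern 0x80000000 read as a signed value


def saturate32(value):
    """Clip an arbitrary-precision integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def mac(acc, a, b):
    """Multiply a by b, add the product to acc, and clip the sum to 32 bits."""
    return saturate32(acc + a * b)
```

For instance, mac(0x7FFFFFF0, 4, 8) yields 0x7FFFFFFF rather than wrapping around to a negative value.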
With the max (Maximum) instruction, the word (32-bit) data held by the register reg2 and the word data held by the register reg1 are compared as signed integers, and the larger value is stored in the register reg3.
With the min (Minimum) instruction, the word (32-bit) data held by the register reg2 and the word data held by the register reg1 are compared as signed integers, and the smaller value is stored in the register reg3.
With the mov (Move) instruction, the word (32-bit) data held by the register reg1, or the value obtained by sign-extending the immediate value imm to a word (32 bits), is stored in the register reg2.
With the xor (Exclusive Or) instruction, the bitwise exclusive OR of the word (32-bit) data held by the register reg2 and the word data held by the register reg1 is taken, and the result is stored in the register reg2.
With the shl (Shift Left) instruction, the word (32-bit) data held by the register reg1 are logically shifted left by the number of bits indicated by the immediate value imm5, and the lower 32 bits of the result are stored in the register reg1.
With the shr (Shift Right) instruction, the word (32-bit) data held by the register reg1 are logically shifted right by the number of bits indicated by the immediate value imm5, and the result is stored in the register reg1.
With the shrd3 (Shift Right Doubleword) instruction, double word (64-bit) data which contain the word (32-bit) data held by the register reg3 as the upper word and the word data held by the register reg2 as the lower word are shifted right by the number of bits indicated by the lower 5 bits of the register reg3, and the lower 32 bits of the result are then stored in the register reg2.
The microprocessor V830 adopts a load/store architecture, in which operation targets (operands) are limited to data held in the register file. Accordingly, in order to operate on data in the main memory, it is necessary to transfer the data from the main memory to the register file with a load instruction, operate on the data, and then transfer the operation result from the register file to the main memory with a store instruction. The program 20 placed on the main memory 10 is written by using the instruction set shown in FIG. 8 to control the operation of the microprocessor 11.
Next, the motion compensation will be described with reference to FIGS. 7 and 9. In the motion compensation processing, a pixel value of a predicted picture which is indicated by a motion vector and expressed as an unsigned value is added to an error value which has been subjected to inverse DCT (Discrete Cosine Transform) and is expressed as a signed value, to generate a pixel of a new picture.
In an actual system, as shown in FIG. 7, a pixel value 22 of a predicted picture which corresponds to an input of the motion compensation processing and an error value 23 are stored on the main memory 10, and the respective places thereof are indicated by pointers put on the register file 12 of the microprocessor 11. Further, a pixel value of a generated picture which corresponds to an output of the motion compensation processing is stored at a place on the main memory 10 which is indicated by another pointer put on the register file 12 of the microprocessor 11.
The details of the motion compensation processing of one pixel will be described with reference to FIG. 9. Before the motion compensation processing is started, a pointer pp to the pixel value 22 of the predicted picture, a pointer pe to the error value 23 and a pointer pc to the pixel value 24 of the generated picture are assumed to be stored in the register file 12.
First, a pixel value p of the predicted picture, which is expressed as an 8-bit unsigned value, is obtained from the main memory by referring to the pointer pp to the pixel value 22 of the predicted picture, an error value e, which is expressed as a 16-bit signed value, is obtained from the main memory by referring to the pointer pe to the error value 23, and both are stored in the register file 12 (201).
Secondly, the pixel value p of the predicted picture is converted to a signed value and added to the error value e, and the sum is stored in a temporary variable t which is reserved in the register file 12 (202).
Thirdly, clipping processing is performed so that the temporary variable t is set to a value in the range from 0 to 255 which can be expressed by 8-bit unsigned value (200). Specifically, the temporary variable t is compared with 255 (203), and if the temporary variable t is larger than 255, 255 is set to the temporary variable t (204). Further, the temporary variable t is compared with 0 (205), and if the temporary variable t is smaller than 0, 0 is set to t (206).
Fourthly, the temporary variable t is stored at a place on the main memory 10 which is indicated by the pointer pc to the pixel value of the generated picture (207).
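The one-pixel procedure of FIG. 9 can be summarized by the following minimal Python sketch. The function name is hypothetical, and the pointer dereferences and the final store are replaced by function arguments and a return value for illustration.

```python
def motion_compensate_pixel(p, e):
    """p: predicted pixel, 8-bit unsigned (0..255); e: error value, 16-bit signed."""
    t = p + e        # add the prediction and the error (202)
    if t > 255:      # compare with the upper limit (203)
        t = 255      # clip to 255 (204)
    if t < 0:        # compare with the lower limit (205)
        t = 0        # clip to 0 (206)
    return t         # value to be stored through the pointer pc (207)
```

For example, motion_compensate_pixel(200, 100) gives 255 and motion_compensate_pixel(10, -20) gives 0, while motion_compensate_pixel(100, 5) passes through as 105. Written this way, the clipping needs two comparisons and two conditional assignments per pixel.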
Finally, the conventional motion compensation processing method will be described with reference to FIG. 10.
In the conventional motion compensation processing, the error value and the pixel value of the predicted picture which are stored in the main memory 10 are loaded into registers and added to each other. The addition result is subjected to the clipping processing by using two different instructions.
In the case of FIG. 10, at an initialization step (210), the upper limit of the pixel value of the generated picture is placed in r10 by the instruction mov 255, r10 (212). The error value 23 (corresponding to the error value e in FIG. 9) is loaded into a register r12 in step 213, and the pixel value 22 of the predicted picture (corresponding to the pixel value p of the predicted picture in FIG. 9) is loaded into a register r13 in step 214. The two values are then added (215, corresponding to 202 in FIG. 9) and subjected to the clipping processing 211, thereby obtaining the pixel value 24 of the generated picture (corresponding to the pixel value c of the generated picture in FIG. 9) in r13.
When the pixel value is loaded into the register r13 (214), the load byte (ld.b) instruction of the microprocessor V830 treats the loaded value as an 8-bit signed value although the pixel value is an 8-bit unsigned value, and thus performs 24-bit sign extension. Therefore, it is always necessary to set the sign-extended portion to zero with the andi instruction.
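The effect of the sign extension and of the subsequent masking can be seen in the following Python sketch. The function names are hypothetical, and the mask value 0xFF is an assumption, since the immediate actually used with the andi instruction is not stated here.

```python
def ld_b_register(byte):
    """32-bit register contents after ld.b loads `byte` (0..255)."""
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


def andi(imm, reg):
    """Bitwise AND of an immediate mask with a 32-bit register value."""
    return reg & imm


raw = ld_b_register(200)   # pixel 200 (0xC8) is loaded as 0xFFFFFFC8
pixel = andi(0xFF, raw)    # masking the upper 24 bits restores 200
```

Pixel values below 128 are unaffected by the sign extension, so the masking matters only for the brighter half of the range, but it must be executed for every pixel because the value is not known in advance.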
A clipping procedure 211 is the same as that disclosed in "IEEE MICRO Magazine", Vol. 15, No. 6 (December 1995), in FIG. 6(b) at page 25. In addition to the instruction which adds the pixel value of the predicted picture and the error value, it uses the minimum instruction (min) and the maximum instruction (max), which were introduced into the microprocessor V830 for signal processing, to perform the clipping without branching. That is, the smaller of the value of the register r13, in which the pixel value of the generated picture is stored, and the value of the register r10, into which the constant 255 was loaded at the initial setting 210, is selected (216), and subsequently the larger of the value of the register r13 and the value of the register r0, which holds zero at all times, is selected (217), whereby the clipping processing 211 limits the pixel value of the generated picture to a value between 0 and 255. After the clipping processing 211 is finished, the pixel value of the generated picture is stored at the position which is indicated by the register r8 (219).
The procedure from step 213 to step 218 corresponds to the processing of one pixel shown in FIG. 9. Actually, the processing of a desired number of pixels is performed continuously while the pointers are updated (220).
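The branch-free clipping of FIG. 10, repeated over a run of pixels, can be modeled by the following Python sketch. The function name and the use of Python sequences in place of the pointers are assumptions made for illustration; the register names in the comments follow the description above.

```python
def motion_compensate(predicted, errors):
    """predicted: bytes of 8-bit pixels; errors: signed error values."""
    upper = 255                      # constant placed in r10 at initialization
    out = bytearray(len(predicted))
    for i, (p, e) in enumerate(zip(predicted, errors)):
        t = (p & 0xFF) + e           # masked pixel plus error (215)
        t = min(t, upper)            # min instruction against r10 (216)
        t = max(t, 0)                # max instruction against r0 (217)
        out[i] = t                   # store the generated pixel
    return bytes(out)
```

For example, motion_compensate(bytes([10, 200, 128]), [-20, 100, 5]) returns bytes([0, 255, 133]). The sketch makes the per-pixel cost visible: one addition followed by two further operations that exist only for clipping.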
In the motion compensation procedure shown in FIG. 10, the minimum instruction and the maximum instruction are needed to perform the clipping processing in addition to the instruction which adds the pixel value of the predicted picture and the error value, so that there is a problem that the operation amount needed for one operation of the motion compensation processing is increased. Since the motion compensation processing is performed for every pixel, each operation is simple, but it must be executed extremely frequently, so that it accounts for most of the operation amount of the whole MPEG video decompression processing. Accordingly, an increase in the number of instructions needed for one operation of the motion compensation processing, even by merely a few instructions, greatly reduces the MPEG video decompression performance.