1. Field of the Invention
The present invention relates to an apparatus for detecting motion vectors employed for motion compensation of moving images.
2. Description of the Background Art
In order to transmit or store image signals having an enormous quantity of data, a data compression technique of reducing the data quantity is indispensable. Image data has considerable redundancy resulting from correlation between neighboring pixels, human perception and the like. A data compression technique suppressing the data redundancy and reducing the quantity of transmission data is called highly efficient coding. Systems of such highly efficient coding include a predictive coding system. In the predictive coding system, the following processing is executed:
Predictive errors, which are the differences between respective pixel data of a current screen (frame or field) to be currently coded and respective pixel data at the same positions of a preceding reference screen, are computed and employed for subsequent coding. In this method, coding can be performed with high efficiency for images having small motions, owing to the strong correlation between the screens. As to images having large motions, however, the errors are increased due to the small correlation between the screens, which disadvantageously increases the quantity of transmission data.
Motion-compensated interframe (or interfield) predictive coding is employed to solve the aforementioned problem. In this method, the following processing is executed: First, motion vectors are computed using pixel data of a current screen (frame or field) and a preceding screen before computing predictive errors. Predictive images of the preceding screen are moved in accordance with the computed motion vectors. Image data of the preceding screen on positions displaced by the motion vectors are regarded as reference pixels, so that the values thereof are employed as predictive values. Then, predictive errors between pixels of the moved preceding screen and the current screen are computed for transmission with the motion vectors.
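The effect of displacing the reference pixels can be illustrated with a minimal Python sketch. The function names, the toy 6 by 6 screens and the convention that a vector (i, j) gives the horizontal and vertical displacement of the reference pixels in the preceding screen are assumptions made for illustration only.

```python
def prediction_errors(cur, prev, top, left, size, mv=(0, 0)):
    """Predictive errors of one block of the current screen.

    mv = (i, j) is the assumed displacement (horizontal, vertical) of the
    reference pixels in the preceding screen; (0, 0) means plain
    interframe prediction from the same positions.
    """
    i, j = mv
    return [[cur[top + r][left + c] - prev[top + r + j][left + c + i]
             for c in range(size)]
            for r in range(size)]


def total_abs(errors):
    """Sum of the absolute predictive errors of a block."""
    return sum(abs(e) for row in errors for e in row)


# Toy screens: a 2 by 2 bright patch moves two pixels to the right.
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for r in (1, 2):
    for c in (1, 2):
        prev[r][c] = 9
        cur[r][c + 2] = 9

plain = prediction_errors(cur, prev, top=1, left=3, size=2)
compensated = prediction_errors(cur, prev, top=1, left=3, size=2, mv=(-2, 0))
print(total_abs(plain), total_abs(compensated))  # prints "36 0"
```

In this toy case the predictive errors vanish once the reference pixels are displaced by the motion of the patch, so only the vector and zero-valued errors would remain to be coded.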
FIG. 83 is a block diagram schematically showing the overall structure of an encoder for coding image data in accordance with a conventional motion-compensated predictive coding method. Referring to FIG. 83, the encoder includes a preprocessing circuit 910 for executing prescribed preprocessing on an inputted image signal, a source coding circuit 912 for executing elimination of redundancy and quantization on the signal preprocessed by the preprocessing circuit 910, and a video multiplex coding circuit 914 for coding the signal from the source coding circuit 912 in accordance with a prescribed format and multiplexing the same to a code string of a predetermined data structure.
The preprocessing circuit 910 converts the input image signal to that of a common intermediate format (CIF) through time and space filters, and executes filtering for noise elimination.
The source coding circuit 912 performs an orthogonal transformation such as discrete cosine transformation (DCT), for example, on the supplied signal, performs motion compensation on the input signal, and quantizes orthogonal-transformed image data.
The video multiplex coding circuit 914 performs two-dimensional variable-length coding on the supplied image signal, performs variable-length coding on various attributes, such as motion vectors, of blocks which are data processing units, and thereafter multiplexes the same to a code string of a predetermined data structure.
The encoder further includes a transmission buffer 916 for buffering the image data from the video multiplex coding circuit 914, and a transmission coding circuit 918 for adapting the image data from the transmission buffer 916 to a transmission channel. The transmission buffer 916 smooths the information generation rate to a constant rate. The transmission coding circuit 918 executes addition of error correction bits, voice signal data and the like.
FIG. 84 illustrates an exemplary structure of the source coding circuit 912 shown in FIG. 83. Referring to FIG. 84, the source coding circuit 912 includes a motion compensation predictor 920 for detecting motion vectors with respect to the input image signal supplied from the preprocessing circuit 910 and generating a reference image which is motion-compensated in accordance with the motion vectors, a loop filter 922 for filtering reference image pixel data from the motion compensation predictor 920, a subtracter 924 for obtaining the difference between an image signal outputted from the loop filter 922 and the input image signal, an orthogonal transformer 926 for orthogonally transforming an output of the subtracter 924, and a quantizer 928 for quantizing the data orthogonally transformed by the orthogonal transformer 926.
The motion compensation predictor 920 includes a frame memory storing pixel data of a preceding frame, for detecting the motion vectors and generating the motion-compensated reference image pixel data in accordance with the input image signal data and the pixel data in the frame memory (in case of a frame screen). The loop filter 922 is employed to improve the picture quality.
The orthogonal transformer 926 performs orthogonal transformation such as DCT in a unit of a block of a prescribed size (8 by 8 pixels in general). The quantizer 928 quantizes the orthogonally transformed pixel data in accordance with a previously prepared quantization table.
The motion compensation predictor 920 and the subtracter 924 execute motion-compensated interframe (or interfield) prediction, for eliminating temporal redundancy in a moving image signal. The orthogonal transformer 926 eliminates spatial redundancy in the moving image signal by orthogonal transformation.
The source coding circuit 912 further includes an inverse quantizer 930 for transforming the data quantized in the quantizer 928 to the signal state before the quantization, an inverse orthogonal transformer 932 for performing inverse orthogonal transformation on an output of the inverse quantizer 930, and an adder 934 for adding up outputs of the loop filter 922 and the inverse orthogonal transformer 932. The inverse quantizer 930 and the inverse orthogonal transformer 932 generate an image employed for interframe (or interfield) prediction for the next frame (or field). The generated image data is written in the frame memory included in the motion compensation predictor 920.
The adder 934 adds the image signal (interframe or interfield difference data) to the output of the loop filter 922, thereby recovering image data of the current frame (or current field). The video multiplex coding circuit 914 converts the output image data from the quantizer 928 and the motion vectors from the motion compensation predictor 920 to variable-length codes and transmits the same.
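The data flow through the subtracter, orthogonal transformer, quantizer, inverse quantizer, inverse orthogonal transformer and adder can be sketched in Python for a single block of 8 by 8 pixels. This is a simplified illustration under stated assumptions: the loop filter is omitted, a single uniform quantization step stands in for the quantization table, and all function names are hypothetical.

```python
import math

N = 8  # block size of the orthogonal transformation


def _basis(k, n):
    """Entry of the orthonormal DCT-II matrix."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


C = [[_basis(k, n) for n in range(N)] for k in range(N)]
CT = [list(col) for col in zip(*C)]  # transpose of C


def matmul(a, b):
    return [[sum(a[r][t] * b[t][c] for t in range(N)) for c in range(N)]
            for r in range(N)]


def dct2(block):
    """Two-dimensional DCT: C * block * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coef):
    """Inverse two-dimensional DCT: C^T * coef * C."""
    return matmul(matmul(CT, coef), C)


def code_block(cur, pred, step):
    """Return (quantized levels, locally decoded block).

    Assumption: uniform quantization with one step size; no loop filter.
    """
    # Subtracter: difference between input and reference image.
    residual = [[cur[r][c] - pred[r][c] for c in range(N)] for r in range(N)]
    # Orthogonal transformer followed by quantizer.
    levels = [[round(v / step) for v in row] for row in dct2(residual)]
    # Inverse quantizer and inverse orthogonal transformer.
    dequantized = [[lv * step for lv in row] for row in levels]
    decoded_residual = idct2(dequantized)
    # Adder: recover the image data used for the next prediction.
    recon = [[pred[r][c] + decoded_residual[r][c] for c in range(N)]
             for r in range(N)]
    return levels, recon


pred = [[100] * N for _ in range(N)]  # motion-compensated reference block
cur = [[108] * N for _ in range(N)]   # input block
levels, recon = code_block(cur, pred, step=16)
```

The locally decoded block, not the original input, is what would be written back to the frame memory, so the encoder predicts from the same data that a decoder can reconstruct from the quantized levels.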
Computing of a motion vector is now described as to interframe predictive coding. In general, block matching is employed for motion vector computing.
Consider that an image A in a (m-1)-th frame moves to a position A' in an m-th frame, as shown in FIG. 85A. In block matching, the screen (one frame in this case) is split into blocks of P by Q pixels (P=Q in general). A block most approximate to that of interest in the current frame is found out from a preceding frame. The displacement between the block of interest and the most approximate block in the preceding frame is called a motion vector. The motion vector is now described in more detail.
As shown in FIG. 85B, it is assumed that the m-th frame is to be coded. This frame is split into blocks each of N by N pixels (P=Q=N in general). It is assumed that Xm(Nk, N1) represents the value of pixel data on the upper left pixel position (Nk, N1) of the block of N by N pixels in the m-th frame. The absolute values of the differences between the data of the corresponding pixels in a block of the preceding frame ((m-1)-th frame) whose pixel position is displaced by a vector (i, j) and in the block in the current frame (m-th frame) are obtained and summed over the block to give an absolute differential sum. Then, the displacement vector (i, j) is changed to various values, and the absolute differential sum is obtained for each of the values. The absolute differential sums are generally called evaluation values. The displacement vector (i, j) indicating the position providing the minimum absolute differential sum is defined as the motion vector.
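The evaluation value and the resulting motion vector can be written out as follows. The notation is assumed here for illustration and is not taken from the figures: (x_0, y_0) denotes the upper left pixel position of the block of interest, X_m(x, y) the pixel value at position (x, y) in the m-th frame, and E(i, j) the evaluation value for the displacement vector (i, j).

```latex
E(i,j) = \sum_{p=0}^{N-1} \sum_{q=0}^{N-1}
  \left| X_m(x_0 + p,\; y_0 + q) - X_{m-1}(x_0 + p + i,\; y_0 + q + j) \right|,
\qquad
(i^{*}, j^{*}) = \arg\min_{(i,j)} E(i,j)
```

Here (i*, j*) is the displacement vector adopted as the motion vector of the block.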
It is necessary to transmit a motion vector per pixel block. If the block size is reduced, the transmission information content is increased to disable effective data compression. If the block size is increased, on the other hand, it is difficult to perform effective motion detection. In general, therefore, the block size is set as 16 by 16 pixels, with a motion vector search range (maximum change width of i, j) of -16 to +16. Motion vector computing by block matching is further specifically described.
FIG. 86 illustrates a specific method of computing a motion vector by block matching. Consider an image 950 consisting of 352 dots (pixels) by 288 lines, as shown in FIG. 86. The image 950 is split into a plurality of blocks each consisting of 16 by 16 pixels. Motion vector detection is executed in the units of the blocks. Around a block 954 in a preceding frame which is on the same position as a block (hereinafter referred to as a template block) 952 subjected to the motion vector detection, a block 956 which is larger by ±16 pixels in the horizontal and vertical directions on the screen is regarded as a search block (hereinafter referred to as a search area). A motion vector search for the template block 952 is executed in the search area 956. The motion vector search method in accordance with block matching includes the following steps:
A block (shown by a vector (i, j) in FIG. 86) having a displacement corresponding to a candidate motion vector is obtained. An evaluation value, such as the absolute differential sum (or square differential sum), between the respective pixels of the obtained block and the pixels on the corresponding positions of the template block 952 is obtained.
The aforementioned operation is executed on all displacements (-16, -16) to (+16, +16) of the vector (i, j). After the evaluation values are obtained for all predictive image blocks (all image blocks in the search area 956), a predictive image block having the minimum evaluation value is detected. A vector going from the block (the block 954 shown by a vector (0, 0) in FIG. 86) of the same position (hereinafter referred to as a true back position) as the template block 952 to the predictive image block having the minimum evaluation value is decided as the motion vector for the template block 952.
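The search steps above can be sketched as a full search in Python. This is a minimal illustration, not the structure of any particular apparatus: the function names are hypothetical, candidates lying partly outside the picture are simply skipped, and ties are resolved in favor of the first candidate in scan order, both of which are assumptions.

```python
def sad(cur, prev, top, left, size, i, j):
    """Absolute differential sum between the template block and the
    block of the preceding frame displaced by the vector (i, j)."""
    return sum(abs(cur[top + r][left + c] - prev[top + r + j][left + c + i])
               for r in range(size) for c in range(size))


def full_search(cur, prev, top, left, size=16, search=16):
    """Return (motion vector, minimum evaluation value) for one template block."""
    height, width = len(prev), len(prev[0])
    best_value, best_vector = None, None
    for j in range(-search, search + 1):
        for i in range(-search, search + 1):
            # Assumption: skip candidates that fall outside the picture.
            if not (0 <= top + j and top + j + size <= height
                    and 0 <= left + i and left + i + size <= width):
                continue
            value = sad(cur, prev, top, left, size, i, j)
            if best_value is None or value < best_value:
                best_value, best_vector = value, (i, j)
    return best_vector, best_value


# Toy frames: every pixel of the preceding frame has a distinct value, and
# the current frame shows the same content taken from (i, j) = (2, -1).
prev = [[12 * r + c for c in range(12)] for r in range(12)]
cur = [[prev[r - 1][c + 2] if 0 <= r - 1 and c + 2 < 12 else 0
        for c in range(12)] for r in range(12)]
vector, value = full_search(cur, prev, top=4, left=4, size=4, search=3)
print(vector, value)  # prints "(2, -1) 0"
```

With the search range of -16 to +16 in both directions, 33 by 33 = 1089 displacements are evaluated for each template block, each requiring 256 absolute differences for a block of 16 by 16 pixels.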
FIG. 87 schematically illustrates the structure (syntax) of image data coded by the video multiplex coding circuit 914 shown in FIG. 83. This video multiplex coding circuit 914 multiplexes data supplied from the source coding circuit 912 into a bit stream (plural bit width) by variable-length coding, and transmits the same.
Referring to FIG. 87, the bit stream is split into a plurality of layers, i.e., a sequence layer, a GOP (group of picture) layer, a picture layer, a slice layer, a macro block layer and a block layer in order from the uppermost layer.
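The nesting of the six layers can be pictured with a small Python data model. The class and field names are hypothetical, the headers of the respective layers are left out, and the counts used in the example are arbitrary; the sketch only shows how each layer contains units of the layer below it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    dct_coefficients: List[int] = field(default_factory=list)


@dataclass
class MacroBlock:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Slice:
    macro_blocks: List[MacroBlock] = field(default_factory=list)


@dataclass
class Picture:
    slices: List[Slice] = field(default_factory=list)


@dataclass
class GroupOfPictures:
    pictures: List[Picture] = field(default_factory=list)


@dataclass
class Sequence:
    gops: List[GroupOfPictures] = field(default_factory=list)


def count_blocks(sequence):
    """Walk the layers from the uppermost down to the block layer."""
    return sum(len(mb.blocks)
               for gop in sequence.gops
               for picture in gop.pictures
               for sl in picture.slices
               for mb in sl.macro_blocks)


def make_picture(slices, macro_blocks, blocks):
    """Build a picture with arbitrary, illustrative unit counts."""
    return Picture([Slice([MacroBlock([Block() for _ in range(blocks)])
                           for _ in range(macro_blocks)])
                    for _ in range(slices)])


sequence = Sequence([GroupOfPictures([make_picture(3, 2, 4) for _ in range(2)])])
```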
The block layer is formed by a block 1100 including an area 1100a mainly storing DCT coefficient data and an area 1100b storing an end of block (EOB) data indicating the end of the block 1100. The area 1100a stores the DCT coefficient data of pixels of eight rows by eight columns serving as a unit of DCT processing.
A macro block 1110 includes a prescribed number of (six) blocks 1100. The macro block layer on the bit stream includes the macro block 1110 formed by the blocks 1100 and a macro block header 1115 storing attributes of data and the motion vector of the macro block 1110 in variable-length codes.
The slice layer includes a slice 1120 formed by one or a plurality of macro blocks 1110 concatenated in image scan order. A slice header 1125, storing information indicating the on-screen vertical position of the slice 1120 and information such as a start code having a prescribed pattern indicating the beginning of this slice 1120, is provided at the head of the slice 1120.
The picture layer includes a picture (image) 1130 formed by a plurality of slices 1120. A picture header 1135, storing information indicating the type (I picture, P picture or the like) of the picture 1130 and information such as a start code indicating the beginning of the picture 1130 in variable-length code words, is arranged at the head of the picture 1130. The picture 1130 corresponds to a single image, and is formed by one or a plurality of slices 1120. This picture 1130 is one of the following three types of pictures:
(a) I picture: This is an image coded with only information closed in the image. Namely, pixel data of the I picture is coded with no difference computing.
(b) P picture: This is an image subjected to interframe or interfield predictive coding. A predictive image (reference image) employed for the P picture is an already coded I or P picture temporally preceding in input order. In general, it is possible to select, in units of macro blocks, the more efficient one of a method of coding the difference between the P picture and the motion-compensated predictive image and a method of coding the P picture with no difference computing.
(c) B picture (bidirectional predictive coded image): A predictive image employed for the B picture is selected from (i) an already decoded I or P picture temporally preceding the B picture, (ii) an already decoded I or P picture temporally subsequent to the B picture, and (iii) an interpolated image produced by the pictures (i) and (ii). It is possible to select the most efficient one of the three methods of coding the difference between the B picture and the respective motion-compensated predictive image and the method of coding the B picture with no difference computing.
The GOP layer includes a GOP 1140 including a plurality of pictures 1130. The pictures 1130 included in the GOP 1140 include at least one I picture and zero or a plurality of P or B pictures. A GOP header 1145 storing a GOP start code and information such as a flag indicating that this GOP 1140 requires no reference from image data of a preceding GOP is arranged at the head of the GOP 1140.
The sequence layer includes a sequence 1150 formed by one or a plurality of GOPs 1140 or one or a plurality of pictures 1130. A sequence header 1155 storing information such as the format of the screen is arranged at the head of the sequence 1150. The sequence header 1155 can be arranged at the head of all GOPs 1140 included in the sequence 1150. This sequence header 1155 stores information such as a start code having a prescribed pattern indicating the beginning of the sequence 1150, horizontal and vertical sizes of the image(s), the picture rate (image display speed), the bit rate and its content and the like.
FIG. 88 illustrates an exemplary structure of the macro block header 1115 shown in FIG. 87. Referring to FIG. 88, the macro block header 1115 includes a macro block address area 1115a storing information (macro block address) indicating the position of the macro block on the screen and the number (macro block address increment) of macro blocks to be skipped, an area 1115b storing a macro block type indicating the method of processing the macro block, an area 1115c storing the motion vector of the macro block, and a CBP (coded block pattern) area 1115d storing a CBP indicating whether or not each block of the macro block other than an I picture includes DCT coefficient data.
The macro blocks skipped by the macro block address increment are those having no DCT coefficient codes (all DCT coefficients are zero) among macro blocks subjected to no motion compensation. The macro block type stored in the area 1115b includes information as to whether or not the macro block is subjected to interframe/interfield predictive coding, whether or not the same is motion-compensated and the like.
The motion vector area 1115c stores a motion vector for motion compensation prediction. In case of an I picture, the motion vector area 1115c stores no motion vector. In a P picture, it is possible to employ a motion vector in accordance with its predictive system (an odd or even field predictive coding system in case of frame predictive coding). Similarly, different motion vectors are employed in accordance with predictive systems for B pictures. Therefore, the bit width of the motion vector storage area 1115c is varied with the macro block. The CBP area 1115d indicates whether or not each block (the block 1100 in FIG. 87) includes DCT coefficient data. Therefore, a block having the information stored in the CBP area 1115d indicating that no DCT coefficient data is included is not present in data transmission (not transmitted).
All information included in the areas 1115a, 1115b, 1115c and 1115d of the macro block header 1115 is expressed in variable-length code words (variable-length symbols). Therefore, the time required for analyzing all information of the macro block header 1115 is varied with the attributes (the processing method, the number of motion vectors and the like) of the macro block. It is possible to decide what processing is performed for the macro block following the macro block header 1115 by analyzing the information of the macro block header 1115. Therefore, it is preferable to minimize the quantity of the data included in the macro block header 1115, in order to perform decoding at a high speed.
As shown in FIG. 87, a single macro block header 1115 is transmitted with respect to each macro block. If the data quantity of the macro block header 1115 can be reduced, therefore, necessary image data can be transferred at a high speed with addition of necessary codes for improving the picture quality.
On the other hand, some systems are proposed for predictive image (reference image) detection in the motion-compensated interframe (or interfield) predictive coding. In order to attain excellent coding efficiency (to reduce the quantity of coded data), it is necessary to perform motion detection in accordance with a plurality of predictive image detection systems (predictive coding systems), thereafter select the optimum predictive image detection system, and detect motion vectors in accordance with the optimum predictive image detection system. Fields or a frame can be employed as a unit forming a screen. A single frame is formed by two fields (even and odd fields). For example, there are the following predictive image detection systems for the respective cases:
(A) In case of coding pixel data in units of fields:
(a) A field image is split into a plurality of blocks each of P by Q pixels, and a single motion vector is detected (a single predictive image is generated) for each block.
(b) Each block is further split into two blocks along the vertical direction on the screen, and a single motion vector is detected for each of the two blocks. Therefore, motion vectors for the upper and lower blocks are detected (two predictive images are generated) for each block of P by Q pixels.
(B) In case of coding pixel data in units of frames:
(a) A frame image is split into a plurality of blocks each of P by Q pixels, and a single motion vector is detected (a single predictive image is generated) for each block.
(b) Each block of P by Q pixels is split into two pixel groups of those belonging to the same fields, i.e., even and odd fields respectively, and a single motion vector is detected for each pixel group. Therefore, a motion vector for the even field pixel group and that for the odd field pixel group are detected for each block of P by Q pixels (two predictive images are generated for a single block).
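The two ways of dividing a block of P by Q pixels for detecting two motion vectors, namely into upper and lower blocks in field-unit coding and into even field and odd field pixel groups in frame-unit coding, can be sketched in Python as follows. The function names are hypothetical, and the assignment of rows 0, 2, 4, ... to the even field and rows 1, 3, 5, ... to the odd field is an assumption made for illustration.

```python
def split_upper_lower(block):
    """Field-unit coding: split a block into its upper and lower halves."""
    half = len(block) // 2
    return block[:half], block[half:]


def split_by_field(block):
    """Frame-unit coding: split a block into two pixel groups by field.

    Assumption: rows 0, 2, 4, ... belong to the even field and
    rows 1, 3, 5, ... to the odd field.
    """
    return block[0::2], block[1::2]


# Toy block of 4 by 4 pixels; each value encodes its row and column.
block = [[10 * r + c for c in range(4)] for r in range(4)]
upper, lower = split_upper_lower(block)
even_rows, odd_rows = split_by_field(block)
```

Each of the two resulting pixel groups would then be matched separately against the reference screen, yielding two motion vectors and two predictive images for the block.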
As a structure for executing the plurality of predictive coding systems (predictive image detection systems), it is desirable to efficiently code the detected motion vectors in any predictive image detection system, in order to attain an effect of improving the coding efficiency. In relation to employment of such a plurality of predictive image detection systems (predictive coding systems), however, no consideration is made on a structure for reducing the amount of codes of motion vectors subjected to variable-length coding.