1. Field of the Invention
The present invention generally relates to motion video compression techniques and, more particularly, to a technique for predictive or interpolative motion compensation, a technique for detecting a motion vector which is used for the predictive or interpolative motion compensation, a technique for simplifying the operations required for detecting the motion vector, and a technique for selecting intra-frame transform coding or inter-frame predictive coding to optimize the compression.
2. Description of the Related Arts
(1) Motion Video Compression Techniques
As techniques for compressing a moving picture signal into a bitstream, predictive/interpolative motion compensation, orthogonal transformation, quantization, variable length coding, etc. are used.
For example, predictive/interpolative motion compensation, discrete cosine transformation (DCT), adaptive quantization and Huffman coding have been adopted by the MPEG standard. Here, "MPEG" stands for "Moving Picture Experts Group," which is the name of a committee on the standardization of image compression established under the International Organization for Standardization (ISO), and the MPEG-1 standard is provided in the "ISO/IEC 11172," while the MPEG-2 standard is provided in the "ISO/IEC 13818."
As prior arts related to the above fields, there have been techniques disclosed in the U.S. Pat. No. 5,231,484 (the Japanese Unexamined Patent Publication No. 252507/1993), the U.S. Pat. No. 5,293,229 (the Japanese Unexamined Patent Publication No. 276502/1993) and the U.S. Pat. No. 5,325,125 (the Japanese Unexamined Patent Publication No. 225284/1994), for example.
In the above predictive or interpolative motion compensation, the region (=the reference macroblock) which bears the closest resemblance to the current macroblock in the current frame is searched for within the reference frame preceding in time and/or the reference frame following in time, and the difference between each pixel value in the reference macroblock thus found and the corresponding pixel value in the current macroblock is obtained. Here, a vector which specifies the position of the reference macroblock relative to the position of the current macroblock is called a "motion vector," and the action of searching for the position of the reference macroblock is called the "detection of a motion vector." The macroblock is a region composed of 16×16 pixels, and the predictive or interpolative motion compensation is performed for each macroblock. The block is a region composed of 8×8 pixels, and the DCT is performed for each block. Thus the outputs of the motion estimation and compensation for a macroblock are motion vectors and a motion-compensated difference macroblock. Each difference block in the difference macroblock is then coded by using the DCT, the quantization and the variable length coding.
When the compressed difference macroblock is expanded, the image data of the corresponding reference macroblock (=specified by the motion vector of the difference macroblock) is added to the expanded difference macroblock data obtained by variable length decoding, inverse quantization and inverse DCT, whereby the image data of the current macroblock is reproduced. Here, the reference image data of at least one frame preceding in time and at least one frame following in time have been stored in a frame memory.
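By way of illustration only, the formation of a difference macroblock and its reproduction at expansion can be sketched in Python as follows. The function names are hypothetical, and the DCT, quantization and variable length coding stages are deliberately omitted, so the round trip in this sketch is exact, which a real codec's would not be.

```python
def extract_block(frame, top, left, size):
    """Return the size x size region of `frame` whose upper-left pixel is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def difference_block(current, reference):
    """Pixel-by-pixel difference: current macroblock minus reference macroblock."""
    return [[c - r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current, reference)]


def reconstruct_block(difference, reference):
    """Add the reference macroblock back to the (expanded) difference macroblock."""
    return [[d + r for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(difference, reference)]


# Toy 8x8 reference frame; the motion vector is assumed to point at (2, 3).
reference_frame = [[r * 8 + c for c in range(8)] for r in range(8)]
reference_mb = extract_block(reference_frame, 2, 3, 4)
# Hypothetical current macroblock: the reference region, one tone brighter.
current_mb = [[p + 1 for p in row] for row in reference_mb]

diff_mb = difference_block(current_mb, reference_mb)
restored_mb = reconstruct_block(diff_mb, reference_mb)
```

In this toy case every entry of `diff_mb` is 1, which is far cheaper to code than the original pixel values.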
In the above DCT, each block of 8×8 pixels is transformed into frequency terms ranging from low frequency terms to high frequency terms and converted to a coefficient matrix |Cij| composed of 8×8 coefficients Cij. Hereinafter, the suffixes "i" and "j" denote "row i, column j."
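As a minimal sketch of the transform described above, the following Python code computes a two-dimensional DCT of an 8×8 block directly from its definition. The function name `dct_2d` and the orthonormal scaling are assumptions made for this example; practical encoders use fast factorizations rather than this quadruple loop.

```python
import math

N = 8  # block size assumed in the text


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block.

    Returns coeffs[i][j], where i is the row (vertical frequency) and
    j is the column (horizontal frequency); coeffs[0][0] is the DC term.
    """
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * i * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * j * math.pi / (2 * N)))
            coeffs[i][j] = scale(i) * scale(j) * total
    return coeffs


# A perfectly flat block has energy only in the lowest-frequency coefficient.
flat_block = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat_block)
```

For the flat block, only `flat_coeffs[0][0]` is non-zero (up to floating-point error), which illustrates why smooth image regions compress well after the transform.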
In the above quantization, each coefficient Cij of the 8×8 coefficient matrix |Cij| is divided by a certain divisor Qij (=(quantizer scale q)×(constant Kij proper to each coefficient Cij)) and the remainder is rounded off. Here, the constant Kij is given in a quantization matrix table. In the intra-macroblock, a quantization matrix table in which large values are provided to the coefficients of higher frequency terms and small values are provided to the coefficients of lower frequency terms is generally used, while in the inter-macroblock, all the constants Kij take the same value.
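By way of illustration only, the division by Qij = q×Kij can be sketched in Python as follows. The function names, the toy 2×2 matrices and the tie-breaking rule (halves rounded away from zero) are assumptions introduced for this example; the exact arithmetic prescribed by the MPEG standards is more elaborate.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, k_matrix, q):
    """Divide each Cij by Qij = q * Kij and round the quotient."""
    return [[round_half_away(c / (q * k)) for c, k in zip(c_row, k_row)]
            for c_row, k_row in zip(coeffs, k_matrix)]


def dequantize(levels, k_matrix, q):
    """Inverse quantization: multiply each level back by Qij = q * Kij."""
    return [[lv * q * k for lv, k in zip(l_row, k_row)]
            for l_row, k_row in zip(levels, k_matrix)]


# Toy 2x2 example (a real block is 8x8); K grows toward higher frequencies.
coeffs = [[800.0, 33.0], [-17.0, 4.0]]
k_matrix = [[8, 16], [16, 32]]

levels_fine = quantize(coeffs, k_matrix, q=2)
levels_coarse = quantize(coeffs, k_matrix, q=4)
```

With q=2 one of the four levels is zero, while with q=4 two of them are, at the cost of a larger reconstruction error after `dequantize`.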
It is permitted by the MPEG standards to load the quantization matrix data Kij for intra-macroblocks and inter-macroblocks for each program or sequence. It is also provided for in the MPEG standards that a sequence must include at least one group of pictures (GOP), so the quantization matrix data Kij can be changed as often as once per GOP.
When the value of Kij and/or q increases, the coefficient data C'ij outputted from a quantization circuit contains more zeros and the compression rate rises. In the above adaptive quantization, the bit-rate of the bitstream outputted from the variable length coder is monitored, and the above quantizer scale q is set so that the monitored bit-rate meets the target value. That is, when the monitored bit-rate is smaller than the target value, the quantizer scale q is made smaller, and when the monitored bit-rate is larger than the target value, the quantizer scale q is made larger. Incidentally, examples of a quantization circuit and a bit-rate controller are illustrated in FIG. 3 as a quantizer 118 and a bit-rate controller 124, respectively.
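The feedback rule described above can be sketched, under simplifying assumptions, as a single update step in Python. The function name, the fixed step size and the clamping range of 1 to 31 are assumptions made for this example and do not describe the bit-rate controller 124 of FIG. 3.

```python
def update_quantizer_scale(q, measured_bitrate, target_bitrate,
                           step=1, q_min=1, q_max=31):
    """Nudge the quantizer scale q toward the target bit-rate.

    A larger q gives coarser quantization and fewer bits; a smaller q gives
    finer quantization and more bits. The result is clamped to [q_min, q_max].
    """
    if measured_bitrate > target_bitrate:
        q += step  # too many bits: quantize more coarsely
    elif measured_bitrate < target_bitrate:
        q -= step  # bits to spare: quantize more finely
    return max(q_min, min(q_max, q))


# Hypothetical readings in bits per second against a 1.0 Mbit/s target.
q = 10
for measured in (1_200_000, 1_100_000, 900_000):
    q = update_quantizer_scale(q, measured, 1_000_000)
```

A practical controller would typically scale the adjustment by the size of the error and by buffer occupancy; the fixed step is used here only to keep the rule visible.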
In the above Huffman coding, a code word is allocated to each quantized coefficient C'ij (=Cij÷Qij, where Qij=Kij×q) according to its frequency of occurrence, so that the code word is shorter as the frequency of occurrence is higher.
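As an illustration of the principle only, the following Python sketch builds a Huffman code from the observed frequencies of a list of quantized values. The function name `huffman_code` is hypothetical, and deriving the code from sample data is an assumption of this example; the MPEG standards specify predefined variable length code tables instead.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their code words.
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in codes0.items()}
        merged.update({s: "1" + w for s, w in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


# Quantized coefficients are dominated by zeros, so 0 gets the shortest word.
quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [5]
code = huffman_code(quantized)
```

Here the value 0 (8 occurrences) receives a 1-bit word, 1 receives a 2-bit word, and the rare values -1 and 5 receive 3-bit words.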
(2) Detection of Motion Vector
As illustrated in FIG. 5, the motion vector is searched for within a region composed of a region A0 (=a region of 16×16 pixels having the same coordinate position as that of the current macroblock which is a target of compression within a current frame) and the 8 regions of 16×16 pixels surrounding it, within a reference frame preceding in time and/or within a reference frame following in time. That is, a region of 48×48 pixels centered around the region A0 is the target range of searching for the motion vector. When the motion vector is searched for beyond the above-mentioned region, a large-scale processing circuit and a large memory capacity are required.
As methods for detecting the motion vector of the current macroblock, in other words, as methods for searching for the region (=reference macroblock) bearing the closest resemblance to the current macroblock within the above mentioned searching region, the following are available.
*Full Searching (FS) Method
As illustrated in FIG. 5(a), the current macroblock is compared with each of the regions A1, A2, A3, . . . within a specified searching range centered around the region A0 within the reference frame corresponding to the current macroblock, wherein each of the regions A2, A3, . . . is shifted from the preceding region A1, A2, . . . by one pixel, starting from the region A1 at the upper left corner. This method requires a large-scale processing circuit and a large memory capacity due to the enormous number of regions to be compared, but the precision of the motion vector detection is high.
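A minimal sketch of the full searching method is given below in Python. The function names are hypothetical, the sum of absolute differences is assumed as the matching criterion, and the toy frame is 12×12 pixels with 4×4 blocks rather than the actual sizes.

```python
def sad(current, reference, top, left, ref_top, ref_left, size):
    """Sum of absolute differences between a current block and a reference region."""
    return sum(
        abs(current[top + y][left + x] - reference[ref_top + y][ref_left + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(current, reference, top, left, size=16, search=16):
    """Test every one-pixel shift within +/-search pixels, from the upper left.

    Returns (dy, dx, cost) of the best-matching region. Candidates that fall
    outside the reference frame are skipped.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ref_top, ref_left = top + dy, left + dx
            if ref_top < 0 or ref_left < 0:
                continue
            if ref_top + size > height or ref_left + size > width:
                continue
            cost = sad(current, reference, top, left, ref_top, ref_left, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


# Toy data: a 12x12 ramp image, and a current frame showing the same image
# displaced by (2, 1); pixels with no source are filled with 0.
reference = [[r * 12 + c for c in range(12)] for r in range(12)]
current = [
    [reference[r + 2][c + 1] if r + 2 < 12 and c + 1 < 12 else 0 for c in range(12)]
    for r in range(12)
]
vector = full_search(current, reference, top=4, left=4, size=4, search=4)
```

With the default `size=16` and `search=16`, the candidates cover the 48×48 pixel range described above, i.e. up to 33×33 = 1089 regions per macroblock, which is the source of the circuit scale noted in the text.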
*Logarithmic Searching (LS) Method
As illustrated in FIG. 5(b), comparison is made, for example, in the order upper left→upper→upper right→left→center→right→lower left→lower→lower right within a specified searching range centered around the region A0 within the reference frame corresponding to the current macroblock. Incidentally, this order is only an example, and the comparison may be done in another order. Then, the searching range is narrowed toward the region A3 which bears the closest resemblance to the current macroblock among the above 9 regions, and comparison is repeated in this way. As the narrowing of the searching range is logarithmic, this method is called the "logarithmic searching method."
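One possible reading of the logarithmic searching method is sketched below in Python. It is assumed here that the spacing between the 9 candidates is halved after every round; other narrowing schedules exist. The names `make_sad_cost` and `logarithmic_search` are hypothetical, and the matching criterion is passed in as a function so that any of the criteria described later can be used.

```python
def make_sad_cost(current, reference, top, left, size):
    """Return cost(dy, dx): the SAD for that displacement, or None if out of frame."""
    height, width = len(reference), len(reference[0])

    def cost(dy, dx):
        ref_top, ref_left = top + dy, left + dx
        if ref_top < 0 or ref_left < 0:
            return None
        if ref_top + size > height or ref_left + size > width:
            return None
        return sum(
            abs(current[top + y][left + x] - reference[ref_top + y][ref_left + x])
            for y in range(size)
            for x in range(size)
        )

    return cost


def logarithmic_search(cost, step=4):
    """Compare 9 candidates around the center, recenter on the best, halve the step."""
    center = (0, 0)
    best_cost = cost(0, 0)
    while step >= 1:
        best = center
        # Order: upper left, upper, upper right, left, center, right, lower ...
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                candidate = (center[0] + oy, center[1] + ox)
                c = cost(*candidate)
                if c is not None and c < best_cost:
                    best, best_cost = candidate, c
        center = best
        step //= 2
    return center[0], center[1], best_cost


# Toy data: 12x12 ramp image displaced by (2, 1) between the two frames.
reference = [[r * 12 + c for c in range(12)] for r in range(12)]
current = [
    [reference[r + 2][c + 1] if r + 2 < 12 and c + 1 < 12 else 0 for c in range(12)]
    for r in range(12)
]
vector = logarithmic_search(make_sad_cost(current, reference, 4, 4, 4), step=2)
```

With `step=2` the sketch evaluates at most 18 candidates instead of the 49 that an exhaustive search over the same ±3 pixel range would need, but it can settle on a local minimum when the matching cost is not well behaved.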
*Telescopic Searching (TS) Method
As illustrated in FIG. 5(c), from the region A1 indicated by the precedingly detected motion vector (=the motion vector detected with respect to the corresponding macroblock in the preceding frame, the motion vector detected with respect to the preceding macroblock, etc.), the regions in the vicinity of the region A1 are searched. As the regions to be compared are limited, the circuit scale for the processing and the memory capacity can be reduced, but the precision of the motion vector detection is slightly lower than with the FS method.
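The telescopic searching method can be sketched, under stated assumptions, as a small exhaustive search around a predicted vector. In the Python sketch below, `telescopic_search`, the `radius` parameter and the synthetic cost function are all hypothetical; `cost(dy, dx)` stands for any matching criterion and may return None for a displacement outside the frame.

```python
def telescopic_search(cost, predicted, radius=1):
    """Search only the displacements within `radius` pixels of `predicted`.

    `predicted` is the previously detected motion vector (dy, dx).
    Returns (dy, dx, cost) of the best candidate, or None if none is valid.
    """
    best = None
    pred_dy, pred_dx = predicted
    for dy in range(pred_dy - radius, pred_dy + radius + 1):
        for dx in range(pred_dx - radius, pred_dx + radius + 1):
            c = cost(dy, dx)
            if c is not None and (best is None or c < best[2]):
                best = (dy, dx, c)
    return best


def toy_cost(dy, dx):
    """Synthetic matching cost whose true minimum lies at (3, -2)."""
    return abs(dy - 3) + abs(dx + 2)


good_prediction = telescopic_search(toy_cost, predicted=(2, -1))
poor_prediction = telescopic_search(toy_cost, predicted=(0, 0))
```

When the prediction is close, the true minimum (3, -2) is found from only 9 comparisons; when it is poor, the search returns the best candidate within reach, (1, -1), which illustrates the loss of precision mentioned above.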
Furthermore, as the methods for detecting the region that bears the closest resemblance to the current macroblock among the above mentioned regions A0, A1, A2, . . . , the following methods are available.
*Sum of the Absolute Values (or Square Values)
First, the differences between the pixel values of the current macroblock and the corresponding pixel values of each of the regions A0, A1, A2, . . . are obtained. Then, for each region, the sum of the absolute values of the differences or the sum of the squares of the differences is calculated. The region for which the sum is the smallest is detected as the reference macroblock.
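The two sums described above can be written out in Python as follows. The function names and the toy 2×2 blocks are assumptions made for this example.

```python
def sum_abs_diff(block_a, block_b):
    """Sum of the absolute values of the pixel differences."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def sum_sq_diff(block_a, block_b):
    """Sum of the squares of the pixel differences."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_region(current, regions, measure):
    """Index of the candidate region with the smallest sum (first one on ties)."""
    sums = [measure(current, region) for region in regions]
    return sums.index(min(sums))


current = [[10, 20], [30, 40]]
regions = [
    [[10, 20], [30, 44]],  # one pixel off by 4
    [[12, 22], [31, 40]],  # three pixels off by 2, 2 and 1
]
by_abs = best_region(current, regions, sum_abs_diff)
by_sq = best_region(current, regions, sum_sq_diff)
```

The two criteria need not agree: the absolute sums are 4 and 5, so region 0 is chosen, whereas the squared sums are 16 and 9, so region 1 is chosen, because squaring penalizes a single large difference more heavily.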
*Number of Coincident Pixels
The number of pixels whose values are coincident between each of the regions A0, A1, A2, . . . to be compared and the current macroblock is calculated, and the region in which the number of such pixels is the largest is detected as a reference macroblock.
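A minimal Python sketch of this criterion follows; the function names and the toy 2×2 blocks are assumptions made for this example.

```python
def coincident_pixels(block_a, block_b):
    """Count the positions at which the two blocks hold the same pixel value."""
    return sum(1
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b)
               if a == b)


def best_region_by_coincidence(current, regions):
    """Index of the region with the most coincident pixels (first one on ties)."""
    counts = [coincident_pixels(current, region) for region in regions]
    return counts.index(max(counts))


current = [[1, 0], [0, 1]]
regions = [
    [[1, 1], [1, 1]],  # 2 coincident pixels
    [[1, 0], [0, 0]],  # 3 coincident pixels
    [[0, 1], [1, 0]],  # 0 coincident pixels
]
chosen = best_region_by_coincidence(current, regions)
```

Because only an equality test and a counter are needed per pixel, this criterion is considerably cheaper than subtracting and accumulating multi-bit values, particularly when the pixels are 1-bit data.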
Incidentally, as prior arts related to the detection of the motion vector, there have been techniques disclosed in the Japanese Unexamined Patent Publications Nos. 199379/1982 (the U.S. Pat. No. 4,460,923), 107785/1983 (the U.S. Pat. No. 4,460,923), 101581/1983 (the U.S. Pat. No. 4,460,923), 145777/1992, 79484/1992, 40687/1991, 207790/1992, 234276/1992 and 40193/1992, and on pp. 78-85 of ISO/IEC 11172-2, for example.
(3) Simplification of Circuit for Detecting Motion Vector
In the operation for detecting the motion vector, each pixel value within each of a large number of regions of 16×16 pixels is compared with the corresponding pixel value of the current macroblock, so a large-scale processing circuit and a large memory capacity are required. For this reason, it has been desired that the circuit scale for the processing and the memory capacity be reduced without lowering the precision of the motion vector detection, and for this purpose, binary conversion has been proposed.
The binary conversion is a method in which each pixel value (8-bit value) of the current macroblock and each pixel value (8-bit value) of the above searching range are binary-coded respectively and then compared with each other to detect the motion vector. As the binary-coding methods, there have been techniques disclosed in the Japanese Unexamined Patent Publications Nos. 71580/1987, 874/1992 and 10176/1992, for example. The binary-coding technique disclosed in the Japanese Unexamined Patent Publication No. 71580/1987 will now be outlined referring to FIG. 1.
For simplifying the description, one frame is supposed to be 12×12 pixels and one macroblock is supposed to be 4×4 pixels. As a matter of course, the following description also applies in the same way to an actual frame and an actual macroblock (16×16 pixels).
A frame memory 10 stores the digital image signal of the current frame of 12×12 pixels, while a frame memory 12 stores the digital image signal of the preceding frame of 12×12 pixels. The digital image data of the current frame stored in the frame memory 10 is partitioned into macroblocks of 4×4 pixels each by a block converter 14.
Each macroblock outputted from the block converter 14 is binary-coded by a binary-coder 16. As a threshold for the binary-coding, the average value of the pixel values (8 bits each) of the macroblock is adopted. In this way, as illustrated in FIG. 1(b), each pixel represented as 8-bit data (=256 tones) in the macroblock (=4×4 pixels) is converted to a pixel represented as 1-bit data (=2 tones=black or white) in the binary-converted macroblock (=4×4 pixels). In addition, the binary-coder 16 outputs the above threshold to a binary-coder 18 for each macroblock as a reference value.
The binary-coder 18 converts the preceding frame outputted from the frame memory 12 into the binary-converted preceding frame by using the reference value outputted from the binary-coder 16 as a threshold. In this way, as illustrated in FIG. 1(c), each pixel represented as 8-bit data (=256 tones) in the preceding frame (=12×12 pixels) is converted to a pixel represented as 1-bit data (=2 tones=black or white) in the binary-converted preceding frame (=12×12 pixels). Incidentally, the preceding frame is binary-coded for each macroblock.
Each binary-converted pixel of the macroblock of the current frame is compared with the corresponding binary-converted pixel of each region taken from the binary-converted preceding frame in a specified order, and the coincident pixels are detected by a motion vector detector 20. Then, in the motion vector detector 20, the number of coincident pixels is counted for each region, and the region which has the largest number of coincident pixels is detected as the reference macroblock.
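The flow through the binary-coders 16 and 18 and the motion vector detector 20 can be sketched in Python as follows, using the simplified 12×12 frame and 4×4 macroblock. This is a sketch under stated assumptions, not the circuit of FIG. 1: the function names are hypothetical, a pixel equal to the threshold is assumed to map to 1, and every 4×4 region of the preceding frame is assumed to be a candidate.

```python
def block_mean(block):
    """Average pixel value of a block, used as the binarization threshold."""
    values = [p for row in block for p in row]
    return sum(values) / len(values)


def binarize(pixels, threshold):
    """Convert multi-tone pixels to 1-bit data: 1 if >= threshold, else 0 (assumed)."""
    return [[1 if p >= threshold else 0 for p in row] for row in pixels]


def binary_motion_search(current, previous, top, left, size=4):
    """Return (dy, dx, matches) for the region with the most coincident bits."""
    cur_block = [row[left:left + size] for row in current[top:top + size]]
    threshold = block_mean(cur_block)          # role of binary-coder 16
    cur_bits = binarize(cur_block, threshold)
    prev_bits = binarize(previous, threshold)  # role of binary-coder 18
    best = None
    for r in range(len(previous) - size + 1):  # role of detector 20
        for c in range(len(previous[0]) - size + 1):
            matches = sum(cur_bits[y][x] == prev_bits[r + y][c + x]
                          for y in range(size) for x in range(size))
            if best is None or matches > best[2]:
                best = (r - top, c - left, matches)
    return best


PATTERN = [[200, 50, 200, 50],
           [50, 200, 50, 200],
           [200, 200, 50, 50],
           [50, 50, 200, 200]]


def frame_with_pattern(top, left):
    """12x12 black frame with the toy pattern placed at (top, left)."""
    frame = [[0] * 12 for _ in range(12)]
    for y in range(4):
        for x in range(4):
            frame[top + y][left + x] = PATTERN[y][x]
    return frame


current_frame = frame_with_pattern(4, 4)
previous_frame = frame_with_pattern(5, 6)
vector = binary_motion_search(current_frame, previous_frame, top=4, left=4)
```

In this toy case the threshold is 125 and all 16 bits coincide only at the displacement (1, 2). Since each comparison involves a single bit per pixel, the comparison hardware and the memory for the searching range shrink accordingly.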