The present invention relates to a video coding method and apparatus and a motion vector estimator for use in recording or transmitting a video signal.
In general, a current frame of a moving picture has high correlation with the previous frame thereof unless scenes are changed between them. Accordingly, in coding a moving picture, a motion-compensated inter-frame coding technique is ordinarily employed, i.e., a current frame is predicted and coded using a pair of frames preceding and succeeding the current frame.
Since inter-frame prediction always requires information about other frames, it is impossible to randomly access a desired frame. Thus, an intra-frame coding technique, i.e., coding a current frame using only the information about the frame itself, is also employed periodically. For example, according to the MPEG (Moving Picture Experts Group) standards, a unit of intra-frame coding is called a “group of pictures (GOP)”.
In coding a moving picture, a target frame is divided into a plurality of blocks, each consisting of a multiplicity of pixels neighboring each other, and each of the divided blocks is coded independently. That is to say, coding is performed on a block-by-block basis. For instance, in performing predictive coding, a block with the highest correlation with the target block is extracted as a predicted block from a reference frame preceding or succeeding the target frame. A predicted error block, which represents a difference between the target block and the predicted block, is quantized and coded. When coding is performed using a reference frame preceding a target frame, the coding technique is called “forward predictive coding”. Conversely, when coding is performed using a reference frame succeeding the target frame, the coding technique is called “backward predictive coding”. And when coding is performed using a frame representing an average between the pair of frames preceding and succeeding the target frame as a reference frame, the coding technique is called “bidirectional predictive coding”.
Since a predicted block should preferably be most similar to the target block, a block with the highest correlation with the target block is extracted as the predicted block from the blocks located within a search range in the reference frame. Specifically, differential blocks are obtained based on respective differences in luminance signal level between the target block and candidate blocks within the search range in the reference frame. By calculating the sum of absolute or squared pixel values within each of these differential blocks, one of the candidate blocks with a differential block having the smallest sum of absolute or squared pixel values is extracted as the predicted block.
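The search described above can be sketched as follows. This is a minimal illustration only, assuming a full search over a square range and the sum of absolute differences as the measure. The names `sad`, `extract_block` and `find_predicted_block` are hypothetical and do not appear in this specification.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract_block(frame, top, left, size):
    """Cut a size x size block out of a frame given as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_predicted_block(target, reference, top, left, size, search):
    """Full search within +/- search pixels of (top, left).

    Returns (smallest_sad, (dy, dx)), where (dy, dx) is the displacement
    of the best candidate block relative to the target block position.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the reference frame
            cost = sad(target, extract_block(reference, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


# Illustrative data (assumed): every 2 x 2 block of this frame is unique.
reference = [[r * 10 + c for c in range(6)] for r in range(6)]
target = extract_block(reference, 2, 3, 2)
print(find_predicted_block(target, reference, 1, 1, 2, 2))  # (0, (1, 2))
```

Replacing `abs(a - b)` with `(a - b) ** 2` would give the squared variant mentioned above.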
In a moving picture, however, the location of the predicted block within the reference frame is different from that of the target block within the target frame. The direction and magnitude of this positional difference are called a “motion vector”, which is also coded along with the predicted error block and used in decoding. Such a technique is called “motion compensation”.
According to the MPEG standards, pictures are classified based on the coding mode thereof into three types: I-pictures, P-pictures and B-pictures. An I-picture is composed of nothing but intra blocks, in which only the information within the frame is coded without performing prediction. A P-picture is composed of forward-predicted blocks and intra blocks. And a B-picture is composed of forward-predicted blocks, backward-predicted blocks, bidirectional-predicted blocks and intra blocks.
Also, when inter-frame prediction is performed, frame prediction can be adaptively switched into field prediction, and vice versa, based on the state of the picture to increase the predictive efficiency. For example, an interlaced picture may sometimes be coded more efficiently by the field prediction technique.
As can be seen, a plurality of coding modes coexist according to the MPEG standards.
FIG. 16 is a block diagram illustrating a configuration of a conventional video coder. This video coder is adapted to select one of a plurality of coding modes and to code a target block based on the coding mode selected. In the following example, selectable coding modes are supposed to include forward prediction mode and intra-frame coding mode.
As shown in FIG. 16, the video coder includes: block divider 10; encoding section 40; decoding section 50; frame memory section 60; predicted block generator 70; and coding mode determining section 1600.
The coding mode determining section 1600 includes: a sum-of-squared-differences calculator 1601; a variance calculator 204; and a coding mode determiner 205.
The predicted block generator 70 receives a reference frame, which has been output from the frame memory section 60, and a target block Sk, which has been output from the block divider 10. The generator 70 extracts a block with the highest correlation with the target block Sk from the reference frame, thereby outputting the extracted block as a forward-predicted block Pk.
In the coding mode determining section 1600, the variance calculator 204 calculates a variance Vk of the target block Sk and outputs the variance to the coding mode determiner 205. Receiving the target block Sk and forward-predicted block Pk, the sum-of-squared-differences calculator 1601 obtains a predicted error Epk, that is, a sum of absolute or squared differences between the blocks of these two types, and outputs the predicted error Epk to the coding mode determiner 205.
Comparing the variance Vk to the predicted error Epk, the coding mode determiner 205 selects one of the coding modes allowed by the picture type specified, and outputs the coding mode selected to the encoding and decoding sections 40 and 50. For example, if an I-picture has been specified, then the determiner 205 determines the coding mode as intra-frame coding. FIG. 17 illustrates how the coding mode is determined by a test model (TM) technique according to the MPEG standards. If a P-picture, requiring prediction, has been specified, then the coding mode is determined as shown in FIG. 17.
In this manner, the conventional video coder obtains the variance of the target block and the predicted error of the predicted block, and selects one of a plurality of coding modes based on these values.
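The conventional determination may be sketched roughly as below. The decision rule shown is a simplified assumption made for illustration: it is not the rule of FIG. 17, whose thresholds are not reproduced here. Both quantities are normalized per pixel (also an assumption) so that they can be compared directly, and all function names are hypothetical.

```python
def block_variance(block):
    """Vk: mean squared deviation of the pixels from the block average."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def predicted_error(target, predicted):
    """Epk: squared differences between the blocks, averaged per pixel."""
    diffs = [(t - p) ** 2
             for row_t, row_p in zip(target, predicted)
             for t, p in zip(row_t, row_p)]
    return sum(diffs) / len(diffs)


def select_coding_mode(target, predicted, picture_type):
    """Assumed rule: intra if the predicted error exceeds the variance."""
    if picture_type == "I":
        return "intra"  # an I-picture allows intra blocks only
    vk = block_variance(target)
    epk = predicted_error(target, predicted)
    return "intra" if epk > vk else "forward"


target = [[10, 20], [30, 40]]
print(select_coding_mode(target, [[11, 19], [30, 41]], "P"))      # forward
print(select_coding_mode(target, [[100, 100], [100, 100]], "P"))  # intra
```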
The method for compensating for a motion based on the magnitude of a predicted error in luminance signal level between blocks supposes that brightness conditions are the same between a target frame and a reference frame. In other words, this method supposes that an actually associated object on these frames is presented at substantially the same luminance signal level. However, in several types of pictures, such as a fade-in picture that gradually brightens over the whole screen, a fade-out picture that gradually darkens and a flash picture in which the brightness of the frame changes instantaneously, the luminance changes substantially uniformly over the entire screen. Accordingly, even if an object on a target frame is actually associated with an object on the reference frame, these objects are presented at mutually different luminance signal levels. Thus, in such a situation, the motion cannot be compensated for properly and an appropriate coding mode cannot be selected, either.
Next, the features of fade-in and fade-out pictures (in this specification, these pictures will be collectively referred to as “fading pictures”) will be described in greater detail. FIG. 18 illustrates a variation in luminance signal level between a pixel located on a line within a frame (i.e., the 65th frame in the illustrated example) of a quasi-still fade-out picture with almost no motion and a corresponding pixel located on the same line within a frame (i.e., the 70th frame in the illustrated example) appearing later than the former frame. As shown in FIG. 18, although almost no motion happens in the fade-out picture, the luminance signal level greatly changes with time. That is to say, the value of a DC component (i.e., an average) considerably changes between the frames due to the fading effects. Accordingly, even if a target block is actually associated with a block in a reference frame, the sum of absolute or squared differences in the predicted error block amounts to a non-negligible value. Thus, the associated block cannot be estimated correctly or the motion cannot be compensated for precisely.
FIGS. 19A, 19B and 19C illustrate a problem arising in a fading picture. In these drawings, the hatched portion illustrates a gradually decreasing luminance signal level. FIG. 19A illustrates a forward reference frame and FIG. 19B illustrates a target frame. That is to say, FIGS. 19A and 19B illustrate respective frames of a fade-out picture with almost no motion, of which the luminance signal level gradually decreases.
In the illustrated example, the target block is the block Sk shown in FIG. 19B and a predicted block is searched for within the search range SA shown in FIG. 19A. According to the conventional motion compensation technique, a predicted block is estimated based on a predicted error in luminance signal level between the block Sk and the predicted block. Thus, a block Bk shown in FIG. 19A, which has a luminance signal level closest to that of the block Sk, is estimated as the forward-predicted block. Suppose a forward prediction mode is selected after that, because an estimate, obtained by adding together the absolute or squared differences in the predicted error block, is small. In such a case, coding is performed using the block Bk as the forward-predicted block. However, on a picture produced by decoding the resultant coded data, noise is superimposed at a location corresponding to the block Sk as shown in FIG. 19C.
As can be seen, the motion of a fading picture cannot be precisely compensated for according to the conventional method, and the quality of a decoded picture deteriorates.
To avoid such a problem, the motion compensation may be performed after the variation in luminance signal level between a target frame and a reference frame thereof due to fading has been eliminated. In other words, the motion compensation may be carried out using only AC components after the variation in DC components has been eliminated.
For example, as disclosed in Japanese Laid-Open Publication No. 8-98187, the variation in DC components may be eliminated by equalizing an average of pixel values in a block within the search range of a reference frame for prediction with that of pixel values in a target block.
In this manner, according to the conventional motion compensation technique, an average of pixel values in a block within the search range of a reference frame and that of pixel values in a target block are obtained on a block-by-block basis. And the pixel values are corrected so as to equalize these averages with each other. However, since the computational cost required for motion compensation is already very high, it is difficult to perform such additional processing. In addition, to store the average pixel values for respective blocks within the reference frame, the same number of memories as that of the blocks are needed.
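The effect of equalizing the block averages can be illustrated with a small sketch. It is not taken from the cited publication. It merely assumes that equalization amounts to subtracting each block's own average before the differences are summed, and the function names are hypothetical. Note that `block_mean` must be evaluated for every candidate block, which is the additional cost referred to above.

```python
def block_mean(block):
    """Average (DC component) of the pixel values in a block."""
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)


def sad(target, candidate):
    """Conventional sum of absolute differences."""
    return sum(abs(t - c)
               for row_t, row_c in zip(target, candidate)
               for t, c in zip(row_t, row_c))


def mean_removed_sad(target, candidate):
    """SAD computed on AC components only (assumed form of equalization)."""
    mean_t, mean_c = block_mean(target), block_mean(candidate)
    return sum(abs((t - mean_t) - (c - mean_c))
               for row_t, row_c in zip(target, candidate)
               for t, c in zip(row_t, row_c))


# Illustrative fade-out: the same content, uniformly darker by 40 levels.
reference_block = [[100, 120], [140, 160]]
target_block = [[60, 80], [100, 120]]
print(sad(target_block, reference_block))               # 160
print(mean_removed_sad(target_block, reference_block))  # 0.0
```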
Moreover, when a coding mode is selected, processing of correcting the pixel values in such a manner as to eliminate the fading effects is not carried out according to the conventional method. Accordingly, if the luminance signal level has changed over the entire screen due to fade-in or fade-out, the sum of absolute or squared differences also changes in the predicted error block, and an inappropriate coding mode may be selected as a result.
An object of the present invention is to provide a video coding method and apparatus and a motion vector estimator that make it possible to select an optimal one from several coding modes even if a luminance signal level has changed over the entire screen due to fading, for example, and to perform optimized motion compensation without greatly increasing the computational cost.
A video coding method according to the present invention is adapted to predictively code each target block within a target frame relative to a reference frame. The method includes the steps of: a) calculating an estimate of the target block based on pixel values within the target block; b) calculating respective correction values for the target block and a predicted block associated with the target block, the predicted block being generated from the reference frame by motion compensation; c) correcting the pixel values within the target and predicted blocks using the respective correction values, and calculating a predicted error based on a difference between each said pixel value within the corrected target block and an associated one of the pixel values within the corrected predicted block; d) determining a coding mode based on the estimate of the target block and the predicted error; and e) coding the target block in accordance with the coding mode determined.
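Steps a) through d) may be sketched as follows under several assumptions that this summary does not fix: the estimate of the target block is taken to be its variance, each correction value is taken to be the average of the respective block, correction is taken to be subtraction of that average, and the mode is decided by a simple comparison. All names are hypothetical, and step e) is left out.

```python
def mean(block):
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)


def variance(block):
    """Step a (assumed): estimate of the target block."""
    m = mean(block)
    pixels = [p for row in block for p in row]
    return sum((p - m) ** 2 for p in pixels) / len(pixels)


def corrected_predicted_error(target, predicted):
    """Steps b and c (assumed): subtract each block's average, then
    average the squared differences of the corrected pixel values."""
    mt, mp = mean(target), mean(predicted)  # step b: correction values
    diffs = [((t - mt) - (p - mp)) ** 2
             for row_t, row_p in zip(target, predicted)
             for t, p in zip(row_t, row_p)]
    return sum(diffs) / len(diffs)


def determine_mode(target, predicted):
    """Step d (assumed rule): predict unless the error exceeds the estimate."""
    error = corrected_predicted_error(target, predicted)
    return "forward" if error <= variance(target) else "intra"


# Same content in both blocks, with the target uniformly darker by 40.
target = [[60, 80], [100, 120]]
predicted = [[100, 120], [140, 160]]
print(determine_mode(target, predicted))  # forward
```

In this example the uncorrected mean squared difference would be 1600, well above the target variance of 500, so a comparison without correction would have rejected an otherwise exact prediction.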
Another video coding method according to the present invention is also adapted to predictively code each target block within a target frame relative to a reference frame. The method includes the steps of: a) calculating an estimate of the target block based on pixel values within the target block; b) obtaining a predicted error block composed of a plurality of pixels representing differences between the pixel values within the target block and pixel values within a predicted block associated with the target block, and calculating an estimate of the predicted error block based on the pixel values within the predicted error block, the predicted block being generated from the reference frame by motion compensation; c) determining a coding mode based on the respective estimates of the target block and the predicted error block; and d) coding the target block in accordance with the coding mode determined.
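A rough sketch of steps a) through c) follows. It assumes, purely for illustration, that both estimates are variances and that the mode is chosen by comparing them directly. The function names are hypothetical, and step d) is omitted.

```python
def variance(block):
    """Assumed estimate: mean squared deviation from the block average."""
    pixels = [p for row in block for p in row]
    m = sum(pixels) / len(pixels)
    return sum((p - m) ** 2 for p in pixels) / len(pixels)


def predicted_error_block(target, predicted):
    """Step b: pixel-by-pixel differences between the two blocks."""
    return [[t - p for t, p in zip(row_t, row_p)]
            for row_t, row_p in zip(target, predicted)]


def determine_mode(target, predicted):
    """Step c (assumed rule): compare the two estimates."""
    error_estimate = variance(predicted_error_block(target, predicted))
    return "forward" if error_estimate <= variance(target) else "intra"


target = [[60, 80], [100, 120]]
print(determine_mode(target, [[100, 120], [140, 160]]))  # forward
print(determine_mode(target, [[120, 60], [80, 100]]))    # intra
```

A variance taken over the predicted error block is unaffected by a constant luminance offset between the two blocks, which is why the uniformly brighter predicted block in the first call still yields an estimate of zero.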
Still another video coding method according to the present invention is also adapted to predictively code each target block within a target frame relative to a reference frame. The method includes the steps of: a) calculating a correction value for the target block; b) correcting pixel values within the target block using the correction value, and estimating a motion vector of the corrected target block relative to the reference frame; and c) coding the target block in accordance with the motion vector estimated.
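Steps a) and b) might look like the sketch below. How the correction value is obtained is not stated in this summary. The sketch assumes, as one possibility only, that it is the difference between the average luminance of the target frame and that of the reference frame, so that only the target block is corrected and no per-candidate averaging is needed. The names `luminance_offset` and `estimate_motion_vector` are hypothetical.

```python
def luminance_offset(target_frame, reference):
    """Step a (assumed): difference between the two frame averages."""
    total_t = sum(sum(row) for row in target_frame)
    total_r = sum(sum(row) for row in reference)
    return (total_t - total_r) / (len(reference) * len(reference[0]))


def estimate_motion_vector(target_frame, reference, top, left, size,
                           search, offset=0):
    """Step b: correct the target block, then run a full SAD search."""
    target = [[p - offset for p in row[left:left + size]]
              for row in target_frame[top:top + size]]
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate lies outside the reference frame
            cost = sum(abs(target[i][j] - reference[y + i][x + j])
                       for i in range(size) for j in range(size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]


# Illustrative still scene that fades out by 40 levels between frames.
reference = [[100 + 10 * r + c * c for c in range(6)] for r in range(6)]
faded = [[p - 40 for p in row] for row in reference]
offset = luminance_offset(faded, reference)
print(estimate_motion_vector(faded, reference, 2, 2, 2, 2, offset))  # (0, 0)
```

With `offset=0`, the same search on this data settles on a displaced block whose luminance happens to be closest, even though nothing has moved.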
A video coding apparatus according to the present invention is adapted to predictively code each target block within a target frame relative to a reference frame. The apparatus includes: first calculating means for calculating an estimate of the target block based on pixel values within the target block; correction value calculating means for calculating respective correction values for the target block and a predicted block associated with the target block, the predicted block being generated from the reference frame by motion compensation; second calculating means for correcting the pixel values within the target and predicted blocks using the respective correction values, and calculating a predicted error based on a difference between each said pixel value within the corrected target block and an associated one of the pixel values within the corrected predicted block; means for determining a coding mode based on the estimate of the target block and the predicted error; and means for coding the target block in accordance with the coding mode determined.
Another video coding apparatus according to the present invention is also adapted to predictively code each target block within a target frame relative to a reference frame. The apparatus includes: first calculating means for calculating an estimate of the target block based on pixel values within the target block; second calculating means for obtaining a predicted error block composed of a plurality of pixels representing differences between the pixel values within the target block and pixel values within a predicted block associated with the target block, and calculating an estimate of the predicted error block based on the pixel values within the predicted error block, the predicted block being generated from the reference frame by motion compensation; means for determining a coding mode based on the respective estimates of the target block and the predicted error block; and means for coding the target block in accordance with the coding mode determined.
Still another video coding apparatus according to the present invention is also adapted to predictively code each target block within a target frame relative to a reference frame. The apparatus includes: means for calculating a correction value for the target block; a motion estimator for correcting pixel values within the target block using the correction value, and estimating a motion vector of the corrected target block relative to the reference frame; and means for coding the target block in accordance with the motion vector estimated.
An apparatus for estimating a motion vector for each target block within a target frame relative to a reference frame according to the present invention includes: means for calculating a correction value for the target block; and a motion estimator for correcting pixel values within the target block using the correction value, and estimating a motion vector of the corrected target block relative to the reference frame.
According to the present invention, even if a luminance signal level has changed with time over the entire screen due to fading, for example, the effects of such a change in luminance signal level, which is caused between a target frame and a reference frame used for motion compensation, can be eliminated. Thus, an optimum motion vector can be estimated, and an optimum coding mode can be selected without selecting a prediction mode that might result in erroneous motion compensation. Also, even when a video coder according to the present invention includes only the motion compensating section or coding mode determining section, the video coder still can code a fading picture in an optimum coding mode.