The subject matter of the present application relates generally to the field of low bit-rate video coding and more particularly to a method and apparatus for advantageously selecting video frames to be coded for improved coding quality.
Low bit-rate video coders are of great interest for a number of applications, including, for example, videophone, video-conferencing and multimedia applications. To achieve sufficiently low bit-rates, which for many of these applications must be as low as 16 Kbps (Kilobits per second) or even lower, both spatial and temporal sub-sampling are typically performed on the input signal prior to coding. Spatial sub-sampling is typically achieved by reducing the image size to QCIF (176×144 pixels) format (familiar to those skilled in the art). Temporal sub-sampling typically involves reducing the effective frame rate from 30 fps (frames per second) to 5-10 fps, or sometimes even below 5 fps. This effective frame rate reduction is typically obtained in conventional low bit-rate coders by selecting for encoding only every Mth frame from the input frame sequence (corresponding to a uniform frame selection and a temporal sub-sampling by a factor of M). Thus, for a coder which operates at 5 fps, for example, every sixth frame would be selected (i.e., M=6). At the decoder, the missing frames are usually filled in by replicating the last decoded frame to obtain the nominal display frame-rate of 30 fps. Other techniques, such as, for example, motion-compensated frame interpolation (familiar to those skilled in the art) may also be applied at the decoder to supply the missing (i.e., uncoded) frames.
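By way of illustration only, the conventional uniform frame selection and the decoder-side frame replication described above may be sketched as follows. The function names, and the choice of the first frame of the sequence as the first selected frame, are assumptions made for this sketch and are not taken from any particular coder.

```python
def select_uniform(num_frames, m):
    """Indices of the frames kept by uniform temporal sub-sampling.

    Assumption: selection starts at frame 0 and keeps every m-th frame.
    """
    return list(range(0, num_frames, m))


def replicate_last_decoded(coded_indices, num_frames):
    """For each display slot, the index of the coded frame shown in that slot.

    Uncoded slots repeat the most recently decoded frame.
    """
    coded = set(coded_indices)
    shown = []
    current = None
    for t in range(num_frames):
        if t in coded:
            current = t
        shown.append(current)
    return shown


# One second of 30 fps input coded at 5 fps, i.e., M = 6.
kept = select_uniform(30, 6)
display = replicate_last_decoded(kept, 30)
print(kept)
print(display[:12])
```

In this sketch each coded frame is displayed for M consecutive display slots, which restores the nominal 30 fps display rate without adding any new information.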
Even at such reduced spatial and temporal resolutions, however, the coding performance of standardized fixed rate coders, such as, for example, those in accordance with the "H.263" standard (hereinafter "H.263"), is often unsatisfactory during those portions of the video sequence in which a substantial change occurs between consecutively coded frames. ("H.263" is well known to those skilled in the art.) Such changes may be due either to a sudden increase in the motion of objects in the scene or to other fast scene changes.
Buffering, along with appropriate buffer control, has been used to address this problem to some degree. However, in real time applications such as videophone or video conferencing applications, the buffer size must necessarily be quite small, in order to avoid unacceptable delay. Thus, in a fixed rate regime, a sizable increase in motion or other fast changes in the scene results in a corresponding increase in the quantization step-size (i.e., a reduced coding resolution) in order to meet the fixed rate constraint, since the only way to code the substantial image change with the available number of bits is to code the data more coarsely. This reduced resolution typically causes an increase in blockiness (in block-based coders like "H.263") and other undesired effects, such as, for example, high frequency artifacts around edges, sometimes referred to as "mosquito noise." A partial remedy which has been applied to these degradations is postprocessing of the decoded video signal. Unfortunately, such postprocessing techniques tend to introduce distortions, such as blurring, if heavily applied. Therefore, it may be more advantageous to improve the performance of low bit rate coders, and in particular the standard "H.263" coder, by appropriate processing of the video sequence prior to coding.
Prior art coding systems have in some cases applied such preprocessing techniques, but these preprocessors have addressed mainly issues of noise reduction and filtering prior to sub-sampling so as to avoid aliasing effects. Preprocessing has not heretofore been applied to the "clean" sub-sampled signal. Specifically, it would be advantageous to modify (or, equivalently, condition) the input signal such that the preprocessed, encoded signal will have fewer artifacts than would be introduced by coding the original input signal alone. Postprocessing may still be advantageously applied, but may be more moderately applied than might be otherwise needed. Such a combination of pre- and post-processing may therefore provide a substantial overall improvement in quality.
In particular, and as is familiar to those skilled in the art, hybrid coders such as "H.263" usually encode the displaced frame difference (DFD), i.e., the difference between the current frame and a predicted frame, where the predicted frame is obtained using motion compensation on the previously coded frame. A direct consequence of this is that the coder performance will improve if the variance of the DFD is reduced. In other words, increasing the predictability of each frame to be coded from its motion-compensated predecessor necessarily improves overall coder performance. Although this result has been achieved to some extent in prior art coders by applying simple spatial filtering, such filtering techniques introduce significant perceived degradation of the image. It would be advantageous, therefore, to increase the predictability of the frames while causing minimal perceived degradation.
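By way of illustration only, the DFD and its variance may be computed as in the following minimal sketch. Two simplifying assumptions are made here and are not part of the foregoing description: each frame is represented as a flat list of pixel intensities, and the predicted frame is taken to be the previously coded frame itself (i.e., zero-motion prediction), whereas an actual hybrid coder would first apply motion compensation. The function names and pixel values are hypothetical.

```python
def dfd(current, predicted):
    """Pixel-wise difference between the current frame and its prediction."""
    return [c - p for c, p in zip(current, predicted)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Toy 4-pixel frames (hypothetical values).
previous_coded = [10, 10, 10, 10]
slow_change = [10, 11, 10, 11]   # small change relative to the previous coded frame
fast_change = [10, 30, 5, 40]    # large change relative to the previous coded frame

var_slow = variance(dfd(slow_change, previous_coded))
var_fast = variance(dfd(fast_change, previous_coded))
print(var_slow, var_fast)
```

The frame that changes little from its prediction yields a much smaller DFD variance, and is therefore cheaper to code at a given quantization step-size than the frame that changes substantially.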
The present invention provides a method and apparatus for advantageously selecting video frames to be coded in order to improve the coding quality of a low bit-rate coder. In particular, it has been recognized that prior art coding systems' method of temporal sub-sampling (i.e., selecting a set of frames to be coded from the complete incoming sequence of frames) may be modified so that the frames which are to be coded are advantageously selected based upon a coding criterion, such as, for example, prediction gain (i.e., reduction in DFD variance).
Specifically, in accordance with the instant invention, a video signal comprising a sequence of video frames is coded, the sequence of video frames comprising a sequence of subsequences of said video frames, by (i) determining a coding quality measure for one or more of the video frames comprised in one of said subsequences of said video frames; (ii) selecting a particular one of the video frames comprised in said one of said subsequences of said video frames, the selection based on the coding quality measure therefor; and (iii) coding the selected video frame as representative of said one of said subsequences of said video frames.
Illustrative embodiments of the instant invention recognize that a noticeable reduction in conventional coder performance occurs whenever there is a fast change between consecutive coded frames (e.g., a sudden head tilt or hand waving in a typical videophone sequence). Therefore, in accordance with an illustrative embodiment of the present invention, a larger number of frames are advantageously selected during such short periods of fast change, and correspondingly fewer frames are selected during the other periods, thereby keeping the overall apparent frame-rate fixed. In accordance with one illustrative embodiment of the present invention, the fixed frame-rate may be maintained by grouping the incoming sequence of frames into sequential groups of M consecutive frames, and then selecting exactly one frame for every M input frames, while permitting the selected frame to be at any advantageously selected location within the group of M frames. (Such a group of M sequential frames will be referred to herein as a "superframe.") In this manner, two consecutively coded frames may turn out to be as close as one frame apart in the original 30 fps input sequence, or may turn out to be up to 2M−1 frames apart. Thus, non-uniform frame selection is achieved, even though exactly one frame is actually coded within each superframe. Moreover, by basing the specific frame selection on an appropriate coding criterion (e.g., prediction gain), a substantial improvement in coder performance may be achieved for those critical portions of the video sequence during which a conventional coder's performance may be drastically reduced, without changing the apparent frame-rate.
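By way of illustration only, the selection of exactly one frame per superframe may be sketched as follows. This is a simplified greedy sketch under stated assumptions, not the selection procedure of any particular embodiment: the coding criterion is taken to be the variance of the difference between a candidate frame and the previously selected frame (zero-motion prediction in place of motion compensation), the first frame of the first superframe is selected because no reference frame yet exists, and any trailing partial group of fewer than M frames is ignored. All function names and pixel values are hypothetical.

```python
def dfd_variance(candidate, reference):
    """Variance of the pixel-wise difference (assumes zero-motion prediction)."""
    diff = [c - r for c, r in zip(candidate, reference)]
    mean = sum(diff) / len(diff)
    return sum((d - mean) ** 2 for d in diff) / len(diff)


def select_per_superframe(frames, m, cost=dfd_variance):
    """Select exactly one frame index from each group of m consecutive frames.

    Greedy: within each superframe, pick the frame with the lowest cost
    relative to the previously selected frame.
    """
    selected = []
    reference = None
    for start in range(0, len(frames) - m + 1, m):
        if reference is None:
            best = start  # assumption: no reference yet, take the first frame
        else:
            group = range(start, start + m)
            best = min(group, key=lambda i: cost(frames[i], frames[reference]))
        selected.append(best)
        reference = best
    return selected


# Nine toy 2-pixel frames grouped into three superframes of M = 3.
frames = [[0, 0], [0, 1], [0, 2],
          [0, 9], [0, 3], [0, 8],
          [0, 4], [0, 20], [0, 30]]
m = 3
selected = select_per_superframe(frames, m)
gaps = [b - a for a, b in zip(selected, selected[1:])]
print(selected, gaps)
```

Because each selected index lies within its own superframe, the spacing between consecutively selected frames in this sketch always falls between 1 and 2M−1 input frames, while the number of coded frames per M input frames remains exactly one.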