Video encoding algorithms are typically constrained in the total bitrate allowed (as is the case for variable bitrate video) or in the average bitrate allowed (constant bitrate video) for encoding the video stream. Thus, a video encoder cannot use a large number of bits (i.e., appreciably more than the average bitrate allows) to encode each of a long sequence of successive frames. For example, in the case of constant bitrate video, a finite buffer at the decoder stores encoded frames before they are displayed. In this case, the maximum number of bits that can be used to encode the current frame is bounded, because if a frame uses too many bits, the decoder buffer can underflow, forcing the decoder to delay or drop future frames.
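The buffer constraint described above can be illustrated with a minimal sketch. It assumes a simplified decoder model that is not taken from the source: the channel delivers a fixed number of bits per frame interval, and a frame can be decoded only if all of its bits are already in the buffer. The function name and parameters are hypothetical.

```python
from typing import List


def first_underflow(frame_bits: List[int], channel_bits_per_frame: int,
                    buffer_capacity: int, initial_fullness: int) -> int:
    """Return the index of the first frame that underflows the decoder
    buffer, or -1 if every frame arrives in time (simplified model)."""
    fullness = initial_fullness
    for index, bits in enumerate(frame_bits):
        # The frame is due for decoding now; all of its bits must be buffered.
        if bits > fullness:
            return index
        fullness -= bits
        # The channel refills the buffer during the next frame interval,
        # but never beyond the buffer's capacity.
        fullness = min(buffer_capacity, fullness + channel_bits_per_frame)
    return -1


# Frames at the average rate never underflow; a run of frames that each
# exceed the average drains the buffer after a few frames.
print(first_underflow([1000, 1000, 1000], 1000, 4000, 2000))
print(first_underflow([1500, 1500, 1500, 1500], 1000, 4000, 2000))
```

In this model a single oversized frame is tolerated as long as the buffer holds enough bits, which is why the limit applies to long runs of large frames rather than to any one frame.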
The encoder controls the number of bits used to encode a frame by appropriately selecting encoding parameters such as the quantization scale. If the number of bits available for encoding the current frame is low, the encoder uses a high quantization scale to reduce the number of bits used for encoding the frame. However, if too high a quantization scale is used to encode a frame, unnatural artifacts appear when the frame is reconstructed at the decoder. Depending on the magnitude of the quantization scale used, these artifacts may cause an appreciable loss in the perceived quality of the video stream.
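The trade-off between quantization scale and reconstruction quality can be shown with a minimal sketch. It assumes plain uniform scalar quantization of a handful of made-up transform coefficients; real encoders use quantization matrices and entropy coding, which are omitted here. The function names and coefficient values are hypothetical.

```python
from typing import List


def quantize(coeffs: List[int], scale: int) -> List[int]:
    """Map each coefficient to the nearest multiple of `scale` (as a level)."""
    return [int(round(c / scale)) for c in coeffs]


def reconstruct(levels: List[int], scale: int) -> List[int]:
    """Decoder side: recover approximate coefficients from the levels."""
    return [level * scale for level in levels]


coeffs = [52, -13, 7, 3, -1, 1]  # illustrative values only

for scale in (4, 16):
    levels = quantize(coeffs, scale)
    recon = reconstruct(levels, scale)
    nonzero = sum(1 for level in levels if level != 0)
    max_error = max(abs(c - r) for c, r in zip(coeffs, recon))
    print(scale, levels, nonzero, max_error)
```

A larger scale leaves fewer nonzero levels to code, which lowers the bit count, but the reconstruction error grows with it.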
Dynamically reduced resolution can be used as an alternative to a high quantization scale to lower the number of bits used to encode a frame. Reducing the resolution of a frame prior to encoding (termed reduced resolution mode encoding) allows the frame to be encoded in fewer bits than the original. The MPEG-4 video standard, for example, provides a reduced resolution mode, which can be used to encode video frames at a low bitrate.
From the point of view of the perceptual quality of the reconstructed video frame, encoding at a reduced resolution is preferable to encoding at a very high quantization scale. The reason is that encoding at a reduced resolution causes uniform blurring, as opposed to the characteristic blocky artifacts caused by encoding at too high a quantization scale. However, deciding when to encode at reduced resolution is not straightforward. Often, encoding at moderately high quantization scales may produce reconstructions of better quality than if reduced resolution were used. This is especially true when the spatial and temporal complexity of the frame is not high enough to mask the effects of reduced resolution. Thus, it is inadvisable to encode frame sequences with little motion at low resolution.
Another significant issue is the temporal distortion caused by encoding successive frames at different resolutions. Repeatedly switching resolution modes is inadvisable; it may be better to use the same resolution mode as the preceding frames even if it provides an inferior reconstruction of the frame on a stand-alone basis. It is therefore imperative that any resolution selection method ensure that reduced resolution is used only when it can be suitably masked and that resolution modes do not switch repeatedly within a short duration.
An example of an encoding method aimed at selecting a judicious resolution mode for encoding a particular image in a sequence of images is disclosed in U.S. Pat. No. 5,262,855. In this prior-art system (FIG. 1), the encoder encodes a frame at a lower resolution if it detects complex motion, fade and dissolve conditions, a high quantization scale or a high estimated decoding time. The prior-art system suffers from the following limitations. First, it switches to a reduced resolution mode if any one of the above-mentioned conditions occurs. Hence the presence of fast motion in the video stream would cause the encoder to switch to low resolution even if the decoder buffer level is high (for the case of constant bitrate video discussed above). Considering the above conditions individually when selecting the resolution is therefore not adequate; a function that embodies a combination of these conditions is required. Second, the system does not address the problem of temporal distortion caused by switches in the encoding resolution. Since the system does not take the resolution mode history of previous frames into account, there is a significant possibility that the encoder may oscillate between different resolution modes.
An example of an encoding apparatus aimed at the design of a resolution selection controller is disclosed in U.S. Pat. No. 5,805,222. In this prior-art system, the quantizer step size, amount of data coded and buffer occupancy of a previous frame are employed to select the resolution of the current frame being encoded. However, this system has the following limitations. It uses statistical information from only one previous frame to make the resolution selection decision, whereas it is known that accurate estimation of the statistical information of a video bitstream requires incorporating statistics over a plurality of frames. Estimating such information from just one previous frame is liable to be inaccurate, since video frames typically exhibit diverse statistical behavior. Further, many video effects such as gradual scene changes, which have important ramifications for the encoding resolution selected, can only be detected by studying the statistical behavior over several successive frames.
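The difference between a single-frame estimate and an estimate over a plurality of frames can be illustrated with a minimal sketch. It assumes a simple sliding-window mean over the coded size of recent frames; the class name, window length and sample values are hypothetical and are not taken from either cited patent.

```python
from collections import deque


class WindowedStats:
    """Keeps the most recent `window` samples and reports their mean."""

    def __init__(self, window: int) -> None:
        self.samples = deque(maxlen=window)  # old samples fall off the left

    def add(self, value: float) -> None:
        self.samples.append(value)

    def mean(self) -> float:
        if not self.samples:
            raise ValueError("no samples recorded yet")
        return sum(self.samples) / len(self.samples)


stats = WindowedStats(window=4)
for coded_bits in [100, 100, 100, 500]:  # one atypical frame at the end
    stats.add(coded_bits)

last_frame_only = stats.samples[-1]
print(last_frame_only, stats.mean())
```

A decision based only on the last frame sees the atypical value in full, while the windowed mean is pulled toward it only partially.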
Further, the prior-art system embodied in U.S. Pat. No. 5,805,222 (as illustrated in FIG. 2) does not consider the amount of motion present when selecting the encoding resolution. The encoding resolution selected should depend on the presence (or absence) of motion, since motion effectively masks the blurring distortion present in low resolution video. In the absence of motion, it is advisable to avoid coding at low resolution, since it causes visually perceptible distortion. The prior-art system uses the amount of coded data, in lieu of a motion estimate, in selecting the encoding resolution. However, the amount of coded data is a poor estimate of motion. For example, a frame in a still scene may nevertheless have a large amount of coded data if the immediately prior frame (with respect to which the current frame is predictively encoded) was coded poorly. Thus the prior-art system may code low motion sequences at low resolution, causing appreciable distortion.
When the statistical information of the current (and future) frames is not considered, the system is vulnerable to estimation errors. This occurs, for example, when the current frame marks a scene change. When the current and previous frames belong to different scenes, the statistical behavior of the previous frame is not a good indicator of the advisability of encoding the current frame in low resolution mode. Certain encoding algorithms employ a look-ahead estimation of the statistics of future frames, which may be used to circumvent the described problem. The prior art further discloses a function of a product involving the amount of data being coded and the quantization scale, which is used to switch from high resolution mode to low resolution mode as well as from low resolution mode to high resolution mode, with different preset thresholds. However, the use of the same function for both switching decisions is not adequate.
The switch from high resolution to low resolution mode should be made when the number of bits available for encoding the current and future frames is low. On the other hand, the switch from low resolution back to high resolution mode should be made only when it is certain that the switch will not be followed by an immediate reversion to low resolution mode. Thus the objective functions used to make the two decisions need to be significantly different. For example, additional parameters such as the scene-change history need to be considered when switching from low resolution to high resolution mode.
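The asymmetry between the two switching decisions can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the method of the invention or of the cited prior art: the statistics, thresholds and function name are all hypothetical, and scene-change history is reduced to a count of frames since the last scene change.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    budget_ratio: float             # bits available / bits an average frame needs (hypothetical)
    motion: float                   # hypothetical motion estimate, 0.0 (still) to 1.0 (fast)
    frames_since_scene_change: int  # stand-in for scene-change history


def choose_low_resolution(currently_low: bool, stats: FrameStats,
                          enter_budget: float = 0.5, exit_budget: float = 1.2,
                          min_motion: float = 0.3, settle_frames: int = 10) -> bool:
    """Return True if the next frame should be coded in low resolution mode."""
    if not currently_low:
        # High -> low: only when bits are scarce and motion can mask the blur.
        return stats.budget_ratio < enter_budget and stats.motion >= min_motion
    # Low -> high: a stricter, different test, so that the encoder does not
    # fall straight back into low resolution mode.
    can_leave = (stats.budget_ratio > exit_budget
                 and stats.frames_since_scene_change >= settle_frames)
    return not can_leave


# Scarce bits with fast motion: switch down. Ample bits shortly after a
# scene change: stay in low resolution mode for now.
print(choose_low_resolution(False, FrameStats(0.4, 0.8, 50)))
print(choose_low_resolution(True, FrameStats(1.5, 0.8, 3)))
```

Because the exit threshold is higher than the entry threshold and the exit test includes an extra condition, a budget that hovers near the entry threshold does not make the mode flip back and forth.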
It is an object of the present invention to provide an improved method for dynamic resolution switching which uses an estimate of the motion to provide distortion masking and which avoids the problems of inaccurate statistical estimation and repeated switching of resolution modes. It is a further object of the present invention to provide an improved coding method, which determines encoding parameters after taking into account the resolution of the current and previous frames being encoded.