Software video compression, or video encoding, is a computationally expensive task. A raw video sequence contains so many bits that encoding every one of them, and transmitting the result, would be intolerably slow for most viewers. Therefore, various techniques are implemented for reducing the number of bits to encode, such as reducing frame rates, reducing resolution, and other reductions, for purposes of decreasing the overall size of the compressed video. This reduction is sometimes known as “lossy compression”: in a given sequence of video frames, savings are achieved by predicting current frames from previous frames and by removing some perceptually unimportant data from the video sequence. The amount of data that is removed depends on the bit budget constraints.
An illustrative example is the encoding of images having fine detail, such as sharp edges on objects, surface textures, minute facial features of individuals, and the like. Sharp edges contain high frequency components and require a large number of bits to encode. Thus, the presence of complex high frequency components in video with a limited bit budget for encoding can take a heavy toll on video quality. To match the bit rate constraints, one approach is to heavily quantize the spatial information in a compressed video sequence (the residual left after prediction, and the spatial information of non-predicted parts), which reduces the number of bits required to represent video sample values. However, heavy quantization of high frequency coefficients leads to undesirable blocking, ringing, and mosquito noise artifacts in the resulting images.
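The relationship between sharp edges, high frequency coefficients, and quantization can be illustrated with a minimal sketch. It assumes a one-dimensional 8-sample block, an orthonormal DCT-II as the transform, and a single uniform quantization step for all coefficients; the function names and the step sizes 16 and 200 are hypothetical choices for illustration, not taken from any particular codec.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (illustrative, not optimized)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by the step and round."""
    return [round(c / step) for c in coeffs]

def nonzero_count(levels):
    """Number of quantized coefficients that still have to be coded."""
    return sum(1 for v in levels if v != 0)

# Assumed test blocks: a flat region and a sharp step edge.
flat = [100] * 8
edge = [0, 0, 0, 0, 255, 255, 255, 255]

FINE_STEP = 16     # hypothetical fine quantizer step
COARSE_STEP = 200  # hypothetical coarse quantizer step

for name, block in (("flat", flat), ("edge", edge)):
    coeffs = dct_1d(block)
    print(name,
          "fine:", nonzero_count(quantize(coeffs, FINE_STEP)),
          "coarse:", nonzero_count(quantize(coeffs, COARSE_STEP)))
```

In this toy model the flat block needs only its DC coefficient, while the edge block spreads energy into several higher-frequency coefficients. A coarser step zeroes some of those coefficients, which saves bits but discards part of the edge, and that loss is the origin of ringing-type artifacts.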
Furthermore, video frames that attempt to retain their images' sharp edges and fine texture information, regardless of the degree of quantization, will nevertheless have more bits per frame to encode as compared to other frames that do not have sharp edges. Another factor that adversely affects encoded video quality is excessive frame dropping due to lack of available bits. Frame dropping generally occurs with variable frame rate encoders that often drop frames when there are insufficient bits available to encode a video frame.
The lack of bits can be due to two reasons. First, the current frame may be estimated to produce significantly more bits than were rationed for that frame (as would occur if the frame had sharp edges), and the frame is therefore dropped. Dropping it increases the distance between predicted frames, which leads to poorer prediction between frames and thus to higher bit budget requirements. Second, previously encoded frames may have produced more bits than estimated and thus led to undesired levels of video buffer verifier (VBV) buffer fullness. Since the VBV buffer operates according to a “leaky bucket” model that must neither overflow nor underflow, balancing the bits entering it against the bits leaving it, undesired levels of VBV buffer fullness will cause some incoming frames to be dropped (since all of their bits cannot be buffered) until the VBV buffer empties to where it can accommodate new incoming frames.
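The second cause of frame dropping can be sketched with a simplified leaky-bucket simulation. This is a toy, encoder-side model under stated assumptions: each encoded frame adds its bits to the buffer, the channel drains a fixed number of bits per frame interval, and a frame is dropped when its bits would not fit. The function name `simulate_vbv`, its parameters, and the numbers used are hypothetical; an actual VBV is defined by the relevant video coding standard and is more detailed than this.

```python
def simulate_vbv(frame_bits, buffer_size, drain_per_frame, start_fullness=0):
    """Return (decisions, fullness_trace) for a toy leaky-bucket buffer.

    frame_bits      -- bits each incoming frame would produce if encoded
    buffer_size     -- capacity of the buffer in bits
    drain_per_frame -- bits removed by the channel per frame interval
    """
    fullness = start_fullness
    decisions = []
    trace = []
    for bits in frame_bits:
        if fullness + bits > buffer_size:
            # The frame's bits cannot all be buffered, so the frame is dropped.
            decisions.append("drop")
        else:
            fullness += bits
            decisions.append("encode")
        # The channel drains the buffer once per frame interval.
        fullness = max(0, fullness - drain_per_frame)
        trace.append(fullness)
    return decisions, trace

# Assumed example: two large frames fill the buffer, so later frames are
# dropped until the buffer has drained enough to accept a new frame.
decisions, trace = simulate_vbv([60, 60, 50, 50, 50, 20],
                                buffer_size=100, drain_per_frame=20)
print(decisions)
print(trace)
```

In this run the two oversized early frames push the buffer to its limit, and the following frames are dropped even though they are smaller, until the buffer drains.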
Because frame dropping results in “jerky” video (which is unappealing to viewers), there is often a maximum limit on the number of consecutive frames that can be dropped. Once that limit is reached, the next frame has to be encoded even if too few bits are available, so higher quantization (Q) values (i.e., larger quantization steps) are used to reduce the number of bits. However, using large Q values leads to abrupt changes in video quality and an unpleasant viewer experience.
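The interaction between the consecutive-drop limit and the quantization value can be sketched as a small decision function. It rests on loudly stated assumptions: the estimated frame size follows a toy model of `complexity / q`, the quantizer ranges up to 31, and the name `choose_action` and all of its parameters are hypothetical. Real rate control algorithms use more elaborate models.

```python
def choose_action(complexity, budget_bits, consecutive_drops, max_drops,
                  q=4, q_max=31):
    """Decide whether to encode or drop a frame, and with which Q.

    Assumes estimated bits = complexity / q (a toy rate model).
    Returns a tuple (action, q).
    """
    if complexity / q <= budget_bits:
        # The frame fits the budget at the current Q.
        return "encode", q
    if consecutive_drops < max_drops:
        # Too expensive, and another drop is still allowed.
        return "drop", q
    # The drop limit is reached, so the frame must be encoded:
    # raise Q until the estimate fits the budget or Q hits its maximum.
    while q < q_max and complexity / q > budget_bits:
        q += 1
    return "encode", q

print(choose_action(4000, 1500, consecutive_drops=0, max_drops=2))
print(choose_action(12000, 1500, consecutive_drops=1, max_drops=2))
print(choose_action(12000, 1500, consecutive_drops=2, max_drops=2))
```

The third call shows the situation described above: with no drops left, the sketch encodes the frame at a much larger Q than its neighbors, which is the kind of jump that produces an abrupt quality change.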
One approach to reduce frame dropping and compression artifacts is to filter the video sequence to remove the high frequency components (e.g., video noise, sharp edges, small details, fine texture information, etc.), thereby avoiding both the artifacts caused by heavy quantization and the frame dropping caused by bit budget constraints. With those components removed, the available bits can be spent on encoding low frequency data at higher quality (lower quantization Q), and/or excessive frame dropping due to lack of available bits can be avoided.
However, low pass filtering blurs the source images (since the high frequency components that represent the sharp edges are removed), and frames that use a filtered image for reference will propagate the blurring. While blurred images are easier to encode, they are undesirable in some situations from a video quality point of view. Therefore, there is a tradeoff between a soft, smooth image sequence (e.g., blurred images) and a crisp image sequence having artifacts and possibly more frame dropping.
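The tradeoff can be made concrete with a minimal sketch of low pass filtering applied to a single row of samples. The 3-tap kernel [1, 2, 1]/4, the edge-replication border handling, and the use of summed squared neighbor differences as a crude stand-in for high frequency content are all assumptions chosen for illustration; the function names are hypothetical.

```python
def low_pass(samples):
    """Apply a [1, 2, 1] / 4 smoothing kernel, replicating the border samples."""
    n = len(samples)
    out = []
    for i in range(n):
        left = samples[max(i - 1, 0)]
        right = samples[min(i + 1, n - 1)]
        out.append((left + 2 * samples[i] + right) / 4.0)
    return out

def high_freq_energy(samples):
    """Crude proxy for high frequency content: sum of squared neighbor differences."""
    return sum((b - a) ** 2 for a, b in zip(samples, samples[1:]))

# Assumed input: one row of samples crossing a sharp step edge.
edge = [0, 0, 0, 0, 255, 255, 255, 255]
smoothed = low_pass(edge)

print(smoothed)
print(high_freq_energy(edge), high_freq_energy(smoothed))
```

After filtering, the single abrupt step becomes a ramp spread over several samples. The proxy energy drops, which suggests the row would be cheaper to encode, but the edge is no longer crisp, and any frame predicted from the filtered row inherits the softer edge.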