Video compression standards such as H.261, H.263, MPEG-1 and MPEG-2 are based on a hybrid block motion compensated discrete cosine transform (DCT) encoding scheme. That is, temporal redundancy of a video sequence is removed by motion compensating the previous image frame and subtracting it from the present image frame, resulting in a temporal prediction error signal. The spatial redundancy of this prediction error signal is removed by the use of an invertible DCT, which converts the statistically dependent elements into independent coefficients, or equivalently compacts the energy into a small number of coefficients. The quantization of these coefficients results in a lossy compressed video sequence. Due to implementation considerations, both the motion compensation and the DCT are performed using blocks of data. Specifically, the motion compensation is implemented using a 16×16 block matching algorithm, while the DCT is applied to 8×8 blocks of the temporal error signal.
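The transform and quantization steps described above can be illustrated with a minimal sketch. It is not the procedure of any particular standard: the function names, the orthonormal DCT scaling, and the single uniform quantizer step are assumptions made for illustration, and the motion compensated prediction is taken as already available.

```python
import math

N = 8  # DCT block size named in the text


def _scale(k):
    # Orthonormal DCT-II scaling factor (an assumption; standards differ in scaling).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(n, k):
    # Cosine basis function of the DCT-II.
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def prediction_error(current, predicted):
    """Present block minus its motion compensated prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, predicted)]


def dct_2d(block):
    """Forward 2-D DCT of an N x N block (direct, unoptimized form)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse 2-D DCT; exactly undoes dct_2d up to rounding error."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step):
    """Hypothetical uniform quantizer: the only lossy step in this sketch."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Map quantizer levels back to coefficient values."""
    return [[q * step for q in row] for row in levels]
```

In this sketch a decoder would reconstruct a block as the prediction plus idct_2d(dequantize(levels, step)). A larger step discards more coefficients and yields a larger reconstruction error.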
Although very high compression ratios are obtainable for video sequences using this technique, artifacts are introduced which severely degrade the visual quality of the decoded sequence. Therefore, some form of post-processing is required to remove these artifacts and improve the usefulness of the encoded video. The two types of artifacts typically encountered are motion and block boundary artifacts. These two types of artifacts are described below.
Motion artifacts (also called mosquito artifacts) are defined as temporally nonstationary impulses that appear around objects that are moving within the decoded video sequence. These artifacts result from the coarse quantization of the prediction error signal. The majority of the energy contained in a prediction error signal is the result of the motion estimator's inability to distinguish between differently moving objects. For instance, in video-conferencing sequences the subject is generally against a stationary background. Since the motion estimator is a 16×16 block matcher, the boundary between the moving object and stationary background is not detected. This leads to a situation where either part of the background is assumed to be moving, or part of the moving object is assumed to be stationary. Coarsely quantizing these prediction errors results in impulsive artifacts that change over time and appear to swarm around the moving object.
Block boundary artifacts are defined as the introduction of artificial block boundaries into the decoded video sequence. These artifacts are due to the combination of dividing the prediction error signal into blocks, as well as quantization. Since each block is quantized separately, the errors are most visible at the block boundaries.
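The effect of quantizing each block separately can be seen in a small one-dimensional sketch. The signal, the quantizer step of 16, and the function names are assumptions chosen for illustration. A smooth ramp is split into two 8-sample blocks, each block is transformed and coarsely quantized on its own, and the decoded signal is flat within each block with a visible jump where the blocks meet.

```python
import math

N = 8  # block size, matching the 8-sample DCT blocks in the text


def _scale(k):
    # Orthonormal DCT-II scaling factor (an assumption for this sketch).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_1d(x):
    """Forward 1-D DCT of one N-sample block."""
    return [_scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                            for n in range(N))
            for k in range(N)]


def idct_1d(coeffs):
    """Inverse 1-D DCT of one N-sample block."""
    return [sum(_scale(k) * coeffs[k] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for k in range(N))
            for n in range(N)]


def code_blockwise(signal, step):
    """Transform, quantize and reconstruct each block independently.

    Assumes len(signal) is a multiple of N.
    """
    decoded = []
    for start in range(0, len(signal), N):
        block = signal[start:start + N]
        levels = [round(c / step) for c in dct_1d(block)]  # coarse, lossy step
        decoded.extend(idct_1d([q * step for q in levels]))
    return decoded


# A smooth ramp: neighbouring samples differ by exactly 1 everywhere.
signal = [float(n) for n in range(2 * N)]
decoded = code_blockwise(signal, step=16.0)

# Difference across the block boundary (samples 7 and 8) after decoding.
boundary_jump = abs(decoded[N] - decoded[N - 1])
```

With this step size only the DC coefficient of each block survives, so all detail inside a block is lost and the whole reconstruction error concentrates at the boundary, where the decoded jump is several times the original step of 1.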
If left unaddressed, these artifacts greatly degrade a viewer's perceived quality of the encoded video sequence.
Users of compressed video demand that the decoder also be able to resize the video display window. This capability requires the decoder to perform an additional (with regard to removing coding artifacts) post-processing operation involving data interpolation or decimation. An important requirement is that this resize operation must be computationally efficient (i.e., real time) and produce a resized video sequence that is visually appealing. That is, artifacts should not be introduced into the video sequence as a result of the resizing operation. Furthermore, coding artifacts should not become more visible in the decoded sequence due to the resizing operation.
Prior art only addresses the problem of removing coding artifacts or noise. The problem of video resizing is not addressed. The prior art requires that two post-processing filters be used. The first post-processing filter is for artifact removal, while the second is required for window resizing. These filters would be implemented in series. That is, the output of the artifact removal filter is the input to the window resize filter.
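The series arrangement described above can be sketched for a single row of pixels. Both filters are hypothetical stand-ins: a 3-tap low-pass filter takes the place of an artifact removal filter, and linear interpolation takes the place of a window resize filter. Neither is taken from any cited reference.

```python
def smooth_3tap(row):
    """Stand-in artifact removal filter: [1, 2, 1] / 4 with edge replication."""
    last = len(row) - 1
    return [(row[max(i - 1, 0)] + 2.0 * row[i] + row[min(i + 1, last)]) / 4.0
            for i in range(len(row))]


def resize_linear(row, new_len):
    """Stand-in resize filter: linear interpolation or decimation to new_len samples."""
    if new_len == 1 or len(row) == 1:
        return [row[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (len(row) - 1) / (new_len - 1)  # position in the source row
        lo = int(pos)
        hi = min(lo + 1, len(row) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * row[lo] + frac * row[hi])
    return out


def postprocess_in_series(row, new_len):
    """Two separate passes: the first filter's output feeds the second filter."""
    return resize_linear(smooth_3tap(row), new_len)
```

Each pass reads and writes every pixel, so the cascade costs two full sets of arithmetic operations and memory transfers per frame.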
Each of these approaches imposes substantial additional implementation complexity. For example, one approach taught by the prior art requires that an edge detector and additional spatial filter bank be implemented, while other approaches teach methods for reducing noise that require several additional operations such as a local variance calculation and activity calculation followed by a filter coefficient calculation and filtering. All of these methods specify implementations which require additional multipliers and summers. Implementing these methods is costly both in hardware, with additional silicon being required, and software, with additional instructions that must be executed.
In summary, the two post-processing operations, i.e., removing artifacts and resizing the decoded video, are individually difficult to achieve under the constraint of real time operation. As a combined problem, the real time constraint is even more difficult to satisfy while also trying to keep the overall costs low.
The additional cost in implementing artifact removal filters results from:
A) the increase in the size of the IC (integrated circuit) die due to additional multipliers, adders and buffers required to implement an artifact removal filter;
B) a lower yield and therefore higher manufacturing costs due to the larger die;
C) the increased processing power (additional millions of instructions per second, MIPS) required to carry out the additional instructions that are necessary; and
D) the additional bus-bandwidth required to move the pixel data between the artifact removal filtering unit and the frame buffer.
The consumer electronics market is a price-sensitive market that requires either a low cost video compression ASIC (application specific integrated circuit) or a low cost embedded processor. The methods for artifact removal taught by the prior art are too expensive to be included in such devices. Therefore, the prior art does not adequately address the needs of this and other price-sensitive markets.