Video compression removes information from an input video stream that is indiscernible (or nearly so) to the viewer, in order to reduce the size of the stream. Each event, such as a change in the image displayed on a group of pixels, is then assigned a code: commonly occurring events are assigned codes with few bits, and rare events are assigned codes with more bits. These steps are commonly called signal analysis, quantization, and variable length encoding, respectively. Four common methods of video compression are the discrete cosine transform (DCT), vector quantization (VQ), fractal compression, and the discrete wavelet transform (DWT). DCT is by far the most popular of the four.
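To make the three steps concrete, the following is a minimal sketch, not taken from any standard. It uses a one-dimensional 8-sample DCT-II as the signal analysis step (real DCT codecs transform two-dimensional blocks), a uniform quantizer with an arbitrarily chosen step of 10, and Huffman coding as one example of variable length encoding. The helper names `dct_ii`, `quantize`, and `huffman_code_lengths` and the sample pixel values are hypothetical.

```python
import heapq
import math
from collections import Counter


def dct_ii(samples):
    """Orthonormal 1-D DCT-II (the signal analysis step)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: coefficients much smaller than `step` become 0."""
    return [round(c / step) for c in coeffs]


def huffman_code_lengths(symbols):
    """Huffman code length (in bits) for each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (combined count, unique tie-breaker, {symbol: code length}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, lengths1 = heapq.heappop(heap)
        c2, _, lengths2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code inside them.
        merged = {s: d + 1 for s, d in {**lengths1, **lengths2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


row = [100, 102, 104, 106, 108, 110, 112, 114]  # a smooth ramp of pixel values
levels = quantize(dct_ii(row), step=10)          # [30, -1, 0, 0, 0, 0, 0, 0]
lengths = huffman_code_lengths(levels)
total_bits = sum(lengths[s] for s in levels)     # 10 bits for 8 levels
```

For this smooth ramp, quantization leaves only two nonzero levels, so the frequently occurring zero level receives a 1-bit code while the rarer levels receive 2-bit codes.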
One of the most common standards for DCT-based video compression is the Moving Picture Experts Group (MPEG) standard. MPEG is actually a series of standards, each designed with a specific application and bit rate in mind, although MPEG compression scales well to higher bit rates.
While MPEG has been used for years in computer displays, recently such compression schemes have been applied to other digital displays, such as high definition television (HDTV) sets.
Video compression commonly involves motion compensation. Motion compensation relies on the fact that, across many frames of a video, the only difference between one frame and the next is often the result of the camera moving or an object in the frame moving. In a video file, this means that much of the information representing one frame is the same as the information used in the next frame, once corrected for the motion of the camera and/or objects.
Motion compensation takes advantage of this to create video frames from a reference frame. Many of the frames in a video (those between two reference frames) could be eliminated, and the only information stored for them would be the information needed to transform the previous frame into the next one.
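A minimal sketch of motion-compensated prediction, under several assumptions not stated in the text: frames are lists of lists of pixel values, each fixed-size block of the new frame carries one motion vector `(dy, dx)` pointing from that block back to its source location in the reference frame, and vectors that reach past the frame edge are clamped. The functions `predict_frame` and `residual` are hypothetical names for illustration.

```python
def predict_frame(reference, motion_vectors, block_size):
    """Motion-compensated prediction: fill each block of the new frame by copying
    the block of `reference` that its motion vector (dy, dx) points to."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for (by, bx), (dy, dx) in motion_vectors.items():
        for y in range(by, min(by + block_size, height)):
            for x in range(bx, min(bx + block_size, width)):
                # Clamp so a vector pointing past the frame edge reuses edge pixels.
                sy = min(max(y + dy, 0), height - 1)
                sx = min(max(x + dx, 0), width - 1)
                predicted[y][x] = reference[sy][sx]
    return predicted


def residual(target, predicted):
    """Per-pixel difference an encoder would store alongside the vectors."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, predicted)]


reference = [[9, 9, 0, 0],
             [9, 9, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
# In the next frame the 2x2 object has moved two pixels to the right.
target = [[0, 0, 9, 9],
          [0, 0, 9, 9],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
# One vector per 2x2 block, keyed by the block's top-left corner in `target`.
vectors = {(0, 0): (0, 2), (0, 2): (0, -2), (2, 0): (0, 0), (2, 2): (0, 0)}
predicted = predict_frame(reference, vectors, block_size=2)
```

Here the four vectors alone reproduce the new frame exactly, so the residual is all zeros; in real video the residual is small but nonzero and is itself compressed.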
Another reason to perform motion compensation in digital displays is to convert an analog or lower frame rate video source into a higher frame rate digital signal. Most motion pictures, for example, are shot at 24 frames per second, and much broadcast video runs at roughly 30 frames per second. If the display's frame rate is higher than that of the input source, interpolation is needed to arrive at values for blocks of pixels in frames that fall between actual frames of the input source. By interpolating between frames, the system can predict where an object would be located in such a hypothetical frame and then generate that frame for display between two actual video source frames.
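Before any pixels are interpolated, the converter has to know where each output frame falls in time relative to the source frames. The sketch below, assuming integer frame rates and a hypothetical function `interpolation_schedule`, computes for each output frame the preceding source frame and the fractional position (phase) of the output instant between that source frame and the next. Exact fractions are used to avoid floating-point drift.

```python
import math
from fractions import Fraction


def interpolation_schedule(source_fps, output_fps, num_output_frames):
    """For each output frame, return (previous source frame index, phase).

    phase is the fraction of the way from that source frame to the next one;
    phase 0 means an original frame can be shown, otherwise a frame must be
    interpolated between source frames prev_index and prev_index + 1.
    """
    schedule = []
    for n in range(num_output_frames):
        # Output frame n is shown at time n / output_fps, which is
        # n * source_fps / output_fps measured in source-frame periods.
        position = Fraction(n * source_fps, output_fps)
        prev_index = math.floor(position)
        schedule.append((prev_index, position - prev_index))
    return schedule


schedule_30_to_120 = interpolation_schedule(30, 120, 5)
schedule_24_to_60 = interpolation_schedule(24, 60, 4)
```

Converting 30 to 120 frames per second yields phases 0, 1/4, 1/2, 3/4 and then 0 again on the next source frame, so three of every four output frames must be interpolated. The phase is the weighting factor used for the motion vectors in the interpolation method described next.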
One way to reduce errors in interpolated frames is to use a three-tap filter to arrive at motion-compensated pixel values for each block of pixels in an interpolated frame. Here, a forward motion vector is calculated for a fixed block of pixels (an object) in a previous frame (PREV) by searching for the same block of pixels in a current frame (CURR) and arriving at a motion vector that indicates the amount of movement between the two frames. A backward motion vector is then calculated by taking a fixed block of pixels in the CURR frame and searching the PREV frame for a match.
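The text does not specify the search method, so the following sketch assumes one common choice: exhaustive block matching that minimizes the sum of absolute differences (SAD) within a small search window. The forward vector comes from searching CURR for a PREV block, and the backward vector from searching PREV for a CURR block. All function names and the synthetic test frames are hypothetical.

```python
def sad(frame_a, ay, ax, frame_b, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + i][ax + j] - frame_b[by + i][bx + j])
               for i in range(size) for j in range(size))


def block_match(source, target, y, x, size, search_range):
    """Exhaustive block matching: return the (dy, dx) that moves the block at
    (y, x) in `source` onto its lowest-SAD match in `target`."""
    height, width = len(target), len(target[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ty, tx = y + dy, x + dx
            if ty < 0 or tx < 0 or ty + size > height or tx + size > width:
                continue  # candidate block would extend past the frame edge
            cost = sad(source, y, x, target, ty, tx, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector


def frame_with_block(height, width, top, left, size, value):
    """Build a test frame: a square block of `value` on a zero background."""
    frame = [[0] * width for _ in range(height)]
    for i in range(size):
        for j in range(size):
            frame[top + i][left + j] = value
    return frame


prev = frame_with_block(6, 6, 1, 1, 2, 9)  # object at (1, 1) in PREV
curr = frame_with_block(6, 6, 2, 3, 2, 9)  # same object at (2, 3) in CURR
forward = block_match(prev, curr, 1, 1, size=2, search_range=2)   # PREV -> CURR
backward = block_match(curr, prev, 2, 3, size=2, search_range=2)  # CURR -> PREV
```

For this simple translation the two searches return `(1, 2)` and `(-1, -2)`, mirror images of each other. Around occlusions the two searches can disagree, which is one reason both vectors are computed.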
A first pixel value for one pixel of the object in the interpolated frame can be derived from the location of that pixel in the PREV frame, compensated by the forward motion vector. This is done by first determining the weighting to apply to the forward motion vector based on the timing of the interpolated frame. For example, if the interpolated frame lies exactly midway between the PREV and CURR frames, the forward motion vector can be weighted by ½, meaning that the object is assumed to have moved, between the PREV frame and the interpolated frame, half the distance it appears to move between the PREV and CURR frames. If, on the other hand, the interpolated frame lies ⅓ of the way from the PREV frame to the CURR frame, the forward motion vector may be weighted by ⅓. The weighted forward motion vector can then be applied to the pixel location in the PREV frame to obtain the location of that pixel in the interpolated frame.
A second pixel value for that pixel can be derived from the location of the pixel in the CURR frame, compensated by the backward motion vector. Again, a weighting is applied based on the temporal position of the interpolated frame between the PREV and CURR frames. For example, if the interpolated frame lies ⅓ of the way from the PREV frame to the CURR frame, it lies ⅔ of the way back from the CURR frame, so the backward motion vector may be weighted by ⅔. The weighted backward motion vector can then be applied to the pixel location in the CURR frame to obtain the location of that pixel in the interpolated frame.
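The weighting rule can be sketched as follows, assuming `alpha` denotes the interpolated frame's fractional position after PREV (½ for midway, ⅓ for one third of the way): the forward vector is scaled by `alpha` and the backward vector by `1 - alpha`. The function names and the example vectors and pixel locations are hypothetical.

```python
from fractions import Fraction


def weight_vectors(forward, backward, alpha):
    """Scale motion vectors for an interpolated frame at fractional position
    alpha after PREV: forward by alpha, backward by (1 - alpha)."""
    weighted_forward = (forward[0] * alpha, forward[1] * alpha)
    weighted_backward = (backward[0] * (1 - alpha), backward[1] * (1 - alpha))
    return weighted_forward, weighted_backward


def apply_vector(location, vector):
    """Displace a (row, column) pixel location by a motion vector."""
    return (location[0] + vector[0], location[1] + vector[1])


alpha = Fraction(1, 3)  # interpolated frame one third of the way from PREV to CURR
fwd_w, bwd_w = weight_vectors(forward=(3, 6), backward=(-3, -6), alpha=alpha)
from_prev = apply_vector((10, 10), fwd_w)  # object pixel at (10, 10) in PREV
from_curr = apply_vector((13, 16), bwd_w)  # same pixel at (13, 16) in CURR
```

The weighted vectors are `(1, 2)` and `(-2, -4)`, and both estimates place the pixel at `(11, 12)`, as expected for steady motion. For other timings the weighted vectors are generally fractional, so a practical system would also need to round them or interpolate between neighboring pixels; that step is omitted here.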
A third pixel value for that pixel may be derived by simply performing temporal interpolation on the exact pixel involved, regardless of the movement of objects. If, for example, the pixel changes from an orangish color in the PREV frame to a purplish color in the CURR frame, and the interpolated frame is exactly midway between the two, the pixel value in the interpolated frame may be the color exactly midway between the orangish and purplish colors.
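The third value is a plain linear blend of the same pixel in PREV and CURR. The sketch below assumes RGB triples, with hypothetical values standing in for "orangish" and "purplish". The text does not say how the three-tap filter combines its three candidate values; as a labeled assumption, the sketch uses a per-channel median, which keeps two agreeing candidates and discards an outlier.

```python
import statistics


def temporal_blend(prev_value, curr_value, alpha):
    """Motion-independent interpolation of one pixel's color channels at
    fractional position alpha (0 = PREV frame, 1 = CURR frame)."""
    return tuple((1 - alpha) * p + alpha * c for p, c in zip(prev_value, curr_value))


def three_tap_median(first, second, third):
    """ASSUMPTION: combine the three candidate values with a per-channel median."""
    return tuple(statistics.median(channel) for channel in zip(first, second, third))


orange = (255, 128, 0)
purple = (128, 0, 128)
midway_color = temporal_blend(orange, purple, 0.5)  # midway between PREV and CURR
combined = three_tap_median((200, 100, 50), (190, 90, 60), (10, 10, 10))
```

The midway blend is `(191.5, 64.0, 64.0)`. In the median example the two motion-compensated candidates roughly agree, so the dissimilar third candidate has no effect on the result `(190, 90, 50)`.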
Absent a scene change, the only movement in a video generally results from either camera movement (e.g., pans or zooms) or object movement. Occlusion refers to the movement of an object relative to a background: the object moves in front of the background and blocks portions of it, hence the term “occlusion.”
FIG. 1 is a diagram illustrating an example of occlusion. Here, an object 100 is moving in one direction while a background 102 is moving in another. The occluded regions depend on the speeds of the object and background relative to each other, and represent the areas where the movement either unveils or conceals an area of the background from the previous frame. An area where the object 100 is present in both the previous frame and the current frame is generally not called an occluded area, even though technically the background is covered by the object there as well.
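The distinction between concealed, revealed, and non-occluded areas can be sketched on a single scanline. The sketch assumes the object's extent in each frame is a half-open column interval `[start, end)` measured relative to the background, which captures the point that only relative motion matters. The function name and intervals are hypothetical.

```python
def occlusion_regions(object_prev, object_curr):
    """Classify scanline columns given the object's [start, end) extent in the
    previous and current frames, measured relative to the background."""
    prev = set(range(*object_prev))
    curr = set(range(*object_curr))
    concealed = sorted(curr - prev)  # background newly covered by the object
    revealed = sorted(prev - curr)   # background newly uncovered by the object
    both = sorted(prev & curr)       # object in both frames: not called occluded
    return concealed, revealed, both


slow = occlusion_regions((2, 6), (4, 8))   # object moves 2 pixels
fast = occlusion_regions((2, 6), (8, 12))  # object moves 6 pixels
```

With a 2-pixel shift, columns 6 and 7 are concealed, columns 2 and 3 are revealed, and columns 4 and 5 (object in both frames) are not occluded. With a 6-pixel shift the object no longer overlaps itself, and each occluded region grows to the object's full 4-pixel width. Faster relative motion produces larger occluded regions.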
Traditionally, motion compensation algorithms such as the one described above have difficulty handling occlusions. Depending on the speed of the object relative to the background, various visual artifacts can appear where occlusions occur. Generally speaking, the faster the object moves relative to the background, the more visual artifacts there are.
Judder is one commonly known artifact relating to fast motion. Judder is a subtle stuttering effect similar to blurring. Judder problems have, however, become less and less prevalent as manufacturers move to screens with higher refresh rates. For example, judder may occur on an older display with a 60 Hz refresh rate, whereas more recent displays use 120 Hz or even 240 Hz refresh rates, which dramatically reduce such judder.
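One way to see why higher refresh rates can help, assuming a 24 frame-per-second film source shown by simple frame repetition (no interpolation): at 60 Hz the frames must be held for an alternating 3 and 2 refreshes (the familiar 3:2 cadence, a well-known source of judder), while at 120 Hz or 240 Hz every frame is held for the same number of refreshes. The function `repeat_cadence` is a hypothetical illustration.

```python
def repeat_cadence(source_fps, refresh_hz, num_frames):
    """How many refreshes each source frame is held for when a display repeats
    frames without interpolation (refresh k shows frame floor(k * fps / hz))."""
    def first_refresh(n):
        # Smallest k with k * source_fps >= n * refresh_hz, i.e. ceil(n * hz / fps).
        return -(-n * refresh_hz // source_fps)
    return [first_refresh(n + 1) - first_refresh(n) for n in range(num_frames)]


cadence_60 = repeat_cadence(24, 60, 4)    # [3, 2, 3, 2]: uneven hold times
cadence_120 = repeat_cadence(24, 120, 4)  # [5, 5, 5, 5]: even hold times
cadence_240 = repeat_cadence(24, 240, 4)  # [10, 10, 10, 10]
```

Uneven hold times make steady motion advance in irregular steps on screen. Because 120 and 240 are integer multiples of 24, that irregularity disappears at those refresh rates.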
Another, less well-known visual artifact is the halo artifact. Halo artifacts are characterized by pixel errors in the occlusion (reveal and conceal) areas of the picture, which appear as a visible “mushiness” in the occlusion region.