1. Field
The present invention relates to video technology and, more particularly, to video compression and/or decompression.
2. Background Information
Video information which is streamed over a network is typically composed of frames, which are rectangular arrays of pixels. With block-based streaming techniques, each frame is processed as a collection of pixel blocks, for example as 8×8 or 16×16 pixel blocks. These blocks are typically processed using a Discrete Cosine Transform (DCT), a block-based process.
Streaming video over a network employs a large amount of bandwidth; to conserve bandwidth, frames of the video stream may be dropped before the stream is transmitted. With some video streaming technologies, the frames which are transmitted may include reference frames, which are the full array of pixels for the frame, and delta frames, which represent only the pixels which are different between a reference frame and a subsequent frame, or between subsequent delta frames. It is then the responsibility of the receiver of the video stream to interpolate any dropped frames between reference frames.
One component of frame interpolation is motion estimation. An interpolated frame is created by interpolating the motion of objects that move between reference frames. Motion estimation is accomplished by computing a motion vector between the starting and ending position of each block of pixels in adjacent reference frames. Motion estimation in interpolated frames is accomplished by translating each block along its associated motion vector in proportion to the position of the interpolated frame between the reference frames. For example, if there are two interpolated frames between reference frames, for the first interpolated frame, each block in the first reference frame is translated one-third of the distance along its motion vector. For the second interpolated frame, each block in the first reference frame is translated two-thirds of the distance along its motion vector.
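The proportional translation described above can be illustrated with a minimal sketch. The function name, the (x, y) tuple layout, and the sample coordinates are assumptions made for illustration only; computation of the motion vector itself is not shown.

```python
def interpolated_block_positions(start, motion_vector, num_interpolated):
    """Return the (x, y) position of one block in each interpolated frame.

    start            -- block position in the first reference frame (assumed layout)
    motion_vector    -- displacement of the block between the two reference frames
    num_interpolated -- number of interpolated frames between the reference frames
    """
    x0, y0 = start
    dx, dy = motion_vector
    steps = num_interpolated + 1  # intervals between the two reference frames
    positions = []
    for k in range(1, num_interpolated + 1):
        # Frame k lies k/steps of the way along the motion vector.
        positions.append((x0 + dx * k / steps, y0 + dy * k / steps))
    return positions


# Two interpolated frames: the block moves one-third, then two-thirds, of the way.
positions = interpolated_block_positions((16, 32), (12, -6), 2)
print(positions)
```

With two interpolated frames the sample block lands at fractional offsets of one-third and two-thirds of its motion vector. In general these offsets are not multiples of the block size, which is the fractional block move discussed in the following paragraph.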
One problem with block-based motion estimation techniques is that they may result in uneven motion flow which does not represent the true motion of the objects in the video stream. Block-based motion estimation techniques often do not work well at low frame rates because too many interpolated frames are typically computed between reference frames. Block-based motion estimation involves translating the frame blocks along vectors, and often a fractional block move (less than the width or height of a block) is employed for correct interpolation, resulting in uneven blending of the moved blocks with the reference frame.
It would be desirable to interpolate video frames in a video stream without employing block-based motion estimation and without increasing the computational resources employed to perform the interpolation. It would be desirable if the interpolated video frames were computed with improved quality over existing techniques. It would be further desirable to extend the range of frame rates at which quality interpolated frames may be generated.
Interpolated frames often contain artifacts, that is, incorrectly computed pixels which do not blend well with the reference frames. Without detection of frames with significant artifacts, even a few badly interpolated frames may substantially decrease the quality of the video stream. If badly interpolated frames are detected, error concealment measures may be applied, reducing the perceived loss of quality. With conventional interpolation techniques using motion estimation, error detection on action sequences (sequences with more motion) employs greater computational resources than error detection on placid (low motion content) sequences. It would be desirable to detect erroneous interpolated frames in both action and placid video sequences with approximately equivalent computational resources.
For a plurality of interpolated pixels in an interpolated video frame, an interpolated pixel of the plurality is classified as one of stationary, moving, covered, and uncovered. Components of the interpolated pixel are set to components of a previous pixel from a previous video frame, the previous pixel corresponding to the interpolated pixel in the video frame. If the interpolated pixel is uncovered, components of the interpolated pixel are set to components of a current pixel from a current video frame, the current pixel corresponding to the interpolated pixel in the video frame. If the interpolated pixel is moving, the interpolated pixel is set to a function of the current pixel and the previous pixel.
In another embodiment, the invention comprises classifying an interpolated pixel as stationary, moving, or uncovered. If the interpolated pixel is classified as moving, the interpolated pixel is set to a weighted sum of the corresponding pixel from a current video frame and the corresponding pixel from a previous video frame.
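The per-pixel assignment described in the two preceding paragraphs may be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the classification step is taken as given, pixels are assumed to be tuples of integer components, and the `weight` parameter with its default of 0.5 is hypothetical, since the text does not specify the weights of the sum.

```python
from enum import Enum


class PixelClass(Enum):
    STATIONARY = "stationary"
    MOVING = "moving"
    COVERED = "covered"
    UNCOVERED = "uncovered"


def interpolate_pixel(pixel_class, previous, current, weight=0.5):
    """Return the components of one interpolated pixel.

    previous, current -- component tuples of the corresponding pixels in the
                         previous and current video frames (assumed layout)
    weight            -- hypothetical weight given to the current pixel when
                         the interpolated pixel is classified as moving
    """
    if pixel_class is PixelClass.UNCOVERED:
        # Uncovered: take the components of the current pixel.
        return tuple(current)
    if pixel_class is PixelClass.MOVING:
        # Moving: weighted sum of the current and previous pixels.
        return tuple(
            round(weight * c + (1 - weight) * p)
            for p, c in zip(previous, current)
        )
    # Otherwise the components of the previous pixel are kept.
    return tuple(previous)


prev_px = (100, 50, 200)
curr_px = (120, 70, 100)
for cls in PixelClass:
    print(cls.value, interpolate_pixel(cls, prev_px, curr_px))
```

Because each interpolated pixel depends only on its classification and on the two corresponding pixels, the work per pixel in this sketch is the same whether the sequence contains much motion or little.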