The present invention relates to methods and apparatus for reducing and/or eliminating drift in luminance and/or chrominance values that can occur in various known reduced resolution, e.g., downsampling, video decoders.
Various digital applications, such as digital video, involve the processing, storage, and transmission of relatively large amounts of digital data representing, e.g., one or more digital images. Each image normally comprises a large number of pixels. Each pixel is represented in digital form using one or more numerical values referred to herein as pixel values. A pixel value provides, e.g., luminance or chrominance information corresponding to a single pixel.
In order to reduce the amount of digital data that must be stored and transmitted in conjunction with digital applications, various digital coding techniques, e.g., transform encoding techniques, have been developed. Discrete cosine transform (DCT) encoding is a particularly common form of transform encoding. The data (pixel values) representing several pixels, e.g., an 8×8 block of pixels, is frequently encoded using DCT coding to generate a series of AC and DC DCT coefficient values. The DCT coefficient values represent the 8×8 block of pixels in encoded form.
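To make the transform concrete, the following is a minimal, naive sketch of the orthonormal 8×8 two-dimensional DCT-II, the transform the MPEG IDCT is defined against. The function names are hypothetical and the quadruple loop is written for clarity, not speed. The sketch shows that the DC coefficient carries the block average: it equals 8 times the mean pixel value of the block.

```python
import math

N = 8  # block size used by MPEG-1/MPEG-2 DCT coding (8x8)


def c(k: int) -> float:
    """Normalization factor: 1/sqrt(2) for the DC basis function, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def forward_dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block of pixel values, block[x][y] (O(N^4))."""
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            # F(u, v) = (2/N) C(u) C(v) * sum; for u = v = 0 this is sum / 8.
            coeffs[u][v] = (2 / N) * c(u) * c(v) * total
    return coeffs
```

For a flat block of value 100, the DC coefficient is 800 and every AC coefficient is (numerically) zero.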
DCT encoding is frequently used in combination with motion compensated prediction techniques which are used to further reduce the amount of data required to represent a series of digital images. Motion compensated prediction involves the coding of all or a portion of an image by referring to a portion of one or more other images, e.g., reference frames. Motion vectors are used when encoding images via reference to other frame(s). A motion vector identifies pixels of a reference frame to be used when making a motion compensated prediction to reconstruct an image. The pixels to be used are identified in a motion vector through the use of horizontal and vertical offsets which are interpreted relative to the location of a macroblock that is being decoded.
One standard proposed for the coding of motion pictures, commonly referred to as the MPEG-2 standard, which is described in ISO/IEC 13818-2 (1996) Generic Coding of Moving Picture and Associated Audio Information: Video (hereinafter referred to as the "MPEG-2" reference), relies heavily on the use of DCT and motion compensated prediction coding techniques. An earlier version of MPEG, referred to as MPEG-1, also supports the use of motion compensated prediction.
In accordance with MPEG-2, images, e.g., frames, can be coded as intra-coded (I) frames, predictively coded (P) frames, or bi-directionally coded (B) frames. I frames are encoded without the use of motion compensation. P frames are encoded using motion compensation and a reference to a single anchor frame, which is a preceding frame in the sequence of frames being decoded. B frames are encoded using references to two anchor frames, e.g., a preceding frame and a subsequent frame. Reference to the subsequent frame is achieved using a backward motion vector, while reference to the preceding frame is achieved using a forward motion vector. In MPEG, I and P frames may be used as anchor frames for prediction purposes; B frames are not used as anchor frames.
MPEG-1 and MPEG-2 both support the specification of motion vector information, i.e., vertical and horizontal offsets, in half-pixel (half-pel) units. These standards specify that bilinear interpolation, which involves a division operation, is to be used when determining predicted pixel values when non-integer offsets are specified. These standards also specify that chrominance motion vector values are to be obtained by scaling transmitted luminance motion vector values.
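The half-pel bilinear interpolation described above can be sketched as follows for a single predicted pixel. This is a minimal illustration, assuming motion vector components are given in half-pel units and that the reference frame is a list of rows; the function name is hypothetical and no bounds handling is attempted. The "+1 then shift" and "+2 then shift" forms implement division with a fractional part of one half rounded up, which is valid because pixel values are non-negative.

```python
def predict_pixel(ref, x, y, mv_x, mv_y):
    """Predict the pixel at (x, y) from reference frame ref[row][col].

    mv_x and mv_y are motion vector components in half-pel units.
    """
    # Split each component into a whole-pel offset (floor) and a half-pel flag.
    ix, hx = x + (mv_x >> 1), mv_x & 1
    iy, hy = y + (mv_y >> 1), mv_y & 1

    a = ref[iy][ix]             # top-left sample
    b = ref[iy][ix + hx]        # right neighbour (same as a if no horizontal half-pel)
    c = ref[iy + hy][ix]        # lower neighbour (same as a if no vertical half-pel)
    d = ref[iy + hy][ix + hx]   # diagonal neighbour

    if hx and hy:
        return (a + b + c + d + 2) >> 2  # divide by 4, .5 rounded up
    if hx:
        return (a + b + 1) >> 1          # divide by 2, .5 rounded up
    if hy:
        return (a + c + 1) >> 1
    return a                             # whole-pel offset: no division needed
```

For example, with ref = [[10, 11], [12, 14]], a horizontal half-pel offset yields 11 (10.5 rounded up) and a diagonal half-pel offset yields 12 (11.75 rounded).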
As will be discussed in detail below, the MPEG standards specify that the result of the division operation performed as part of a prediction should be rounded to the nearest integer. The MPEG standards further specify that when the result of the division operation has a fractional part of one half, the result is to be rounded away from zero. Since the quantities involved in prediction calculations are non-negative, this results in rounding a fractional part of one half up to the next highest integer. The specified rounding up results in an intentional biasing of pixel values.
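The size of this intentional bias can be estimated by averaging the rounding error over all input combinations. The sketch below is illustrative only and assumes that the low-order bits of the inputs are uniformly distributed, which real image data need not satisfy; the function names are hypothetical. Under that assumption a two-point (half-pel) average is biased upward by 1/4 on average and a four-point average by 1/8.

```python
from fractions import Fraction
from itertools import product


def mpeg_average(values):
    """Integer average with a fractional part of one half rounded up (non-negative inputs)."""
    n = len(values)
    return (sum(values) + n // 2) // n


def mean_rounding_bias(n_points: int, levels: range) -> Fraction:
    """Exact mean of (rounded average - true average) over every input combination."""
    total = Fraction(0)
    count = 0
    for combo in product(levels, repeat=n_points):
        total += mpeg_average(combo) - Fraction(sum(combo), n_points)
        count += 1
    return total / count
```

For instance, mean_rounding_bias(2, range(16)) evaluates to Fraction(1, 4) and mean_rounding_bias(4, range(4)) to Fraction(1, 8). The small ranges keep the enumeration fast while still distributing the low-order bits uniformly.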
As a result of MPEG's integer rounding procedure, compliant motion compensated prediction modules normally generate integer pixel values as their output. In addition, pixel values generated by performing an inverse discrete cosine transform operation are normally output as integer values. This simplifies subsequent processing by eliminating the need to handle fractional values.
MPEG encoders are designed with the expectation that data generated by MPEG encoders will be decoded with the above discussed MPEG specified integer rounding operation being performed at decoding time. Because of the predictable nature of the rounding operation, MPEG encoders can encode data in such a manner that, when all the encoded data is decoded by a fully compliant MPEG decoder, the rounding that occurs over multiple sequential predictions will not cause unanticipated changes in brightness or color, sometimes referred to as drift.
Various approaches have been taken to implement low cost video decoders capable of decoding and displaying digital video data. Many of these approaches involve one or more data reduction operations, e.g., downsampling, designed to reduce the amount of encoded video data that must be stored and processed by a video decoder. A video decoder which performs downsampling is referred to as a "downsampling" video decoder. Because such decoders produce reduced resolution images, they are also sometimes referred to as "reduced resolution" decoders. Downsampling video decoders are discussed in U.S. Pat. No. 5,635,985, which is hereby expressly incorporated by reference.
FIG. 1 illustrates a known downsampling video decoder 10. The decoder 10 includes preparser 12, a syntax parser and variable length decoding (VLD) circuit 14, an inverse quantization circuit 16, an inverse discrete cosine transform (IDCT) circuit 18, a downsampler 20, summer 22, switch 24, memory 30, a pair of motion compensated prediction modules 26, 27 and a select/average predictions circuit 28. The memory 30 includes a coded data buffer 32 and a reference frame store 34. The various components of the decoder 10 are coupled together as illustrated in FIG. 1.
The preparser 12 receives encoded video data and selectively discards portions of the received data prior to storage in the coded data buffer 32. The encoded data from the buffer 32 is supplied to the input of the syntax parser and VLD circuit 14. The circuit 14 provides motion data and other motion prediction information to the motion compensated prediction modules 26, 27. In addition, it parses and variable length decodes the received data. A data output of the syntax parser and VLD circuit 14 is coupled to an input of the inverse quantization circuit 16.
The inverse quantization circuit 16 generates a series of DCT coefficients which are supplied to the IDCT circuit 18. From the received DCT coefficients, the IDCT circuit 18 generates a plurality of integer pixel values. In the case of intra-coded images, e.g., I frames, these values fully represent the image being decoded. In the case of inter-coded images, e.g., P and B frames, the output of the IDCT circuit 18 represents image (difference) data which is to be combined with additional image data to form a complete representation of the image or image portion being decoded. The additional image data, with which the output of the IDCT circuit is to be combined, is generated through the use of one or more received motion vectors and stored reference frames. The reference frames are obtained by the MCP modules 26, 27 from the reference frame store 34.
In order to reduce the amount of decoded video data that must be stored in the memory 30, the downsampler 20 is used. In the case of intra-coded data, the downsampled video data output by the downsampler 20 is stored, via switch 24, in the reference frame store 34.
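The downsampling method itself is not specified here. As one simple possibility, assumed purely for illustration, a downsampler can halve the width and height of the decoded data by averaging each 2×2 neighbourhood, which reduces the stored pixel count by a factor of four. The function name is hypothetical, and practical downsampling decoders may instead operate in the DCT domain.

```python
def downsample_2x2(frame):
    """Halve width and height by averaging each 2x2 neighbourhood of frame[row][col].

    Hypothetical filter: the .5-up rounding here is an arbitrary illustrative choice.
    Odd trailing rows/columns are dropped.
    """
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1] + 2) >> 2
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]
```

A 4×4 input such as [[0, 2, 4, 6], [2, 4, 6, 8], [10, 10, 20, 20], [10, 10, 20, 20]] becomes [[2, 6], [10, 20]].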
Motion compensated prediction modules 26 and 27 receive motion vector data from the syntax parser and VLD circuit 14 and downsampled anchor frames from the reference frame store 34. Using these inputs, they perform motion compensated prediction operations. The motion compensated prediction modules 26, 27 generate integer pixel values representing a portion of the image being decoded.
In the case of uni-directional motion compensation, the output of one of modules 26, 27 is selected by the select/average predictions circuit 28 and supplied to the summer 22. In the case of bi-directional motion compensation, the values output by the modules 26 and 27 are averaged by the select/average predictions circuit 28, which rounds the result of the averaging process to integer values in accordance with the MPEG specified rounding procedure. The integer values generated by the circuit 28 are supplied to the input of the summer 22.
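The behaviour of the select/average predictions circuit 28 can be modelled for one block of integer pixel values as shown below. This is a hypothetical software model, not the circuit itself: with only one prediction present it is passed through unchanged, and with both present the pair is averaged with a fractional part of one half rounded up.

```python
def combine_predictions(forward=None, backward=None):
    """Model of the select/average step for one block of integer pixel values."""
    if forward is not None and backward is not None:
        # Bi-directional: average and round .5 up (valid for non-negative values).
        return [(f + b + 1) >> 1 for f, b in zip(forward, backward)]
    if forward is not None:
        return list(forward)   # uni-directional, forward prediction selected
    if backward is not None:
        return list(backward)  # uni-directional, backward prediction selected
    raise ValueError("at least one prediction is required")
```

For example, averaging forward values [10, 11] with backward values [11, 11] yields [11, 11], because 10.5 is rounded up to 11.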
In the case of inter-coded video data, the summer 22 is used to combine the output of the downsampler 20 with the output of the select/average predictions circuit 28. The resulting data, which represents a decoded inter-coded video frame, is stored via switch 24 in the reference frame store 34.
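The combining step performed by the summer 22 amounts to adding the (difference) values to the predicted values. The sketch below assumes 8-bit pixels and, as in MPEG reconstruction, saturates each sum to the valid pixel range; the function and parameter names are hypothetical.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add difference values to predicted values, saturating to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_val) for p, r in zip(prediction, residual)]
```

For example, reconstruct([250, 3, 100], [10, -5, -20]) returns [255, 0, 80].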
The decoder 10 outputs the decoded video frames stored in the reference frame store 34 to be displayed on a display device. Because of the downsampling operation, the decoded video frames are of a lower resolution than the resolution at which the frames were originally encoded.
As a result of the data reduction operations performed by the downsampling decoder 10, the anchor frame data used in performing motion compensated predictions is considerably different from that which would have been used if the encoded video data were decoded and stored at full resolution for anchor frame purposes. Also, in general, the distribution of fractional values resulting from motion compensated prediction in a downsampling decoder is different from that which would result in a full resolution decoder. Because of these two factors, the effect of rounding the results of division operations performed during motion compensated prediction will differ from the effect contemplated at encoding time. This can cause pixel luminance and chrominance values to drift from their intended values. Because a predicted frame can itself serve as an anchor frame, the drift increases as subsequent predictions are made from a predicted frame.
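The accumulation of drift across successive predictions can be illustrated with a deliberately simplified toy model, which does not model downsampling itself. Assumptions, all hypothetical: a single circular row of pixels, static content predicted every frame with a horizontal half-pel offset, an encoder that rounds a fractional part of one half up, and a decoder whose rounding differs (it simply truncates), standing in for any rounding mismatch between encoder and decoder. Clipping is omitted to keep the arithmetic transparent.

```python
def half_pel_predict(row, round_half_up):
    """Horizontal half-pel prediction over a circular row of pixel values."""
    n = len(row)
    out = []
    for i in range(n):
        s = row[i] + row[(i + 1) % n]
        out.append((s + 1) // 2 if round_half_up else s // 2)
    return out


def simulate_drift(target, frames):
    """Return the summed decoder error (decoded - intended) after each predicted frame."""
    # Encoder side: its reconstruction always equals `target`, so the residual is constant.
    enc_pred = half_pel_predict(target, round_half_up=True)
    residual = [t - p for t, p in zip(target, enc_pred)]

    decoded = list(target)  # both sides start from the same intra-coded frame
    errors = []
    for _ in range(frames):
        # Decoder predicts from its OWN previous output, so mismatches compound.
        dec_pred = half_pel_predict(decoded, round_half_up=False)
        decoded = [p + r for p, r in zip(dec_pred, residual)]
        errors.append(sum(d - t for d, t in zip(decoded, target)))
    return errors
```

With target = [10, 11, 13, 12, 15, 14, 16, 20], four neighbouring pairs have odd sums, so the first predicted frame is off by a total of -4, and the total error keeps growing in magnitude with every further prediction.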
Accordingly, there is a need for methods and apparatus for reducing the amount of luminance and chrominance drift introduced into video images as the result of using reduced resolution decoding techniques such as downsampling. It is desirable that any new method and apparatus for reducing drift be relatively easy and inexpensive to implement. It is also desirable that at least some of the methods and apparatus be easy to incorporate with existing reduced resolution decoder designs.
As discussed above, MPEG requires that pixel values generated through the use of motion compensated prediction be rounded up to the next higher integer value when a fractional part of 0.5 results. This may occur, e.g., when forward or backward predictions are made and rounded to generate integer pixel values. It may also occur when pixel values representing the result of forward or backward predictions are averaged and the result rounded to generate integer pixel values, e.g., in the case of bi-directionally coded frames.
The present invention simulates the upward biasing effect of the above discussed MPEG rounding technique in, e.g., downsampling video decoders.
In one embodiment, the biasing effect is simulated by generating luminance and chrominance DC DCT coefficient bias values from, e.g., motion vector offset data. The luminance and chrominance DC DCT bias values are then added to the DC DCT coefficients of the luminance and chrominance blocks, respectively, of the image block to which the motion vector data used to generate the bias values corresponds. In one such embodiment, an IDCT circuit outputs pixel values which include fractional components. These pixel values are combined with other pixel values, generated through the use of motion compensated prediction, which may also include fractional components. The pixel values resulting from the combining operation are then rounded to integer values using a non-biased rounding operation. The resulting integer pixel values represent a decoded image or portion thereof which may be, e.g., stored for use as a reference frame and/or displayed.
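How a DC coefficient adjustment shifts pixel values can be shown with a small sketch. With the orthonormal 8×8 IDCT, a block containing only a DC coefficient F(0,0) decodes to F(0,0)/8 at every pixel, so raising every pixel by b requires a DC bias of 8b. The mapping from motion vector half-pel flags to a per-pixel bias below is an assumption for illustration only: it uses the mean rounding errors of 1/4 and 1/8 that hold when low-order bits are uniformly distributed, and the actual bias values of the embodiment are not given here. All function names are hypothetical. A chrominance bias would be computed the same way from the scaled chrominance motion vector.

```python
import math

N = 8  # 8x8 blocks


def rounding_bias(half_x: bool, half_y: bool) -> float:
    """Hypothetical per-pixel bias: mean error of .5-up rounding, assuming uniform low-order bits."""
    if half_x and half_y:
        return 0.125  # four-point average, (s + 2) >> 2
    if half_x or half_y:
        return 0.25   # two-point average, (s + 1) >> 1
    return 0.0        # whole-pel offset: no interpolation, no rounding


def dc_bias(pixel_bias: float) -> float:
    """DC coefficient change that shifts every pixel of an 8x8 block by pixel_bias."""
    return N * pixel_bias  # a DC-only block decodes to F(0,0) / 8 at every pixel


def idct_8x8(coeffs):
    """Naive orthonormal 2-D inverse DCT, coeffs[u][v] -> pixels[x][y] (O(N^4))."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = (2 / N) * total
    return out
```

For a horizontal half-pel vector, dc_bias(rounding_bias(True, False)) is 2.0, and inverse transforming a block whose only nonzero coefficient is that DC value yields 0.25 at every pixel. The fractional IDCT output would then be combined with the prediction and rounded without bias (Python's round(), for instance, rounds halves to even).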
In another embodiment, pixel values are directly adjusted to simulate the biasing effect associated with MPEG compliant rounding. In such an embodiment, luminance and chrominance pixel biasing values are generated. The bias values are added to the luminance and chrominance pixel values, respectively, generated through the use of motion compensated prediction. The pixel values are then rounded to generate integer pixel values.
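The pixel-domain variant can be sketched as computing the prediction exactly, adding a bias, and then rounding to the nearest integer. The bias values used below (1/4 for a two-point average, 1/8 for a four-point average) are the same illustrative assumption as above, and the function name is hypothetical. For integer inputs this sketch reproduces MPEG's round-half-up result exactly, because the bias moves every would-be tie strictly above the halfway point while leaving other values on the same side of it. In a downsampling decoder the inputs differ from the full resolution ones, so the bias reproduces only the average upward effect, not each individual rounding decision.

```python
def biased_prediction(values, bias):
    """Average the contributing reference pixels exactly, add a bias, then round to nearest.

    With the biases used here (0, 1/8, 1/4) and integer inputs, no exact ties occur,
    so the result does not depend on how round() breaks ties.
    """
    exact = sum(values) / len(values)
    return round(exact + bias)
```

For example, biased_prediction((10, 11), 0.25) returns 11, matching (10 + 11 + 1) >> 1, and biased_prediction((10, 10), 0.25) returns 10.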
The methods and apparatus of the present invention can be used in a wide variety of applications including television receivers, video recorders, computers and a host of other devices which decode video data.
Numerous additional features and embodiments of the present invention are discussed below in the detailed description which follows.