The present invention relates to methods and apparatus for decoding video data and, more particularly, to methods and apparatus for improving the quality of images generated by reduced resolution video decoders and to new and improved video decoders which produce reduced resolution images from encoded video data.
The use of digital signals to transmit video data is becoming ever more common. In order to efficiently transmit video information, data compression is often used.
The International Standards Organization has set a standard referred to as MPEG that is intended to be used for the encoding and transmission of digital video signals such as high definition television (HDTV). One version of the MPEG standard, MPEG-2, is described in the International Standards Organization - Moving Picture Experts Group, Drafts of Recommendation H.262, ISO/IEC 13818-1 and 13818-2 of November 1994.
MPEG-2 and similar encoding systems rely on the use of discrete cosine transform coding and motion compensated prediction techniques, e.g., the use of motion vectors, to reduce the amount of digital data required to represent a series of digital images. Motion compensated prediction involves the use of video data transmitted to represent one frame, e.g., an anchor frame, to reconstruct subsequent frames, e.g., predicted frames, at decoding time.
Motion vectors are commonly used to implement motion compensated video coding. A motion vector can be used to represent a portion, e.g., a square or rectangular group of pixels, in a predicted frame. A motion vector includes information identifying a group of pixels in a reference frame to be used during video decoding in the generation of a current video frame. A motion vector includes vertical and horizontal offsets, interpreted relative to the position of a current macroblock of video data being decoded. The offsets included in a motion vector identify the pixel data in a reference frame to be used in generating the current frame. In MPEG-2, a macroblock corresponds to a group of 16×16 pixels. In the present application the term "macroblock" is used in a manner that is consistent with its MPEG-2 meaning. The term "block" or the phrase "block of pixels" is intended to refer to any group of pixels and is not intended to be limited to an MPEG-2 block which normally corresponds to a group of 8×8 pixels.
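By way of illustration only, the manner in which the offsets of a motion vector identify pixel data in a reference frame can be sketched as follows. The function name fetch_prediction and its parameters are hypothetical, and the sketch assumes whole-pixel offsets, whereas MPEG-2 motion vectors may also specify half-pixel positions.

```python
def fetch_prediction(reference, mb_row, mb_col, mv_y, mv_x, size=16):
    """Return the size x size block of reference pixels a motion vector points at.

    reference      : reference frame as a list of rows of pixel values
    mb_row, mb_col : top-left pixel position of the current macroblock
    mv_y, mv_x     : vertical and horizontal offsets, in whole pixels
                     (assumption: half-pixel precision is ignored here)
    """
    top = mb_row + mv_y      # offsets are relative to the current macroblock
    left = mb_col + mv_x
    height, width = len(reference), len(reference[0])
    if top < 0 or left < 0 or top + size > height or left + size > width:
        raise ValueError("motion vector points outside the reference frame")
    return [row[left:left + size] for row in reference[top:top + size]]


# Usage: a 32x32 reference frame whose pixel value encodes its position.
reference = [[r * 100 + c for c in range(32)] for r in range(32)]
block = fetch_prediction(reference, mb_row=16, mb_col=16, mv_y=-4, mv_x=-2)
```

In this usage example the macroblock at position (16, 16) is predicted from the reference pixels whose top-left corner lies at row 12, column 14.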
It is anticipated that at least some video images will be transmitted at resolutions far in excess of those commonly used today for NTSC television broadcasts. Television broadcasts at, e.g., resolutions of 1080×1920 pixels, are often referred to as high definition television signals because they exceed the resolution of current NTSC television images. Television broadcasts involving the transmission of images having a resolution that is the same as or similar to present NTSC television signals are commonly referred to as standard definition television (SDTV) broadcasts.
The amount of data which must be stored and processed for HDTV signals can be considerably greater than that for SDTV signals. Because of the amount of memory and the processing power required to decode HDTV signals in real time, HDTV decoders can be considerably more expensive than SDTV decoders.
The use of reduced resolution video decoders has been suggested in order to allow HDTV video signals to be decoded and displayed using video decoders and display devices which are generally comparable in cost to SDTV decoders. Reduced resolution video decoders, also sometimes referred to as downsampling video decoders, reduce the amount of data used to represent video images thereby also reducing the amount of memory and processing power required to decode an HDTV signal. The decoding of an HDTV signal using a reduced resolution decoder results in the generation of, e.g., SDTV resolution images from an encoded HDTV signal.
Referring now to FIG. 1, there is illustrated a video decoder 100 which is representative of various known reduced resolution video decoders. The reduced resolution video decoder 100 includes an (optional) preparser 112, a syntax parser and variable length decoder circuit 120, an inverse quantization circuit 122, and an inverse DCT circuit 124. The output of the inverse DCT circuit 124 is coupled to the input of a downsampler 126. The downsampler 126 is used to reduce the resolution of the video images being processed and thus the amount of decoded video data which is stored in a video memory 114, e.g., for use as reference frames when generating subsequent frames encoded using motion vectors. In addition to the reference frame memory 114, the video decoder 100 includes a switch 129, a summer 128, a pair of motion compensated prediction modules 131 and a select/average predictions circuit 134. The motion compensated prediction modules 131 perform uni-directional predictions. In order to form bi-directional predictions, e.g., when processing B-frames, both motion compensated prediction modules 131 are used with the output of the two modules 131 being combined by the select/average predictions circuit 134. A single one of the prediction modules 131 is used when performing uni-directional predictions, e.g., when processing P-frames, with the select/average predictions circuit 134 selecting the output of the appropriate module 131 to be used. The pixel data generated from reference frame data and output by the select/average predictions circuit 134 is combined by the summer 128 with received decoded video data to generate a complete representation of a video frame which was encoded using motion compensated prediction.
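The roles of the select/average predictions circuit 134 and the summer 128 can be sketched, purely by way of example, as shown below. The function names are hypothetical, and the rounding used when averaging and the clamping to an 8-bit pixel range are assumptions of the sketch rather than details taken from FIG. 1.

```python
def combine_predictions(forward=None, backward=None):
    """Select the single available prediction, or average the two.

    Each prediction is a list of rows of pixel values, or None if unused.
    """
    if forward is not None and backward is not None:
        # Bi-directional case (e.g., B-frames): average with rounding
        # (assumption: halves are rounded up).
        return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
                for f_row, b_row in zip(forward, backward)]
    if forward is not None:
        return [row[:] for row in forward]
    if backward is not None:
        return [row[:] for row in backward]
    raise ValueError("at least one prediction is required")


def reconstruct(prediction, residual):
    """Add decoded residual data to a prediction, clamping to 0..255."""
    return [[max(0, min(255, p + d)) for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(prediction, residual)]


# Usage: one row of two pixels, predicted bi-directionally.
prediction = combine_predictions([[10, 20]], [[20, 21]])
pixels = reconstruct(prediction, [[3, -30]])
```

In a reduced resolution decoder both the predictions and the residual data passed to such a summing step would be at the reduced resolution.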
Because HDTV motion vectors are intended to be applied to full HDTV resolution video images and not downsampled video images such as those stored in the memory 114, the motion compensated prediction modules 131 must perform reduced resolution prediction prior to the data being combined by the summer 128, so that the data generated by the motion compensated prediction modules 131 will be of the same reduced resolution as the data output by the downsampler 126.
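As a simple illustration of why reduced resolution prediction is required, a full resolution offset can be converted to the reduced resolution sampling grid as sketched below. The function name scale_motion_vector is hypothetical, and the sketch assumes whole-pixel full resolution offsets and an integer downsampling factor.

```python
def scale_motion_vector(mv, factor=2):
    """Convert a full-resolution offset to the reduced-resolution grid.

    Returns (whole, remainder): the offset in whole reduced-resolution
    pixels plus a remainder expressed in units of 1/factor of a pixel.
    A non-zero remainder means the prediction falls between stored
    samples and must be interpolated.
    """
    whole, remainder = divmod(mv, factor)
    return whole, remainder


# Usage: an offset of 5 full-resolution pixels with 2:1 downsampling.
whole, remainder = scale_motion_vector(5)   # 2 whole pixels plus one half pixel
```

Offsets that are not multiples of the downsampling factor therefore refer to positions for which no sample is stored in the reduced resolution reference frame.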
For a detailed discussion of various reduced resolution video decoders capable of decoding HDTV digital video data see U.S. Pat. No. 5,614,952 which is hereby expressly incorporated by reference.
While reduced resolution video decoders have significant advantages over HDTV decoders in terms of cost, the images generated by such decoders can suffer not only in terms of a reduction in resolution corresponding to the amount of downsampling performed on the HDTV image but also in terms of picture degradation resulting from the use of motion vectors. The use of motion vectors by reduced resolution video decoders offers the potential for serious image degradation in some instances resulting from prediction errors. Such prediction errors are due in large part to the application of motion vectors which were encoded to be used with full resolution reference frames being applied to reduced resolution, e.g., downsampled reference frames.
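The origin of such prediction errors can be illustrated with a one-dimensional sketch. The example below assumes 2:1 downsampling by averaging adjacent samples and linear interpolation within the reduced resolution reference; both choices, and all function names, are assumptions made for illustration and are not taken from any particular known decoder.

```python
def downsample(row):
    """2:1 downsampling by averaging adjacent sample pairs."""
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row) - 1, 2)]


def ideal_prediction(full_ref, mv, length):
    """Predict at full resolution, then downsample the result."""
    shifted = [full_ref[i + mv] for i in range(length)]
    return downsample(shifted)


def reduced_prediction(full_ref, mv, length):
    """Predict from a downsampled reference, as a reduced resolution decoder must."""
    ref = downsample(full_ref)
    whole, half = divmod(mv, 2)
    out = []
    for j in range(length // 2):
        a = ref[j + whole]
        b = ref[j + whole + 1] if half else a   # interpolate half positions
        out.append((a + b) // 2)
    return out


# Usage: a sharp edge (0 -> 200) at an odd sample position, offset of 1 pixel.
full_ref = [0] * 5 + [200] * 7
ideal = ideal_prediction(full_ref, mv=1, length=10)
reduced = reduced_prediction(full_ref, mv=1, length=10)
error = [r - i for r, i in zip(reduced, ideal)]
```

In this example the two predictions agree away from the edge but differ by 50 levels on either side of it, which is the kind of error that is most visible along long, high contrast edges.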
Thus, the use of known downsampling decoders can lead to certain annoying picture artifacts under particular combinations of scene content and motion vector conditions.
In order to produce decoded video images having a high degree of quality using a reduced resolution decoder, there is a need for methods and apparatus for identifying scene conditions and motion vectors which may result in significant and/or annoying prediction errors and thus degrade image quality. In addition, there is a need for methods and apparatus which can eliminate or minimize the degree of picture degradation resulting from the processing or use of motion vectors by reduced resolution video decoders.
The present invention relates to methods and apparatus for improving the quality of images generated by reduced resolution video decoders and to new and improved video decoders which produce reduced resolution images from encoded video data.
Various features and embodiments of the present invention are directed to identifying conditions within an image which may significantly degrade image quality if particular portions of the image are used by a reduced resolution decoder as reference data. For example, methods and apparatus of the present invention are directed to detecting constant image areas, e.g., blocks of black border pixels, which can produce long high contrast vertical or horizontal edges. Such edges can, in many instances, lead to significant prediction errors if a downsampling decoder uses data located at or near the edges in predictions.
In accordance with one embodiment of the present invention each individual reference frame is examined to detect constant block areas and the edges associated therewith. In such an embodiment, for each reference frame stored in a reduced resolution decoder's frame memory, information relating to the stored reference frame's constant block image areas and/or detected horizontal and/or vertical edges is also stored in memory.
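One possible way of examining a reference frame for constant block areas and the horizontal edges associated therewith is sketched below. The block size of 8 pixels, the tolerance parameter, and the function names are assumptions of the sketch and are not prescribed by the embodiment described above.

```python
def find_constant_blocks(frame, block=8, tolerance=0):
    """Return the set of (block_row, block_col) positions whose pixels
    all lie within `tolerance` of one another."""
    rows, cols = len(frame), len(frame[0])
    constant = set()
    for br in range(rows // block):
        for bc in range(cols // block):
            pixels = [frame[r][c]
                      for r in range(br * block, (br + 1) * block)
                      for c in range(bc * block, (bc + 1) * block)]
            if max(pixels) - min(pixels) <= tolerance:
                constant.add((br, bc))
    return constant


def horizontal_edge_rows(constant, block_rows, block_cols, block=8):
    """Return pixel rows at which a fully constant row of blocks meets a
    row of blocks that is not fully constant."""
    full = [all((br, bc) in constant for bc in range(block_cols))
            for br in range(block_rows)]
    return [br * block for br in range(1, block_rows) if full[br] != full[br - 1]]


# Usage: a 16x16 frame with a black band over varying picture content.
frame = [[16] * 16 for _ in range(8)] + \
        [[(r * 16 + c) % 256 for c in range(16)] for r in range(8, 16)]
constant = find_constant_blocks(frame)
edges = horizontal_edge_rows(constant, block_rows=2, block_cols=2)
```

The resulting set of constant blocks and list of edge rows are examples of the kind of information that could be stored in memory alongside each reference frame.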
The stored constant block area information is used to assess the risk of prediction errors when making predictions using the corresponding reference frame data. It is also used to identify edges which should be considered for prediction error reduction processing, e.g., in the form of filtering or extrapolation.
In another embodiment, constant block regions which repeatedly occur through a series of frames, e.g., black borders used for letterboxing, are detected. The detection process may involve examining only a portion of each frame which is decoded or, in one specific embodiment, the content of only intra-coded frames. Intra-coded frames may be examined for constant block regions since these frames generally require less processing to decode than inter-coded frames, making otherwise idle processing resources available for this purpose. Information regarding the detected constant block regions which consistently occur in multiple, e.g., sequential, frames is stored in memory. The stored constant block region information is used for prediction error detection and various processing operations.
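The detection of constant block regions which persist through a series of frames can be sketched as an intersection of per-frame results, here restricted to intra-coded frames. The function name, the ("I", "P", "B") picture type labels, and the min_frames threshold are hypothetical and serve only to illustrate the idea.

```python
def persistent_constant_blocks(frames, min_frames=3):
    """Return the blocks that are constant in every examined intra-coded frame.

    frames     : iterable of (picture_type, constant_block_set) pairs
    min_frames : number of intra-coded frames required before any region
                 is reported as persistent (assumed threshold)
    """
    persistent = None
    examined = 0
    for picture_type, blocks in frames:
        if picture_type != "I":
            continue                      # only intra-coded frames are examined
        persistent = set(blocks) if persistent is None else persistent & set(blocks)
        examined += 1
    if examined < min_frames or persistent is None:
        return set()
    return persistent


# Usage: block (0, 0) and (0, 1) form a border present in every I-frame.
sequence = [
    ("I", {(0, 0), (0, 1), (3, 3)}),
    ("P", set()),
    ("I", {(0, 0), (0, 1)}),
    ("I", {(0, 0), (0, 1), (2, 2)}),
]
border = persistent_constant_blocks(sequence)
```

Blocks which happen to be constant in only some frames, such as (3, 3) and (2, 2) in the usage example, are excluded from the stored region information.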
In addition to detecting scene conditions, e.g., constant block regions and/or horizontal and vertical edges, in frames used for reference purposes, the present invention is directed to assessing the risk that a large prediction error will occur and that such an error, if it does occur, will result in substantial picture degradation. In one particular embodiment the risk of prediction errors is analyzed on a macroblock by macroblock basis as macroblocks are reconstructed using predictions.
Assessing the risk that a significant prediction error will occur involves, e.g., motion vector examination, prediction analysis, and optional reconstructed picture analysis.
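A motion vector examination of this kind can be sketched, under simplifying assumptions, as a per-macroblock test. The heuristic below is hypothetical: it flags a macroblock when the reference area selected by the vertical offset straddles a known horizontal edge and the offset is not a multiple of the downsampling factor. It is not presented as the risk measure of any particular embodiment.

```python
def at_risk(mb_top, mv_y, edge_rows, size=16, factor=2):
    """Flag a macroblock whose vertical prediction may produce a large error.

    mb_top    : top pixel row of the macroblock being reconstructed
    mv_y      : vertical offset in full-resolution pixels
    edge_rows : pixel rows of detected horizontal edges in the reference frame
    """
    ref_top = mb_top + mv_y
    ref_bottom = ref_top + size            # exclusive
    # The edge lies strictly inside the reference area.
    straddles = any(ref_top < edge < ref_bottom for edge in edge_rows)
    # The offset does not land on the reduced-resolution sampling grid.
    misaligned = mv_y % factor != 0
    return straddles and misaligned


# Usage: a letterbox edge at pixel row 8 of the reference frame.
risky = at_risk(mb_top=16, mv_y=-9, edge_rows=[8])      # True
safe = at_risk(mb_top=16, mv_y=-10, edge_rows=[8])      # False
```

A practical assessment would likely combine such a test with analysis of the prediction itself and, optionally, of the reconstructed picture.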
Macroblocks which are assessed to be at high risk for significant prediction errors, e.g., errors which will cause a noticeable line across the length of a macroblock, are processed to minimize or eliminate the effect of the expected prediction error. Techniques for eliminating prediction errors include extrapolation and edge filtering among others.
One particular embodiment of the present invention is directed to a novel video decoder that decodes portions of frames at one, e.g., a reduced, resolution and other portions at increased, e.g., full, resolution along at least one of the two picture sampling axes. In one embodiment, image portions, e.g., constant block regions such as black borders, are identified. Portions of the image at or near a border, e.g., a horizontal or vertical edge, of a constant block region, e.g., within several pixel lines of a vertical or horizontal edge, are decoded and stored for reference purposes at increased resolution along the sampling axis that is perpendicular to the edge. The remaining portion of the image is decoded at a reduced resolution via the application of, e.g., downsampling. Once constant block regions are identified within a series of frames, pixels within the detected constant image regions which are not immediately adjacent to the border of the region are, in some embodiments, downsampled more than other portions of the image being decoded. This technique of downsampling constant block areas more than other areas further reduces the amount of data used to represent the frame that is being decoded. This, in turn, helps offset the increased memory requirements associated with decoding some portions of the image at a higher resolution than other portions.
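The retention of increased resolution near an edge can be sketched for the vertical sampling axis as follows. The sketch assumes 2:1 vertical downsampling by averaging row pairs and a margin of two pixel lines on each side of a horizontal edge; the function name and both parameters are hypothetical. The additional downsampling of constant regions described above is not shown.

```python
def hybrid_vertical_downsample(frame, edge_rows, margin=2):
    """Downsample rows 2:1 vertically, except within `margin` lines of an edge.

    Returns a list of (first_source_row, row_data) pairs so that the
    mapping back to full-resolution row positions is preserved.
    """
    def protected(r):
        return any(edge - margin <= r < edge + margin for edge in edge_rows)

    out = []
    r = 0
    while r < len(frame):
        if not protected(r) and r + 1 < len(frame) and not protected(r + 1):
            # Average two adjacent rows away from any edge.
            out.append((r, [(a + b) // 2 for a, b in zip(frame[r], frame[r + 1])]))
            r += 2
        else:
            # Keep rows near an edge (or an unpaired row) at full resolution.
            out.append((r, list(frame[r])))
            r += 1
    return out


# Usage: 12 rows with a horizontal edge at row 6.
frame = [[r * 10, r * 10] for r in range(12)]
stored = hybrid_vertical_downsample(frame, edge_rows=[6])
```

In the usage example rows 4 through 7 are stored unchanged while the remaining eight rows are reduced to four, so that 8 rows are stored instead of 12.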
The decoder of the present invention which decodes some portions of a frame at a higher resolution than others can be described as a hybrid downsampling decoder.
Numerous additional features and advantages of the decoding methods and apparatus of the present invention are discussed in the detailed description which follows.