Various digital applications, such as digital video, involve the processing, storage, and transmission of relatively large amounts of digital data representing, e.g., one or more digital images.
In order to reduce the amount of digital data that must be stored and transmitted in conjunction with digital applications, various digital signal processing techniques have been developed. These techniques involve, e.g., the variable length encoding of data, quantization of data, transform coding of data, and the use of motion compensated prediction coding techniques.
One standard proposed for the coding of motion pictures, commonly referred to as the MPEG-2 standard, described in ISO/IEC 13818-2 (Nov. 9, 1994) Generic Coding of Moving Picture and Associated Audio Information: Video (hereinafter referred to as the "MPEG" reference), relies heavily on the use of DCT and motion compensated prediction coding techniques. An earlier version of MPEG, referred to as MPEG-1, also supports the use of the above discussed coding techniques. In this patent application, references to MPEG-2 compliant data streams and MPEG-2 compliant inverse quantization operations are intended to refer to data streams and inverse quantization operations that are implemented in accordance with the requirements set forth in the MPEG reference.
In MPEG-2, chrominance data is normally encoded at one-half the resolution of luminance data. For example, four luminance blocks may be used to represent the luminance information associated with a portion of an image while two chrominance blocks are used to represent the same image portion.
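The block counts in the example above can be checked with a short Python sketch. It assumes the 4:2:0 sampling format, 16x16 pixel macroblocks, and 8x8 pixel blocks, none of which are stated in the text above; the function name and its parameters are hypothetical and serve only as an illustration.

```python
def blocks_per_macroblock(mb_size=16, block_size=8, chroma_div_h=2, chroma_div_v=2):
    """Count luminance and chrominance blocks in one macroblock.

    Assumes 4:2:0 sampling: each of the two chrominance components
    (Cb and Cr) is subsampled by 2 horizontally and by 2 vertically.
    """
    luma_blocks = (mb_size // block_size) ** 2
    chroma_w = mb_size // chroma_div_h
    chroma_h = mb_size // chroma_div_v
    blocks_per_component = (chroma_w // block_size) * (chroma_h // block_size)
    chroma_blocks = 2 * blocks_per_component  # one set for Cb, one for Cr
    return luma_blocks, chroma_blocks


print(blocks_per_macroblock())  # (4, 2)
```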
Another standard which also uses motion compensated prediction, referred to herein as the ATSC standard, is specifically intended for television applications. The ATSC standard is described in a document identified as ATSC A/53 titled "ATSC DIGITAL TELEVISION STANDARD" (1994).
In accordance with both the MPEG and ATSC standards, images, e.g., frames, can be coded as intra-coded (I) frames, predictively coded (P) frames, or bi-directional coded (B) frames. I frames are encoded without the use of motion compensation. P frames are encoded using motion compensation and a reference to a single anchor frame. In the case of a P frame, the anchor frame is a preceding frame in the sequence of frames being decoded. B frames are encoded using a reference to two anchor frames, e.g., a preceding frame and a subsequent frame. Reference to the preceding frame is achieved using a forward motion vector while reference to the subsequent frame is achieved using a backward motion vector. In MPEG, I and P frames may be used as anchor frames for prediction purposes. B frames are not used as anchor frames.
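The anchor frame rules described above can be sketched in Python as follows. The sketch assumes that the frame types are supplied as a string in display order and that each frame uses the nearest I or P frame on either side as its anchor; the function name find_anchors is hypothetical and is not taken from the MPEG reference.

```python
def find_anchors(frame_types, index):
    """Return (preceding_anchor, subsequent_anchor) indices for one frame.

    frame_types is a string such as "IBBPBBP" in display order (an
    assumption of this sketch). Only I and P frames may serve as anchors.
    """
    kind = frame_types[index]
    if kind == "I":
        return None, None  # intra-coded: no motion compensation is used

    # Nearest I or P frame before the current frame.
    prev_anchor = next(
        (i for i in range(index - 1, -1, -1) if frame_types[i] in "IP"), None
    )
    if kind == "P":
        return prev_anchor, None  # single, preceding anchor frame

    # B frame: also locate the nearest I or P frame after the current frame.
    next_anchor = next(
        (i for i in range(index + 1, len(frame_types)) if frame_types[i] in "IP"),
        None,
    )
    return prev_anchor, next_anchor
```

For the sequence "IBBPBBP", for example, the B frame at index 1 would reference the I frame at index 0 with a forward motion vector and the P frame at index 3 with a backward motion vector.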
A known full resolution video decoder 2, i.e., an MPEG-2 video decoder, is illustrated in FIG. 1. As illustrated, the known decoder 2 includes a memory 30, a syntax parser and variable length decoding (VLD) circuit 4, inverse quantization circuit 16, inverse discrete cosine transform circuit 18, summer 22, switch 24, pair of motion compensated prediction modules 6 and 7 and a select/average predictions circuit 28 coupled together as illustrated in FIG. 1. The memory 30 includes both a coded data buffer 32 which is used for storing encoded video data and a reference frame store 34 used for storing decoded frames which may be used, e.g., as anchor frames and/or output to a display device.
For commercial reasons, particularly in the case of consumer products, it is desirable that video decoders be inexpensive to implement. The complexity of a decoder, in terms of the number of elements required to implement a decoder, is a factor which affects implementation costs. For this reason, an important issue in the design and implementation of video decoders is the minimization of complexity in terms of the amount of hardware, e.g., logic gates, required to implement a decoder.
A number of methods for reducing the complexity and/or cost of decoders have been developed. Examples of known methods of reducing the cost of decoders include, e.g., the use of a preparser, the use of downsampled frames as prediction references, and the efficient implementation of the IDCT operation with downsampling. Video decoders which perform downsampling are sometimes referred to as "downsampling video decoders". Downsampling video decoders are discussed in U.S. Pat. No. 5,635,985 which is hereby expressly incorporated by reference.
FIG. 2 illustrates a known downsampling video decoder 10. The decoder 10 includes a preparser 12, a syntax parser and variable length decoding (VLD) circuit 14, an inverse quantization circuit 16, an inverse discrete cosine transform (IDCT) circuit 18, a downsampler 20, summer 22, switch 24, memory 30, a pair of motion compensated prediction modules 25, 27 and a select/average predictions circuit 28. The motion compensated prediction modules 25, 27 may include a drift reduction filter 26. The memory 30 includes a coded data buffer 32 and a reference frame store 34. The various components of the decoder 10 are coupled together as illustrated in FIG. 2.
In the known decoder 10, the preparser 12 receives encoded video data and selectively discards portions of the received data prior to storage in the coded data buffer 32. The encoded data from the buffer 32 is supplied to the input of the syntax parser and VLD circuit 14. The circuit 14 provides motion data and other motion prediction information to the motion compensated prediction modules 25, 27. In addition, it parses and variable length decodes the received data. The modules 25, 27 each include a motion compensated prediction filter 26. A data output of the syntax parser and VLD circuit 14 is coupled to an input of the inverse quantization circuit 16.
The inverse quantization circuit 16 generates a series of DCT coefficients which are supplied to the IDCT circuit 18. From the received DCT coefficients, the IDCT circuit 18 generates a plurality of integer pixel values. In the case of intra-coded images, e.g., I frames, these values fully represent the image being decoded. In the case of inter-coded images, e.g., P and B frames, the output of the IDCT circuit 18 represents image (difference) data which is to be combined with additional image data to form a complete representation of the image or image portion being decoded. The additional image data, with which the output of the IDCT circuit is to be combined, is generated through the use of one or more received motion vectors and stored reference frames. The reference frames are obtained by the MCP modules 25, 27 from the reference frame store 34.
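The conversion of DCT coefficients into integer pixel values can be illustrated with the direct-form 8x8 inverse DCT below. This is a minimal Python sketch for illustration only; it is not the implementation used by the IDCT circuit 18, which would normally rely on a fast, hardware-oriented algorithm. The function name idct_8x8 and the simple rounding step are assumptions of the sketch.

```python
import math


def idct_8x8(coeffs):
    """Direct-form 2-D inverse DCT of one 8x8 block of coefficients.

    coeffs[v][u] holds the coefficient for vertical frequency v and
    horizontal frequency u. Returns an 8x8 list of rounded integers.
    """
    def c(k):
        # Normalization factor for the zero-frequency (DC) term.
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    out = [[0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (
                        c(u) * c(v) * coeffs[v][u]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
            # Simplified rounding to the nearest integer (assumption).
            out[y][x] = int(round(total / 4.0))
    return out
```

A block containing only a DC coefficient decodes to a flat block, which is why intra-coded blocks with little detail need few non-zero coefficients.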
In order to reduce the amount of decoded video data that must be stored in the memory 30, the downsampler 20 is used. In the case of intra-coded data, the downsampled video data output by the downsampler 20 is stored, via switch 24, in the reference frame store 34.
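The memory saving obtained through downsampling can be illustrated with the following Python sketch. The 2:1 horizontal factor, the pair-averaging filter, and the function name are all assumptions made for illustration; the text above does not specify the downsampling ratio or filter used by the downsampler 20.

```python
def downsample_rows_2to1(frame):
    """Halve horizontal resolution by averaging adjacent pixel pairs.

    frame is a list of rows of integer pixel values, each row having an
    even length. The 2:1 factor and the averaging filter are assumptions
    of this sketch.
    """
    return [
        [(row[i] + row[i + 1] + 1) // 2 for i in range(0, len(row) - 1, 2)]
        for row in frame
    ]


frame = [[10, 20, 30, 50], [0, 1, 255, 255]]
print(downsample_rows_2to1(frame))  # [[15, 40], [1, 255]]
```

Under these assumptions each stored frame occupies half the memory of the corresponding full resolution frame.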
Motion compensated prediction modules 25 and 27 receive motion vector data from the syntax parser and VLD circuit 14 and downsampled anchor frames from the reference frame store 34. Using these inputs, they perform motion compensated prediction operations.
In the case of uni-directional motion compensation, the output of one of the modules 25, 27 is selected by the select/average predictions circuit 28 and supplied to the summer 22. In the case of bi-directional motion compensation, the values output by the modules 25 and 27 are averaged by the select/average predictions circuit 28. The values generated by the circuit 28 are supplied to the input of the summer 22.
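The select or average behavior described above can be sketched in Python as shown below. The function name, the use of None to indicate an unused prediction, and the rounding of half values upward are assumptions made for illustration.

```python
def select_or_average(forward=None, backward=None):
    """Combine the outputs of two motion compensated prediction modules.

    forward and backward are equal-length lists of integer pixel values,
    or None when the corresponding prediction is not used.
    """
    if forward is not None and backward is not None:
        # Bi-directional case: average the two predictions,
        # rounding half values upward (assumed rounding rule).
        return [(f + b + 1) // 2 for f, b in zip(forward, backward)]
    if forward is not None:
        return list(forward)   # uni-directional: select forward prediction
    if backward is not None:
        return list(backward)  # uni-directional: select backward prediction
    raise ValueError("at least one prediction is required")
```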
In the case of inter-coded video data, the summer 22 is used to combine the output of the downsampler 20 with the output of the select/average predictions circuit 28. The resulting data, which represents a decoded inter-coded video frame, is stored, via switch 24, in the reference frame store 34.
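The combining operation performed by the summer can be sketched as a per-pixel addition of difference data and prediction data. In the Python sketch below, the function name and the clipping of results to the range 0 to 255 (8-bit samples) are assumptions made for illustration.

```python
def reconstruct(residual, prediction, max_value=255):
    """Add decoded difference data to a prediction and clip the result.

    residual holds signed difference values; prediction holds pixel
    values. Clipping to 0..max_value assumes 8-bit samples.
    """
    return [
        min(max(r + p, 0), max_value)
        for r, p in zip(residual, prediction)
    ]


print(reconstruct([-5, 10, 3], [2, 250, 100]))  # [0, 255, 103]
```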
The decoder 10 outputs the decoded video frames stored in the reference frame store 34 to be displayed on a display device. Because of the downsampling and pre-parsing operations, the decoded video frames are of a lower resolution than the resolution at which the frames were originally encoded.
The known reduced resolution video decoder illustrated in FIG. 2 can be implemented at lower cost than the standard full resolution decoder illustrated in FIG. 1. However, the known decoder 10 has the disadvantage that the images are of reduced resolution due to the use of downsampling.
Not only is it important that modern video decoders be capable of being implemented relatively efficiently, but it is also likely to be important that they be capable of decoding video data in such a manner that video sequences, e.g., movies, can be displayed at height to width ratios other than that at which they were encoded. Such capability is important because, e.g., it allows images encoded at height to width ratios found in movie screens or high definition television (HDTV) to be played back on screens with different, e.g., standard definition television (SDTV) height to width ratios.
Currently, two techniques are commonly used for displaying images which were filmed or encoded at one height to width ratio on a screen that supports a second height to width ratio. The first technique is what is commonly referred to as "letter boxing". In this technique, the entire image is displayed at the original height to width ratio with black borders being added to, e.g., the top and bottom of the screen. By adding borders in this manner the portion of the screen which is not being used to display the image is blacked out.
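The size of the black borders used in letter boxing follows from simple arithmetic, as the Python sketch below illustrates. The sketch assumes square pixels and an image that is proportionally wider than the screen; the function name and parameters are hypothetical.

```python
def letterbox(screen_w, screen_h, src_aspect_w, src_aspect_h):
    """Return (active_height, top_border, bottom_border) in lines.

    The source aspect is given as width:height, e.g., 16:9. Assumes
    square pixels and a source that is wider than the screen.
    """
    active_h = screen_w * src_aspect_h // src_aspect_w
    if active_h > screen_h:
        raise ValueError("source is not wider than the screen")
    border = screen_h - active_h
    top = border // 2
    return active_h, top, border - top


# A 16:9 image on a 640x480 (4:3) screen.
print(letterbox(640, 480, 16, 9))  # (360, 60, 60)
```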
The second technique for displaying images at a height to width ratio which is different from the images' original height to width ratio is commonly known as pan-and-scan. Using this technique, each image's center of interest is identified and the portion of the image, corresponding to the available screen dimensions, immediately surrounding the center of interest is selected for display. The portions of the image which are cropped are frequently referred to as side panels because one such panel is normally located on each side of the center of interest. The center of interest may be the center of the image but in many cases, e.g., in action scenes, may be off center. The center of interest in a sequence of images can change in position from one image to the next.
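The selection of the displayed region in pan-and-scan can be sketched as the placement of a fixed-width window around the center of interest. In the Python sketch below, the function name, the use of a horizontal pixel coordinate for the center of interest, and the clamping of the window to the image boundaries are assumptions made for illustration.

```python
def pan_scan_window(image_w, window_w, center_x):
    """Return (left, right) columns of the displayed region.

    right is exclusive. The window is centered on center_x where
    possible and shifted to stay within the image otherwise (assumed
    behavior). Columns outside the window form the cropped side panels.
    """
    if window_w > image_w:
        raise ValueError("window is wider than the image")
    left = center_x - window_w // 2
    left = max(0, min(left, image_w - window_w))
    return left, left + window_w


print(pan_scan_window(1920, 1440, 960))  # (240, 1680)
print(pan_scan_window(1920, 1440, 100))  # (0, 1440)
```

When the center of interest moves from one image to the next, calling the function with the new center yields a window that pans across the image.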
Notably, the MPEG specification allows for, but does not mandate, the transmission of center of interest information as part of an encoded bitstream.
While the known methods of reducing the cost of video decoders, such as the use of downsampling, have helped lower the cost of such devices, there still remains a need for new and improved methods and apparatus for reducing the cost of video decoders.
It is desirable that at least some of the cost saving methods and apparatus be suitable for use with full resolution video decoders since, in some applications, reductions in resolution of the type caused by downsampling are unacceptable. It is also desirable that at least some cost reduction methods and apparatus be capable of being used in video decoders which support various display modes including, e.g., pan-and-scan.