1. Field of the Invention
The present invention generally relates to video encoding and decoding systems. More specifically, the present invention provides scalable decoding complexity based on optional application of post-processing stages to enhance decoding and display processes.
2. Background Art
FIG. 1 is a functional block diagram of a conventional video encoder 100. The conventional video encoder 100 includes a video source device 102, a transform unit 104, a quantizer 106, a variable length encoder 108 and a motion estimation/prediction unit 116. The conventional video encoder 100 may accept data of a video sequence from the video source device 102, which may be, for example, either a video capture device or a storage device. Typically, image data of the video sequence is organized into frames, with each frame containing an array of pixels. The pixel data may be separated into a luminance component and a pair of chrominance components (e.g., Y, Cr, and Cb). The pixel data may be grouped together into pixelblocks or macroblocks.
The transform unit 104 transforms blocks of pixel data from a source frame to blocks of coefficient data according to a predetermined transform. For example, the transform unit 104 may operate according to a Discrete Cosine Transform (DCT). Conventionally, DCT coefficients are described as being a two-dimensional array of coefficients. The most common implementation is to convert an 8 pixel by 8 pixel block of source data to an 8×8 array of DCT coefficients. Alternatively, the transform unit 104 may operate according to a wavelet transform such that the transform unit 104 produces wavelet coefficients based on input pixel block data. The pixel data received from the video source device 102 can be adjusted by the motion estimation/prediction unit 116 prior to transformation by the transform unit 104.
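By way of illustration only, the conversion of an 8 pixel by 8 pixel block to an 8×8 array of DCT coefficients can be sketched as follows. The sketch assumes the orthonormal DCT-II, which is one common formulation; the function name `dct_2d` and the sample block are hypothetical and do not describe any particular implementation of the transform unit 104.

```python
import math

N = 8  # block size, assumed from the 8x8 example in the text


def dct_2d(block):
    """Forward 2-D DCT-II (orthonormal scaling) of an N x N block."""
    def alpha(k):
        # Normalization factor; the DC term (k == 0) is scaled differently.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# Hypothetical input: a uniformly gray block. All of its energy should land
# in the single DC coefficient at position [0][0].
flat_block = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat_block)
```

For a uniform block such as this one, only the coefficient at position [0][0] is non-zero, which illustrates why smooth image regions tend to compress well after transformation.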
The quantizer 106 truncates coefficients output by the transform unit 104 by dividing them by a quantization parameter (qp). This reduces the magnitude of the coefficients that are used for subsequent coding operations. Some low level coefficients are truncated to zero. The quantization parameter may vary among blocks of a frame and among different frames. Thus, information regarding the quantization parameter itself may be included among the coded data output by the conventional video encoder 100 so that, during decode operations, the quantization parameter may be reconstructed and the quantization operation may be inverted.
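By way of illustration only, the division by a quantization parameter and the corresponding inverse operation can be sketched as follows. The sketch assumes a single scalar qp applied uniformly to every coefficient and truncation toward zero; the function names `quantize` and `dequantize` and the sample values are hypothetical.

```python
def quantize(coeffs, qp):
    """Divide each coefficient by qp and truncate toward zero."""
    # int() on a float discards the fractional part, so small
    # coefficients (|c| < qp) become zero.
    return [[int(c / qp) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Invert the scaling; the truncated fraction is not recoverable."""
    return [[level * qp for level in row] for row in levels]


# Hypothetical coefficient values and quantization parameter.
coeffs = [[1024.0, -37.5, 6.0],
          [12.0, -3.0, 0.5]]
qp = 8

levels = quantize(coeffs, qp)
recon = dequantize(levels, qp)
```

In this example the coefficients 6.0, -3.0 and 0.5 are truncated to zero, and the reconstructed values differ from the originals by less than qp, which is the loss introduced by quantization.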
The output of the quantizer 106 is passed to the variable length encoder 108. The variable length encoder 108 encodes the quantized coefficients and produces an encoded video bitstream for transmission over a communication channel 110. The communication channel 110 can be a real-time delivery system such as a communication network (e.g., a wireless communication network) or a computer network (e.g., the Internet). Alternatively, the communication channel 110 can be a storage medium (e.g., an electrical, optical or magnetic storage device) that can be physically distributed. Overall, the topology, architecture and protocol governing operation of the communication channel 110 are immaterial to the present discussion unless specifically identified herein.
The conventional video encoder 100 may further include a decoder unit 112, a motion compensation unit 118, a loopfilter 120 and a frame memory unit 114. These components can be used to store a decoded version of the encoded bitstream transmitted over the communication channel 110. Specifically, the decoder unit 112 includes an inverse variable length encoder (variable length decoder), an inverse quantizer and an inverse transform unit. The decoder unit 112 decodes the encoded video data output by the conventional video encoder 100. The output of the decoder unit 112 is provided to the motion compensation unit 118. The motion compensation unit 118 operates as an inverse motion estimation and compensation unit to reconstruct each frame of the original video sequence. The output of the motion compensation unit 118 is provided to the loopfilter 120. The loopfilter 120 can operate as a deblocking filter that can filter decoded macroblocks or pixelblocks to reduce blocking artifacts that are caused by the block structures resulting from the encoding scheme. The output of the loopfilter 120 can then be stored in the frame memory unit 114.
The motion estimation/prediction unit 116 can use the data stored in the frame memory unit 114, as well as input video sequence data from the video source device 102, to select portions of a frame for encoding. The motion estimation/prediction unit 116 can reduce the amount of video sequence data that needs to be encoded by comparing previously encoded frames and motion prediction information with current frame data. For example, the motion estimation/prediction unit 116 can be used to ensure that only the differences between successive input video frames are passed to the video encoding chain (i.e., the transform unit 104, the quantizer 106 and the variable length encoder 108) for encoding.
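By way of illustration only, the passing of only the differences between successive frames can be sketched as a block-matching search followed by a residual computation. The sketch assumes an exhaustive search that minimizes the sum of absolute differences (SAD), which is one common approach and is not stated to be the method used by the motion estimation/prediction unit 116; all function names, frame sizes and pixel values are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def best_motion_vector(cur, ref, bx, by, size, search):
    """Exhaustive search over displacements within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def residual(cur, ref, bx, by, mv, size):
    """Difference between the current block and its motion-compensated prediction."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx]
             for x in range(size)] for y in range(size)]


def frame_with_patch(px, py):
    """Hypothetical 6x6 frame: dark background with a bright 2x2 patch at (px, py)."""
    frame = [[0] * 6 for _ in range(6)]
    for y in range(py, py + 2):
        for x in range(px, px + 2):
            frame[y][x] = 200
    return frame


ref_frame = frame_with_patch(1, 1)   # previously decoded frame
cur_frame = frame_with_patch(2, 1)   # patch has moved one pixel to the right

mv, cost = best_motion_vector(cur_frame, ref_frame, bx=2, by=1, size=2, search=1)
res = residual(cur_frame, ref_frame, 2, 1, mv, 2)
```

In this example the search finds the patch one pixel to the left in the reference frame, so the residual block is entirely zero and only the motion vector would need to be conveyed for that block.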
FIG. 2 is a functional block diagram of a conventional video decoder 200. The conventional video decoder 200 includes a variable length decoder 202, a scaler unit 204, an inverse transform unit 206 and a loopfilter (e.g., deblocking filter) 214. The conventional video decoder 200 receives encoded video data from the communication channel 110. The conventional video decoder 200 operates in complement to the conventional video encoder 100 to reproduce the video data sequence encoded by the conventional video encoder 100.
The variable length decoder 202, the scaler unit 204 and the inverse transform unit 206 each perform inverse operations of the processes implemented by the variable length encoder 108, the quantizer 106 and the transform unit 104, respectively. The output of the inverse transform unit 206 can be provided to the loopfilter 214 for further processing. For example, the loopfilter 214 can operate as a deblocking filter to remove blocking artifacts that are caused by the block structures resulting from the encoding scheme. The output of the loopfilter 214 can be provided to a video sink device 208 which can be, for example, a video display device or a storage device.
The conventional video decoder 200 can also include a frame memory unit 210 and a motion estimation/prediction unit 212. The frame memory unit 210 can store a copy of the decoded video sequence output by the conventional video decoder 200. The motion estimation/prediction unit 212 can use data stored in the frame memory unit 210, as well as data from the variable length decoder 202, to modify data output by the inverse transform unit 206. Specifically, the motion estimation/prediction unit 212 can use data from previously decoded frames to adjust the decoding of a currently decoded frame.
The conventional video encoder 100 and the conventional video decoder 200 can be implemented in hardware, software or some combination thereof. For example, the conventional video encoder 100 and/or the conventional video decoder 200 can be implemented using a computer system. Further, the conventional video encoder 100 and the conventional video decoder 200 can implement a variety of video coding protocols such as, for example, any one of the Moving Picture Experts Group (MPEG) standards (e.g., MPEG-1, MPEG-2, or MPEG-4) and/or the International Telecommunication Union (ITU) H.264 standard.
Video data sequences are typically encoded once by an encoder and then received and decoded by several different decoders. Decoders can vary widely in terms of capabilities, features and processing power. For example, some feature-rich decoding devices may include enhanced memory capacity and memory bandwidth as well as increased processing/computational power in comparison to feature-poor decoding devices. Therefore, it is a challenge to encode video bitstreams in a way that can be efficiently exploited by a variety of decoders having a broad range of available processing resources and varying operational constraints. If the video sequence is encoded for feature-poor systems, then the encoded bitstream may not provide the data needed by a feature-rich decoder to take advantage of its enhanced capabilities. If the video sequence is encoded for feature-rich systems, then the encoded bitstream may be too complex and require too much work to be properly or efficiently decoded by a feature-poor device.
Furthermore, many encoder-decoder systems include many complex embedded stages that must all be invoked to produce pixel-accurate results. Thus, if any stage is not invoked, the resulting decoded video sequence will include errors. This all-or-nothing dependency among the embedded stages can improve compression of the encoded video sequence but comes at the expense of scalability at the decoder.
Accordingly, what is needed is the provision of an encoded video bitstream that can be properly and efficiently processed by decoders having a broad range of diverse capabilities by providing scalable complexity at each decoder. In particular, adjusting the complexity of decoding processes should account for the capabilities of the decoder and the state of the decoder at the time of decoding such that the decoder can specifically tailor the decoding process in accordance with received data and control information. Further, scaling the complexity of the decoding process should account for the characteristics of a display that may be used to present the decoded video sequence.