1. Field of the Invention
The present invention relates to image processing, and, in particular, to computer-implemented processes and apparatuses for encoding and/or decoding video signals for storage, transmission, and/or playback.
2. Description of the Related Art
Most known video codec (i.e., coder/decoder) architectures are designed to generate compressed video for real-time playback in a limited class of processing environments. If the video codec is designed for a playback system with relatively low processing capabilities (e.g., a low-end personal computer (PC) system), then decoding the compressed video on a playback system with greater processing capabilities (e.g., a high-end PC system) will not provide significant performance advantages. If, on the other hand, the video codec is designed for a high-end PC system, then the quality of the playback output must be degraded in order to decode the compressed video on a low-end PC system.
In many known video codecs, the only mechanism for degrading the video quality during playback is the dropping of frames. If the video codec includes interframe encoding, then, in order to allow for the dropping of frames, some of the frames must be encoded as disposable frames (i.e., those that may be dropped without affecting the decoding of subsequent frames). The inclusion of such disposable frames tends to increase the size of the compressed bitstream. In addition, dropping frames results in jerky and unnatural video motion which can be disturbing to the viewer.
It is desirable, therefore, to provide a video codec that provides playback of compressed video in a variety of processing environments in which frames are not dropped when playback is performed on low-end systems.
To address the problem of decoding encoded video bitstreams in environments with limited transmission bandwidth (e.g., in certain video server and video conferencing applications), video codecs have been designed to generate embedded bitstreams. An embedded video bitstream contains two or more sub-bitstreams. For example, an embedded video bitstream may be generated by applying a transform (e.g., a wavelet transform) to at least one of the component planes of each frame of an input video stream to transform the component plane into two or more bands of data. Each band of each frame is compressed and encoded into the bitstream. Each encoded band sequence forms a sub-bitstream of the embedded bitstream.
The embedded bitstream is said to be interleaved, because all of the encoded bands for each frame are grouped together in the bitstream. That is, if each frame is transformed into n different bands, then the n encoded bands for frame i are grouped together in the embedded bitstream before any of the encoded bands for frame i+1.
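The interleaved ordering described above can be illustrated with a minimal Python sketch. The function name `interleave` and the tuple layout `(frame index, band index, payload)` are hypothetical and chosen only for illustration; an actual bitstream would carry headers and entropy-coded data rather than Python tuples.

```python
def interleave(encoded_bands_per_frame):
    """Flatten per-frame band lists into one ordered sequence.

    encoded_bands_per_frame[i][b] is the encoded payload of band b of frame i.
    All bands of frame i are emitted before any band of frame i+1.
    """
    stream = []
    for frame_index, bands in enumerate(encoded_bands_per_frame):
        for band_index, payload in enumerate(bands):
            stream.append((frame_index, band_index, payload))
    return stream


# Two frames, each transformed into n = 2 bands (payloads are placeholders).
stream = interleave([[b"f0b0", b"f0b1"], [b"f1b0", b"f1b1"]])
```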
In order to play back an embedded video bitstream, either all of the encoded bands or only a subset of the encoded bands for each frame need to be transmitted to the decoder. Such an embedded video bitstream can be played back in environments with different transmission bandwidths. For example, a system with a relatively high transmission bandwidth may be able to play back all of the encoded bands for each frame during real-time playback, while a system with a relatively low transmission bandwidth may only be able to play back a subset of the encoded bands for each frame. Since the low-transmission bandwidth system is not playing back all of the encoded data for the video stream, the resulting video images are typically of lower quality compared to those played back on the high-transmission bandwidth system. However, the frame rate (i.e., the number of frames displayed per second) for the low-transmission bandwidth system will be the same as that for the high-transmission bandwidth system.
Thus, by using an embedded video bitstream, the compressed video may be played back on a low-transmission bandwidth system without affecting the frame rate. The resulting video images will typically be more coarse (i.e., lower quality), but the desired frame rate will be maintained. This capability to play back the same compressed video bitstream at the same frame rate on systems with different transmission bandwidths is called bitrate scalability.
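As a rough illustration of bitrate scalability, the following Python sketch keeps a different number of encoded bands per frame for two hypothetical systems. The helper `playable_bands`, the placeholder payloads, and the assumption that bands are stored in order of decreasing importance are not taken from any particular codec.

```python
def playable_bands(frame_bands, band_budget):
    """Keep only the first `band_budget` encoded bands of one frame.

    Assumes (for illustration) that bands are ordered most important first.
    """
    return frame_bands[:band_budget]


# Two frames, three encoded bands each (payloads are placeholders).
video = [
    [b"f0-b0", b"f0-b1", b"f0-b2"],
    [b"f1-b0", b"f1-b1", b"f1-b2"],
]

high_bandwidth = [playable_bands(frame, 3) for frame in video]
low_bandwidth = [playable_bands(frame, 1) for frame in video]
```

Both systems still receive data for every frame, so the frame rate is unchanged; only the amount of data per frame, and hence the image quality, differs.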
Bitrate scalability has been used in the past.
One known video codec that generates an embedded bitstream for bitrate scalability is based on the wavelet transform. Those skilled in the art will understand that a wavelet transform is a type of transform that generates two or more bands (i.e., sets of data) when applied to a component plane of a video frame. Under this video codec, there is no interframe encoding. That is, each frame is a key frame that is encoded without reference to any other frame. Each band of each frame is encoded and embedded into the bitstream in an interleaved fashion. Bitrate scalability is achieved by dropping one or more of the encoded bands during playback processing. A disadvantage of this known video codec is that it does not support interframe encoding, which typically decreases the size of the encoded bitstream.
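To make the notion of bands concrete, the following Python sketch applies a one-level Haar-style decomposition to a small component plane and reconstructs it with and without the detail bands. The Haar filter, the band names (`LL`, `H`, `V`, `D`), and the function names are assumptions made for illustration; the known codec is described only as wavelet-based, and its actual filters are not specified here.

```python
def haar_forward(plane):
    """Split a plane (even width and height) into four half-size bands."""
    h, w = len(plane), len(plane[0])
    bands = {n: [[0.0] * (w // 2) for _ in range(h // 2)]
             for n in ("LL", "H", "V", "D")}
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = plane[y][x], plane[y][x + 1]
            c, d = plane[y + 1][x], plane[y + 1][x + 1]
            bands["LL"][y // 2][x // 2] = (a + b + c + d) / 4  # local average
            bands["H"][y // 2][x // 2] = (a - b + c - d) / 4   # horizontal detail
            bands["V"][y // 2][x // 2] = (a + b - c - d) / 4   # vertical detail
            bands["D"][y // 2][x // 2] = (a - b - c + d) / 4   # diagonal detail
    return bands


def haar_inverse(bands, keep=("LL", "H", "V", "D")):
    """Rebuild the plane; bands not listed in `keep` are treated as dropped."""
    h2, w2 = len(bands["LL"]), len(bands["LL"][0])
    plane = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for y in range(h2):
        for x in range(w2):
            ll, hd, vd, dd = (bands[n][y][x] if n in keep else 0.0
                              for n in ("LL", "H", "V", "D"))
            plane[2 * y][2 * x] = ll + hd + vd + dd
            plane[2 * y][2 * x + 1] = ll - hd + vd - dd
            plane[2 * y + 1][2 * x] = ll + hd - vd - dd
            plane[2 * y + 1][2 * x + 1] = ll - hd - vd + dd
    return plane


plane = [[10, 12, 20, 20], [14, 16, 20, 24], [0, 0, 8, 8], [4, 4, 8, 8]]
bands = haar_forward(plane)
full = haar_inverse(bands)                   # all bands: exact reconstruction
coarse = haar_inverse(bands, keep=("LL",))   # detail bands dropped
```

Dropping the detail bands leaves each 2x2 block filled with its average value, which corresponds to the coarser image obtained when encoded bands are dropped during playback.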
Another known video codec that generates an embedded bitstream for bitrate scalability falls under the MPEG-II standard. Under this video codec, motion estimation and motion compensation are applied to the component planes and interframe differences are generated. A transform (e.g., the discrete cosine transform (DCT)) is then applied to each block of the interframe differences to generate transformed data (e.g., DCT coefficients).
To generate an embedded bitstream, the transformed data are divided into two parts, which are encoded and embedded into the bitstream in an interleaved fashion. In one embodiment, the first part of the transformed data corresponds to the most significant bits (MSBs) of each DCT coefficient of each block, while the second part corresponds to the least significant bits (LSBs) of the DCT coefficients. In another embodiment, the first part corresponds to the low-frequency DCT coefficients of each block, while the second part corresponds to the high-frequency DCT coefficients.
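The two ways of dividing the transformed data can be sketched in Python as follows. The function names, the number of LSBs, the assumption that coefficients are integers listed from low to high frequency, and the use of an arithmetic shift for negative values are all illustrative choices and are not drawn from the MPEG-II standard.

```python
def split_by_bits(coeffs, lsb_count):
    """First part: most significant bits; second part: `lsb_count` LSBs."""
    mask = (1 << lsb_count) - 1
    first = [c >> lsb_count for c in coeffs]   # arithmetic shift keeps the sign
    second = [c & mask for c in coeffs]        # always in range 0..mask
    return first, second


def merge_bits(first, second, lsb_count):
    """Recombine the two parts produced by split_by_bits."""
    return [(hi << lsb_count) + lo for hi, lo in zip(first, second)]


def split_by_frequency(coeffs, low_count):
    """First part: low-frequency coefficients; second part: the rest."""
    return coeffs[:low_count], coeffs[low_count:]


block = [52, -7, 3, 0, -1, 1]  # hypothetical quantized DCT coefficients
msb_part, lsb_part = split_by_bits(block, 2)
low_part, high_part = split_by_frequency(block, 2)
```

In both cases a decoder that receives only the first part can still form an approximation of the block, while a decoder that receives both parts can recover the original coefficients.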
In either embodiment, the first part of the transformed data for each block is encoded for all of the blocks of the frame. The encoded first part forms the first portion of the embedded bitstream for that frame. The second portion of the embedded bitstream for the frame is generated by decoding the encoded first portion (e.g., using the inverse DCT transform). The resulting decoded signals are then subtracted from the original set of interframe differences to generate a second set of differences. This second set of differences is then encoded (e.g., by applying the DCT transform) to generate the second portion of the embedded bitstream.
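The two-portion procedure described above can be approximated by the following Python sketch. Simple scalar quantization stands in for the DCT and entropy coding stages, so this is not the MPEG-II algorithm itself; the function names and step sizes are hypothetical.

```python
def quantize(values, step):
    """Stand-in for 'encode': map each value to an integer level."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Stand-in for 'decode': map each level back to a value."""
    return [level * step for level in levels]


def encode_two_portions(diffs, coarse_step, fine_step):
    first = quantize(diffs, coarse_step)                       # first portion
    decoded_first = dequantize(first, coarse_step)             # decode it again
    residual = [d - r for d, r in zip(diffs, decoded_first)]   # second set of differences
    second = quantize(residual, fine_step)                     # second portion
    return first, second


def decode(first, second, coarse_step, fine_step):
    coarse = dequantize(first, coarse_step)
    if second is None:              # second portion was thrown away
        return coarse
    refinement = dequantize(second, fine_step)
    return [c + r for c, r in zip(coarse, refinement)]


diffs = [37, -12, 5, 0]  # hypothetical interframe differences
first, second = encode_two_portions(diffs, coarse_step=16, fine_step=1)
coarse_only = decode(first, None, 16, 1)
full = decode(first, second, 16, 1)
```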
Under this MPEG-II codec scheme, a system can achieve bitrate scalability by throwing away the second portion of the embedded bitstream during playback. To ensure that any system (high-transmission bandwidth or low-transmission bandwidth) can properly play back the compressed video bitstream, the encoder must use, as its reference for interframe differencing, a coarse image based only on the first portion of the embedded bitstream. As a result, a high-transmission bandwidth system must generate and maintain two decoded images for each frame: a coarse reference image based only on the first portion of the embedded bitstream and a fine display image based on the full embedded bitstream.
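The need for two decoded images can be illustrated with a simplified Python sketch. Motion compensation is omitted, the inputs are assumed to be already-decoded difference values, and the function name `decode_frame` is hypothetical.

```python
def decode_frame(coarse_reference, first_part, second_part=None):
    """Return (new coarse reference, display image) for one frame.

    coarse_reference: previous coarse image, used for interframe differencing.
    first_part: decoded differences from the first portion of the bitstream.
    second_part: decoded refinement from the second portion, or None if dropped.
    """
    # The reference is updated from the first portion only, so every decoder
    # tracks the same reference regardless of its transmission bandwidth.
    new_reference = [r + d for r, d in zip(coarse_reference, first_part)]
    if second_part is None:
        display = new_reference
    else:
        display = [r + e for r, e in zip(new_reference, second_part)]
    return new_reference, display


previous = [100, 100]
ref_low, shown_low = decode_frame(previous, [32, -16])
ref_high, shown_high = decode_frame(previous, [32, -16], [5, 4])
```

The low-bandwidth decoder displays its reference directly, while the high-bandwidth decoder must hold both the coarse reference and the separate fine display image.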
In addition to the disadvantage of having to maintain two decoded images, the encoding of the second portion typically results in a significant (about 30-40%) increase in bit rate. Under this MPEG-II scheme, a video codec that generated an embedded bitstream with more than two portions would typically have an even greater bit rate overhead.
While these systems provide some degree of bitrate scalability in a situation in which transmission bandwidth is limited, they provide negligible scalability in a situation in which decode processing bandwidth is limited. What is needed is a video codec architecture that provides playback scalability in terms of transmission and/or processing without the disadvantages of the known systems.
It is therefore an object of the present invention to provide processes and apparatuses for encoding and/or decoding video signals to support video playback scalability without the disadvantages of the known systems.
In particular, it is an object of the present invention to provide a video codec that provides playback of compressed video in a variety of processing environments in which frames are not dropped when playback is performed on low-end systems.
Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.