Real-time streaming of multimedia content over data networks, including the Internet, has become an increasingly common application in recent years. A wide range of interactive and non-interactive multimedia applications, such as news-on-demand, live network television viewing, and video conferencing, rely on end-to-end streaming video techniques. Unlike a "downloaded" video file, which may be retrieved first in "non-real" time and viewed or played back later in "real" time, streaming video applications require a video transmitter that encodes and transmits a video signal over a data network to a video receiver, which must decode and display the video signal in real time.
Scalable video coding is a desirable feature for many multimedia applications and services that are used in systems employing decoders with a wide range of processing power. Scalability allows processors with low computational power to decode only a subset of the scalable video stream. Another use of scalable video is in environments with a variable transmission bandwidth. In those environments, receivers with low-access bandwidth receive, and consequently decode, only a subset of the scalable video stream, where the amount of that subset is proportional to the available bandwidth.
Several video scalability approaches have been adopted by leading video compression standards such as MPEG-2 and MPEG-4. Temporal, spatial, and quality (e.g., signal-to-noise ratio (SNR)) scalability types have been defined in these standards. All of these approaches consist of a base layer (BL) and an enhancement layer (EL). The BL part of the scalable video stream represents, in general, the minimum amount of data needed for decoding that stream. The EL part of the stream represents additional information, and therefore enhances the video signal representation when decoded by the receiver.
For each type of video scalability, a certain scalability structure is identified. The scalability structure defines the relationship among the pictures of the BL and the pictures of the EL. One class of scalability is fine-granular scalability (FGS). Images coded with this type of scalability can be decoded progressively. In other words, the decoder can start decoding and displaying the image without the need for receiving all of the data used for coding that image. As more data is received, the quality of the decoded image is progressively enhanced until the complete information is received, decoded, and displayed.
The newly proposed MPEG-4 standard is directed to new video streaming applications based on very low bit rate coding, such as video-phone, mobile multimedia and audio-visual communications, multimedia e-mail, remote sensing, interactive games, and the like. Within the MPEG-4 standard, fine-granular scalability has been recognized as an essential technique for networked video distribution. FGS primarily targets applications where video is streamed over heterogeneous networks in real-time. It provides bandwidth adaptivity by encoding content once for a range of bit rates, and enabling the video transmission server to change the transmission rate dynamically without in-depth knowledge or parsing of the video bit stream.
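The rate adaptation described above can be illustrated with a minimal sketch. It assumes the enhancement layer of each frame is an embedded byte string that remains decodable when cut at any point, and that the server spends a fixed byte budget per frame. The function name, its parameters, and the frame sizes are hypothetical and are not taken from the MPEG-4 specification.

```python
def truncate_enhancement(base_layer, enhancement_layer, bitrate_bps, fps):
    """Return the bytes to send for one frame under a per-frame budget.

    Assumption: the base layer is always sent whole, and the enhancement
    layer is an embedded stream that can be cut at any byte boundary.
    """
    budget = bitrate_bps // (8 * fps)          # bytes available per frame
    spare = max(0, budget - len(base_layer))   # what is left after the base layer
    # A plain prefix slice: no parsing of the enhancement bitstream is needed.
    return base_layer + enhancement_layer[:spare]


# Hypothetical frame: 1000-byte base layer, 6000-byte enhancement layer, 25 fps.
base = bytes(1000)
enh = bytes(6000)
sent_low = truncate_enhancement(base, enh, 100_000, 25)     # budget 500 bytes
sent_mid = truncate_enhancement(base, enh, 400_000, 25)     # budget 2000 bytes
sent_high = truncate_enhancement(base, enh, 2_000_000, 25)  # budget 10000 bytes
```

In this sketch the same encoded frame serves all three rates; only the length of the enhancement-layer prefix changes.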
Bitplane compression of digital signals is a popular coding method for many multimedia applications. In particular, bitplane coding of audiovisual signals enables progressive and scalable transmission of these signals. Typically, an audio or visual signal undergoes some type of transform, such as the Discrete Cosine Transform (DCT) or the Discrete Wavelet Transform (DWT), which converts spatial pixel data to frequency-domain transform coefficients prior to bitplane coding. Next, each bitplane is scanned and coded, starting with the most significant bit (MSB) representation of the signal and ending with the least significant bit (LSB) representation. Thus, if the transform coefficients are represented by n bits, there are n corresponding bitplanes to be coded and transmitted.
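The MSB-to-LSB decomposition can be sketched as follows. This is a minimal illustration that assumes non-negative integer coefficient magnitudes and omits the transform and any entropy coding of the planes; the function names and sample values are hypothetical.

```python
def to_bitplanes(coeffs, n_bits):
    """Split non-negative integers into n_bits bitplanes, MSB plane first."""
    planes = []
    for bit in range(n_bits - 1, -1, -1):      # scan from MSB down to LSB
        planes.append([(c >> bit) & 1 for c in coeffs])
    return planes


def from_bitplanes(planes, n_bits, length):
    """Rebuild the integers from a list of bitplanes given MSB first."""
    values = [0] * length
    for index, plane in enumerate(planes):
        bit = n_bits - 1 - index               # weight of this plane
        for i, b in enumerate(plane):
            values[i] |= b << bit
    return values


# Hypothetical 3-bit coefficients: 5 = 101, 3 = 011, 0 = 000, 6 = 110.
coeffs = [5, 3, 0, 6]
planes = to_bitplanes(coeffs, 3)
restored = from_bitplanes(planes, 3, len(coeffs))
```

With n = 3, the sketch yields three planes, one per bit position, matching the statement that n-bit coefficients produce n bitplanes.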
Depending on a fidelity criterion (e.g., maximum allowable distortion) or a bitrate budget constraint, the coding of the signal may stop at, or even within, any particular bitplane. This approach provides the progressive feature of bitplane compression, especially when the coding is taking place in real-time (i.e., at the same time the signal is being transmitted). For signals coded off-line or prior to transmission, bitplane coding results in an embedded and scalable bitstream. This enables the sender to stop the transmission of the stream at (or within) any bitplane in response, for example, to network conditions such as available bandwidth.
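Stopping at a bitplane boundary under a fidelity criterion can be sketched as below. The example assumes non-negative integer coefficients and uses maximum absolute error as the distortion measure; both the measure and the function names are illustrative assumptions, and stopping within a plane is not shown.

```python
def truncate_to_planes(coeffs, n_bits, kept):
    """Approximate each coefficient using only its `kept` most significant planes."""
    dropped = n_bits - kept
    mask = ~((1 << dropped) - 1)               # clears the dropped low-order bits
    return [c & mask for c in coeffs]


def planes_needed(coeffs, n_bits, max_error):
    """Smallest number of planes whose reconstruction meets the error bound."""
    for kept in range(n_bits + 1):
        approx = truncate_to_planes(coeffs, n_bits, kept)
        worst = max(abs(c - a) for c, a in zip(coeffs, approx))
        if worst <= max_error:
            return kept
    return n_bits


# Hypothetical 3-bit coefficients.
coeffs = [5, 3, 0, 6]
one_plane = truncate_to_planes(coeffs, 3, 1)
two_planes = truncate_to_planes(coeffs, 3, 2)
needed = planes_needed(coeffs, 3, 1)
```

Each additional plane tightens the approximation, which is the progressive behavior the paragraph describes.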
Consequently, bitplane compression, in general, provides a very fine granular scalability (FGS) coding of the signal. Depending on the particular method used for coding the bitplanes, this granularity could be as fine as a single bit or as coarse as an entire bitplane. Therefore, if a signal is bitplane coded using n planes and a total number of b bits, the resulting compressed stream could include anywhere between n and b progressive representations of the original signal embedded in that stream.
One implementation of the proposed FGS structure for MPEG-4 uses the current MPEG-4 video coding standard as the base layer (BL) encoding scheme and encodes the enhancement layer (EL) as the difference between the DCT coefficients of the original picture and the base layer reconstructed DCT coefficients. The enhancement encoding scans through the difference (or residual) DCT coefficients bit-plane by bit-plane, and encodes a series of 1's and 0's as a refinement of the base layer DCT coefficients.
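The residual scheme described above can be sketched as follows. The sketch assumes the original and base-layer reconstructed DCT coefficients are already available as integer lists, represents the residual as a sign plus magnitude bitplanes, and omits entropy coding; the names and values are hypothetical and do not reproduce the MPEG-4 bitstream syntax.

```python
def residual_planes(original, base_recon):
    """Return (signs, planes): residual signs and magnitude bitplanes, MSB first."""
    residual = [o - b for o, b in zip(original, base_recon)]
    n_bits = max((abs(r) for r in residual), default=0).bit_length()
    signs = [r < 0 for r in residual]
    planes = [[(abs(r) >> bit) & 1 for r in residual]
              for bit in range(n_bits - 1, -1, -1)]
    return signs, planes


def refine(base_recon, signs, planes, kept):
    """Refine base-layer coefficients using the first `kept` residual planes."""
    n_bits = len(planes)
    refined = []
    for i, b in enumerate(base_recon):
        magnitude = 0
        for p in range(kept):
            magnitude |= planes[p][i] << (n_bits - 1 - p)
        refined.append(b - magnitude if signs[i] else b + magnitude)
    return refined


# Hypothetical DCT coefficients for one small block.
original = [52, -7, 3, 0]
base_recon = [48, -4, 0, 0]                    # residual is [4, -3, 3, 0]
signs, planes = residual_planes(original, base_recon)
partial = refine(base_recon, signs, planes, 2)
full = refine(base_recon, signs, planes, len(planes))
```

Decoding all residual planes recovers the original coefficients in this sketch, while decoding fewer planes gives an intermediate refinement of the base layer.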
The limitation of this implementation is that the enhancement layer encoder scans each individual residual DCT bit-plane from MSB to LSB, block by block. Therefore, there is no control by the encoder on which part or blocks of the residual signal should be better enhanced or given higher priority in the encoding process. In other words, the enhancement layer encoder does not control the distribution of compression artifacts throughout the enhanced picture. This major drawback leaves virtually no room for optimizing the enhancement layer encoder, which is highly desirable in designing video coding algorithms.
Another problem with the current FGS implementation is that its scalability is limited by the number of bit-planes of the residual DCT coefficients. One entire bit-plane has to be encoded in order to span the whole image, with no spatial skipping of visually less important information. If portions of one bit-plane could be coded ahead of the others, fewer bits would be needed to span the whole image, so that more enhancement layers could be generated and finer scalability achieved.
In conventional motion-compensated DCT-based video coding algorithms, regional selective coding of pictures is usually realized through adaptive quantization of image blocks. The quantization step size for each block can vary according to encoding decisions, and the step sizes are sent in the resulting bitstream. A special case or extension of adaptive quantization, known as region of interest (ROI) coding, exists in which the image may be segmented (or classified) into sub-regions with different levels of interest to potential viewers. The sub-regions are then coded with different levels of quality accordingly.
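Per-block adaptive quantization can be sketched as follows. The sketch assumes uniform quantization with an integer step size chosen per block and carried alongside the quantized levels; the block labels, step values, and function names are hypothetical and do not follow any particular standard's syntax.

```python
def quantize(value, step):
    """Uniform quantizer: round half away from zero to the nearest multiple of step."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def encode_blocks(blocks, steps):
    """Quantize each block with its own step; the step travels with the levels."""
    return {name: (steps[name], [quantize(c, steps[name]) for c in coeffs])
            for name, coeffs in blocks.items()}


def decode_blocks(encoded):
    """Reconstruct coefficients by scaling each level by its block's step."""
    return {name: [level * step for level in levels]
            for name, (step, levels) in encoded.items()}


# Hypothetical blocks with identical coefficients but different importance.
blocks = {"face": [37, -12, 5], "background": [37, -12, 5]}
steps = {"face": 2, "background": 16}          # finer step for the region of interest
encoded = encode_blocks(blocks, steps)
decoded = decode_blocks(encoded)
```

The block with the smaller step is reconstructed more accurately, at the cost of larger quantized levels to encode.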
Both adaptive quantization and ROI have been proposed in the past and are now used in various coding standards and algorithms. For example, all MPEG (MPEG-1, MPEG-2, and MPEG-4) video coding standards allow various degrees of adaptive quantization. ROI coding of video or still images realized through bit-plane shifting also exists. However, in the context of FGS, the enhancement layer is currently coded with no hierarchy of quality importance in the picture.
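ROI coding through bit-plane shifting can be sketched as follows. This minimal illustration assumes non-negative integer coefficients, a known ROI mask, and a fixed shift shared by encoder and decoder; it is a generic sketch of the idea, not the procedure of any specific standard, and all names and values are hypothetical.

```python
def shift_roi(coeffs, roi_mask, shift):
    """Scale ROI coefficients up so their bits land in more significant planes."""
    return [c << shift if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]


def unshift_roi(values, roi_mask, shift):
    """Undo the ROI scaling at the decoder."""
    return [v >> shift if in_roi else v for v, in_roi in zip(values, roi_mask)]


def keep_top_planes(values, n_bits, kept):
    """Simulate truncation after the `kept` most significant bitplanes."""
    mask = ~((1 << (n_bits - kept)) - 1)
    return [v & mask for v in values]


# Hypothetical 3-bit coefficients; the first two belong to the ROI.
coeffs = [5, 6, 5, 6]
roi_mask = [True, True, False, False]
shift = 2
shifted = shift_roi(coeffs, roi_mask, shift)          # now needs 3 + 2 = 5 planes
received = keep_top_planes(shifted, 3 + shift, 3)     # stream cut after 3 planes
decoded = unshift_roi(received, roi_mask, shift)
```

After truncation, the ROI coefficients in this sketch are recovered exactly while the background coefficients lose their two least significant bits.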
There is therefore a need in the art for improved encoders and encoding techniques for use in streaming video systems. In particular, there is a need for encoders and encoding techniques that take into consideration visual characteristics of an image when encoding the image. More particularly, there is a need for encoders and encoding techniques that selectively enhance parts or blocks of the residual signal that have been given a higher priority in the encoding process.