The present invention is directed, in general, to video encoding systems and, more specifically, to an encoding system and a decoding system for streaming video data.
Real-time streaming of multimedia content over data networks, including the Internet, has become an increasingly common application in recent years. A wide range of interactive and non-interactive multimedia applications, such as news-on-demand, live network television viewing, and video conferencing, among others, rely on end-to-end streaming video techniques. Unlike a “downloaded” video file, which may be retrieved first in “non-real” time and viewed or played back later in “real” time, streaming video applications require a video transmitter that encodes and transmits a video signal over a data network to a video receiver, which must decode and display the video signal in real time.
Scalable video coding is a desirable feature for many multimedia applications and services that are used in systems employing decoders with a wide range of processing power. Scalability allows processors with low computational power to decode only a subset of the scalable video stream. Another use of scalable video is in environments with a variable transmission bandwidth. In those environments, receivers with low-access bandwidth receive, and consequently decode, only a subset of the scalable video stream, where the amount of that subset is proportional to the available bandwidth.
Several video scalability approaches have been adopted by leading video compression standards such as MPEG-2 and MPEG-4. Temporal, spatial and quality (e.g., signal-to-noise ratio (SNR)) scalability types have been defined in these standards. All of these approaches consist of a base layer (BL) and an enhancement layer (EL). The base layer part of the scalable video stream represents, in general, the minimum amount of data needed for decoding that stream. The enhancement layer part of the stream represents additional information, and therefore enhances the video signal representation when decoded by the receiver.
For example, in a variable bandwidth system, such as the Internet, the base layer transmission rate may be established at the minimum guaranteed transmission rate of the variable bandwidth system. Hence, if a subscriber has a minimum guaranteed bandwidth of 256 kbps, the base layer rate may be established at 256 kbps also. If the actual available bandwidth is 384 kbps, the extra 128 kbps of bandwidth may be used by the enhancement layer to improve on the basic signal transmitted at the base layer rate.
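The allocation in the foregoing example may be written out as a short sketch. The function name `split_bandwidth` and its parameters are hypothetical and are used for illustration only; clamping the enhancement layer rate at zero when the available bandwidth falls below the guaranteed rate is likewise an assumption.

```python
def split_bandwidth(guaranteed_kbps, available_kbps):
    """Split the available bandwidth into a base layer rate and an enhancement layer rate.

    Hypothetical helper: the base layer is fixed at the minimum guaranteed rate,
    and whatever bandwidth remains is given to the enhancement layer.
    """
    base_kbps = guaranteed_kbps
    # Assumption: the enhancement layer simply gets nothing if there is no surplus.
    enhancement_kbps = max(0, available_kbps - base_kbps)
    return base_kbps, enhancement_kbps


if __name__ == "__main__":
    # The figures from the text: 256 kbps guaranteed, 384 kbps actually available.
    print(split_bandwidth(256, 384))  # (256, 128)
```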
For each type of video scalability, a certain scalability structure is identified. The scalability structure defines the relationship among the pictures of the base layer and the pictures of the enhancement layer. One class of scalability is fine-granular scalability. Images coded with this type of scalability can be decoded progressively. In other words, the decoder may decode and display the image with only a subset of the data used for coding that image. As more data is received, the quality of the decoded image is progressively enhanced until the complete information is received, decoded, and displayed.
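Progressive decoding may be illustrated with a toy sketch in which a single non-negative value is refined as its bits arrive, most significant bit first. This is not the MPEG-4 bit stream syntax; the function name `progressive_approximations` is hypothetical, and treating one integer as "the image" is a simplifying assumption.

```python
def progressive_approximations(value, n_bits):
    """Return the decoder's estimate of `value` after each received bit (MSB first).

    Hypothetical illustration: bits not yet received are treated as zero.
    """
    approximations = []
    received = 0
    for k in range(n_bits - 1, -1, -1):
        received |= value & (1 << k)  # fold in the bit at position k
        approximations.append(received)
    return approximations


if __name__ == "__main__":
    # 45 is 101101 in binary; each additional bit can only reduce the error.
    print(progressive_approximations(45, 6))  # [32, 32, 40, 44, 44, 45]
```

A receiver that stops after three bits would display 40 in place of 45; a receiver that obtains all six bits recovers the value exactly.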
The proposed MPEG-4 standard is directed to video streaming applications based on very low bit rate coding, such as video-phone, mobile multimedia/audio-visual communications, multimedia e-mail, remote sensing, interactive games, and the like. Within the MPEG-4 standard, fine-granular scalability (FGS) has been recognized as an essential technique for networked video distribution. FGS primarily targets applications where video is streamed over heterogeneous networks in real-time. It provides bandwidth adaptivity by encoding content once for a range of bit rates, and enabling the video transmission server to change the transmission rate dynamically without in-depth knowledge or parsing of the video bit stream.
An important priority within conventional FGS techniques is improving coding efficiency and visual quality of the intra-frame coded enhancement layer. This is necessary to justify the adoption of FGS techniques for the compression of the enhancement layer in place of non-scalable (e.g., single layer) or less granular (e.g., multi-level SNR scalability) coding methods.
Many video coding techniques have been proposed for the FGS compression of the enhancement layer, including wavelets, bit-plane DCT and matching pursuits. At the MPEG-4 meeting in Seoul, Korea in March 1999, the bit-plane DCT solution proposed by Optivision was selected as a reference. The bit-plane coding scheme adopted as reference for FGS includes the following steps at the encoder side:
1. residual computation in the DCT domain, by subtracting from each original DCT coefficient the reconstructed DCT coefficient after base-layer quantization and dequantization;
2. determining the maximum value of all of the absolute values of the residual signal in a video object plane (VOP) and the maximum number of bits n to represent this maximum value;
3. for each block within the VOP, representing each absolute value of the residual signal with n bits in the binary format and forming n bit-planes;
4. bit-plane encoding of the residual signal absolute values; and
5. sign encoding of the DCT coefficients which are quantized to zero in the base-layer.
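Steps 1, 2, 3 and 5 above may be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the coefficients are taken to be a flat, non-empty list of integers rather than blocks within a VOP, the entropy coding of step 4 is omitted, and the function name `residual_bit_planes` is hypothetical.

```python
def residual_bit_planes(original, reconstructed):
    """Form bit-planes of the residual between original and reconstructed coefficients.

    Hypothetical sketch; `original` and `reconstructed` are equal-length,
    non-empty lists of integer transform coefficients.
    """
    # Step 1: residual computation in the transform domain.
    residual = [o - b for o, b in zip(original, reconstructed)]

    # Step 2: largest absolute value and the number of bits n needed to represent it.
    magnitudes = [abs(r) for r in residual]
    n = max(magnitudes).bit_length()

    # Step 3: n bit-planes, most significant plane first.
    planes = [[(m >> k) & 1 for m in magnitudes] for k in range(n - 1, -1, -1)]

    # Step 5: signs are recorded only where the base layer coefficient is zero
    # (True means negative). Step 4, the entropy coding of the planes, is omitted.
    signs = {
        i: r < 0
        for i, (r, b) in enumerate(zip(residual, reconstructed))
        if b == 0 and r != 0
    }
    return n, planes, signs


if __name__ == "__main__":
    n, planes, signs = residual_bit_planes([37, -12, -5, 0], [32, -8, 0, 0])
    print(n)       # 3
    print(planes)  # [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0]]
    print(signs)   # {2: True}
```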
These coding steps are reversed at the decoder side. It is important to note that the current implementation of the bit-plane coding of DCT coefficients depends on base-layer quantization information. The input signal to the enhancement layer is computed primarily as the difference between the original DCT coefficients of the motion compensated picture and those of the lower quantization cell boundaries used during base layer encoding (this is true when the base layer reconstructed DCT coefficient is non-zero; otherwise zero is used as the subtraction value). The enhancement layer signal, herein referred to as the “residual” signal, is then compressed bit plane by bit plane. Since the lower quantization cell boundary is used as the “reference” signal for computing the residual signal, the residual signal is always positive, except when the base layer DCT is quantized to zero. Thus, it is not necessary to code the sign bit of the residual signal.
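The conditional behavior described above may be sketched as follows. The sketch assumes a uniform quantizer with step size `step` that truncates magnitudes toward zero, which is a simplification and not the actual MPEG-4 quantizer; it also works on coefficient magnitudes, on the assumption that a non-zero base layer coefficient already conveys the sign. The function name `reference_residual` is hypothetical.

```python
def reference_residual(coeff, step):
    """Residual of one coefficient against the lower quantization cell boundary.

    Hypothetical sketch assuming a uniform, truncating quantizer of step `step`.
    Returns (level, residual_magnitude, sign_needed).
    """
    level = abs(coeff) // step               # base layer quantization index (magnitude)
    lower_boundary = level * step            # lower edge of the quantization cell
    residual = abs(coeff) - lower_boundary   # never negative by construction
    # A sign must be coded only when the base layer carries no sign information,
    # i.e. when the coefficient was quantized to zero but the residual is not zero.
    sign_needed = level == 0 and residual != 0
    return level, residual, sign_needed


if __name__ == "__main__":
    print(reference_residual(-37, 8))  # (4, 5, False)
    print(reference_residual(-5, 8))   # (0, 5, True)
```

The branch on `level == 0` is the conditional operation that the enhancement layer codec must perform, and evaluating it requires the base layer quantization information.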
One major disadvantage of the existing methods of encoding and decoding streaming video is their complexity. A large amount of information, such as quantization parameters, must be transmitted between the base layer encoder and the enhancement layer encoder, and between the base layer decoder and the enhancement layer decoder. Furthermore, the coding and decoding of the residual signal in the enhancement layer is a conditional operation that depends on whether or not the base layer DCT is quantized to zero. This adds complexity to the coder/decoder (i.e., codec) used.
There is therefore a need in the art for improved encoders and encoding techniques for use in streaming video systems. In particular, there is a need for encoders and decoders that use a simpler method to code and decode the residual signal. More particularly, there is a need for encoding techniques that are not based on whether the base layer DCT is quantized to zero. There is a further need for decoding techniques that are not based on whether the base layer DCT is quantized to zero.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a new technique for reducing the complexity of an enhancement layer compression scheme. The present invention proposes a technique for reducing the complexity of the bit-plane compression scheme of, for example, the residual DCT coefficients currently adopted as a reference within the MPEG-4 standard. However, it is important to realize that the proposed improvements are not limited to the DCT transform. Those skilled in the art will readily understand that the principles of the present invention may also be successfully applied to other transforms (e.g., wavelets) for the compression of the base and enhancement layer. However, in the descriptions that follow, DCT coefficients are employed for illustration purposes only.
Accordingly, in an advantageous embodiment of the present invention, there is provided a video encoder comprising base layer circuitry capable of receiving an input stream of video frames and generating therefrom compressed base layer video data suitable for transmission to a streaming video receiver. The base layer video data comprises a plurality of original transform coefficients (O) associated with the input stream of video frames and a plurality of reconstructed base layer transform coefficients (B) generated by quantizing and de-quantizing the plurality of original transform coefficients. The video encoder further comprises enhancement layer circuitry capable of receiving the plurality of original transform coefficients (O) and the plurality of reconstructed base layer transform coefficients (B) and generating therefrom a residual signal (R). The residual signal (R) is proportional to a difference between the plurality of original transform coefficients (O) and the plurality of reconstructed base layer transform coefficients (B). The enhancement layer circuitry encodes and sends a sign of the residual signal (R) to the streaming video receiver.
In one embodiment of the present invention, the base layer circuitry comprises a transform circuit capable of generating the plurality of original transform coefficients (O).
In another embodiment of the present invention, the transform circuit is a discrete cosine transform (DCT) circuit.
In still another embodiment of the present invention, the base layer circuitry comprises a quantization circuit and an inverse quantization circuit capable of generating from the plurality of original transform coefficients (O) the plurality of reconstructed base layer transform coefficients (B).
In yet another embodiment of the present invention, the enhancement layer circuitry comprises a residual computation circuit capable of comparing the plurality of original transform coefficients (O) and the plurality of reconstructed base layer transform coefficients (B).
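The encoder-side relationship among O, B and R may be sketched as follows. This is an illustration under stated assumptions only: the quantizer is assumed to be uniform with rounding to the nearest level, the proportionality constant between R and the difference (O minus B) is assumed to be one, and the function names `quantize`, `dequantize` and `encode_residual` are hypothetical.

```python
def quantize(coeff, step):
    """Assumed uniform quantizer: round the magnitude to the nearest level."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def dequantize(level, step):
    """Inverse of the assumed quantizer."""
    return level * step


def encode_residual(original, step):
    """Return a list of (magnitude, is_negative) pairs, one per coefficient.

    The sign is produced for every coefficient, with no test on whether the
    base layer coefficient was quantized to zero.
    """
    encoded = []
    for o in original:
        b = dequantize(quantize(o, step), step)  # reconstructed base layer coefficient B
        r = o - b                                # residual R = O - B
        encoded.append((abs(r), r < 0))
    return encoded


if __name__ == "__main__":
    print(encode_residual([37, -12, 3], 8))  # [(3, True), (4, False), (3, False)]
```

Because the reference is the reconstructed coefficient B itself, the residual may take either sign, which is why the sign accompanies the residual unconditionally in this sketch.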
The present invention also may be embodied in a decoder. According to an advantageous embodiment of the present invention, there is provided a video decoder comprising base layer circuitry capable of receiving compressed base layer video data and determining therefrom a plurality of reconstructed base layer transform coefficients (B) generated by quantizing and de-quantizing the base layer video data. The video decoder further comprises enhancement layer circuitry capable of receiving enhancement layer video data associated with the compressed base layer video data and determining therefrom a residual signal (R) and a sign associated with the residual signal (R). The enhancement layer circuitry is further capable of reconstructing a plurality of enhancement layer transform coefficients (E) from the residual signal (R) and the plurality of reconstructed base layer transform coefficients (B).
In one embodiment of the present invention, the enhancement layer circuitry comprises an inverse transform circuit capable of generating from the plurality of reconstructed enhancement layer transform coefficients (E) a plurality of decompressed enhancement layer video frames.
In another embodiment of the present invention, the inverse transform circuit is an inverse discrete cosine transform (IDCT) circuit.
In still another embodiment of the present invention, the enhancement layer circuitry comprises a computation circuit capable of adding the residual signal (R) and the plurality of reconstructed base layer transform coefficients (B).
In yet another embodiment of the present invention, the enhancement layer circuitry comprises an enhancement layer decoding circuit capable of receiving the enhancement layer video data and determining therefrom the residual signal (R) and the sign associated with the residual signal (R).
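The decoder-side reconstruction of E from B, R and the sign may be sketched as follows. The representation of the residual as separate magnitude and sign lists, and the function name `reconstruct_enhancement`, are assumptions made for illustration; the inverse transform that follows is not shown.

```python
def reconstruct_enhancement(base_coeffs, magnitudes, negative_flags):
    """Rebuild enhancement layer coefficients E by adding the signed residual R to B.

    Hypothetical sketch: all three arguments are equal-length lists, and a True
    entry in `negative_flags` marks a negative residual.
    """
    enhanced = []
    for b, m, negative in zip(base_coeffs, magnitudes, negative_flags):
        r = -m if negative else m  # apply the decoded sign to the residual magnitude
        enhanced.append(b + r)     # E = B + R, with no dependence on whether B is zero
    return enhanced


if __name__ == "__main__":
    print(reconstruct_enhancement([40, -16, 0], [3, 4, 3], [True, False, False]))
    # [37, -12, 3]
```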
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand THE DETAILED DESCRIPTION OF THE INVENTION that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller,” “processor,” or “apparatus” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.