The present invention is related to that disclosed in United States patent application Ser. No. 09/347,882, entitled "SYSTEM AND METHOD FOR FINE GRANULAR SCALABLE VIDEO WITH SELECTIVE QUALITY ENHANCEMENT," which is being filed concurrently herewith and is commonly assigned to the assignee of the present invention. The disclosure of the related patent application is incorporated herein by reference for all purposes as if fully set forth herein.
The present invention is directed, in general, to video encoding systems and, more specifically, to an encoding system for streaming video data.
Real-time streaming of multimedia content over data networks, including the Internet, has become an increasingly common application in recent years. A wide range of interactive and non-interactive multimedia applications, such as news-on-demand, live network television viewing, and video conferencing, among others, rely on end-to-end streaming video techniques. Unlike a "downloaded" video file, which may be retrieved first in "non-real" time and viewed or played back later in "real" time, streaming video applications require a video transmitter that encodes and transmits a video signal over a data network to a video receiver, which must decode and display the video signal in real time.
Scalable video coding is a desirable feature for many multimedia applications and services that are used in systems employing decoders with a wide range of processing power. Scalability allows processors with low computational power to decode only a subset of the scalable video stream. Another use of scalable video is in environments with a variable transmission bandwidth. In those environments, receivers with low-access bandwidth receive, and consequently decode, only a subset of the scalable video stream, where the amount of that subset is proportional to the available bandwidth.
Several video scalability approaches have been adopted by leading video compression standards such as MPEG-2 and MPEG-4. Temporal, spatial, and quality (e.g., signal-to-noise ratio (SNR)) scalability types have been defined in these standards. All of these approaches consist of a base layer (BL) and an enhancement layer (EL). The base layer part of the scalable video stream represents, in general, the minimum amount of data needed for decoding that stream. The enhancement layer part of the stream represents additional information, and therefore enhances the video signal representation when decoded by the receiver.
For example, in a variable bandwidth system, such as the Internet, the base layer transmission rate may be established at the minimum guaranteed transmission rate of the variable bandwidth system. Hence, if a subscriber has a minimum guaranteed bandwidth of 256 kbps, the base layer rate may be established at 256 kbps also. If the actual available bandwidth is 384 kbps, the extra 128 kbps of bandwidth may be used by the enhancement layer to improve on the basic signal transmitted at the base layer rate.
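The arithmetic in the example above can be sketched as follows. This is only an illustration: the function name `split_bandwidth` and its parameters are hypothetical and do not come from the specification, and the sketch assumes the base layer rate is fixed at the guaranteed minimum.

```python
def split_bandwidth(available_kbps, base_layer_kbps):
    """Return (base_rate, enhancement_rate) in kbps.

    Assumption for this sketch: the base layer always uses its fixed rate,
    and whatever bandwidth remains (never less than zero) goes to the
    enhancement layer.
    """
    enhancement_kbps = max(0, available_kbps - base_layer_kbps)
    return base_layer_kbps, enhancement_kbps


# Guaranteed minimum of 256 kbps, actual available bandwidth of 384 kbps.
base_rate, enhancement_rate = split_bandwidth(384, 256)
print(base_rate, enhancement_rate)
```

With the figures used in the text, the enhancement layer receives the extra 128 kbps.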
For each type of video scalability, a certain scalability structure is identified. The scalability structure defines the relationship among the pictures of the base layer and the pictures of the enhancement layer. One class of scalability is fine-granular scalability. Images coded with this type of scalability can be decoded progressively. In other words, the decoder may decode and display the image with only a subset of the data used for coding that image. As more data is received, the quality of the decoded image is progressively enhanced until the complete information is received, decoded, and displayed.
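Progressive decoding of this kind can be illustrated with a bit-plane representation. The sketch below is an assumption made for illustration only: the text above does not specify how the fine-granular data is organized, and the helper names `to_bit_planes` and `from_bit_planes` are hypothetical. It handles non-negative integer residuals only.

```python
def to_bit_planes(values, num_planes):
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes):
    """Rebuild values from the planes received so far.

    Planes that have not arrived (the low-order ones) are treated as zero,
    so fewer planes give a coarser approximation.
    """
    values = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


residuals = [5, 3, 0, 7]
planes = to_bit_planes(residuals, 3)
for received in range(1, 4):
    # Each additional plane refines the reconstruction.
    print(received, from_bit_planes(planes[:received], 3))
```

Truncating the list of planes at any point still yields a decodable approximation, which is the property that lets a server cut the enhancement data at an arbitrary rate.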
The newly proposed MPEG-4 standard is directed to new video streaming applications based on very low bit rate coding, such as video-phone, mobile multimedia and audio-visual communications, multimedia e-mail, remote sensing, interactive games, and the like. Within the MPEG-4 standard, fine-granular scalability (FGS) has been recognized as an essential technique for networked video distribution. FGS primarily targets applications where video is streamed over heterogeneous networks in real-time. It provides bandwidth adaptivity by encoding content once for a range of bit rates, and enabling the video transmission server to change the transmission rate dynamically without in-depth knowledge or parsing of the video bit stream.
An important priority within conventional FGS techniques is improving coding efficiency and visual quality of the intra-frame coded enhancement layer. This is necessary to justify the adoption of FGS techniques for the compression of the enhancement layer in place of non-scalable (e.g., single layer) or less granular (e.g., multi-level SNR scalability) coding methods.
A limitation of the compression scheme currently adopted as reference for FGS resides in its inability to exploit the base layer coding information for improving the compression efficiency of the enhancement layer. Another disadvantage of currently adopted FGS schemes resides in the fact that enhancement layer frames are coded independently of each other (i.e., "intra" coding of frames). The intra-frame coding of the enhancement layer is necessary for error resilience and for easy bit rate change at transmission time. However, because each enhancement frame is optimally coded in its own context, discontinuity or inconsistency between the image quality of consecutive frames is often introduced. The resulting FGS enhanced video may have "flashing" artifacts across frames. This is particularly annoying and highly visible when compared to the more "visually stable" single layer coded video.
There is therefore a need in the art for improved encoders and encoding techniques for use in streaming video systems. There is a further need for encoders and encoding techniques that are less susceptible to flashing artifacts and other sources of discontinuity in the quality of consecutive frames in a sequence of related frames. In particular there is a need in the art for encoders that selectively allocate the enhancement layer data in relation to the amount of activity or selected characteristics in the original video image.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a new technique for improving the coding efficiency of the enhancement layer compression scheme. The proposed encoding technique uses one or more parameters taken from the base layer compression information (e.g., motion vectors, base layer quantization errors, rate-control info, etc.) to improve the image quality of the enhancement layer. Moreover, based on the observation that single layer encoding usually does a good job in optimizing video quality for particular bit rates, the present invention may use single layer coding at multiple bit rates as "guidelines" for FGS encoding. The new compression techniques may be applied independent of the transforms chosen in the base and enhancement layers (e.g., discrete cosine transform (DCT) or wavelets). However, the use of certain base layer or single-layer information is less straightforward if different coding schemes are employed at the base and enhancement layers.
Accordingly, in an advantageous embodiment of the present invention, there is provided, for use in a video encoder comprising a base layer circuit capable of receiving an input stream of video frames and generating therefrom compressed base layer video frames suitable for transmission at a base layer bit rate to a streaming video receiver, and an enhancement layer circuit capable of receiving the input stream of video frames and a decoded version of the compressed base layer video frames and generating therefrom enhancement layer video data associated with, and allocated to, corresponding ones of the compressed base layer video frames and suitable for transmission at a modifiable enhancement layer bit rate to the streaming video receiver, an apparatus for controlling transmission of the enhancement layer video data. The apparatus comprises a base layer parameter monitor capable of receiving at least one base layer parameter and, in response thereto, modifying an allocation of the enhancement layer video data among the corresponding ones of the compressed base layer video frames.
In one embodiment of the present invention, the video encoder comprises a motion estimation circuit capable of receiving the input stream of video frames and determining therefrom a base layer motion parameter associated with at least one selected frame sequence in the input stream of video frames.
In another embodiment of the present invention, the base layer parameter monitor receives the base layer motion parameter and, in response thereto, modifies the allocation of the enhancement layer video data according to a level of motion in the at least one selected frame sequence indicated by the base layer motion parameter.
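One way to picture such a motion-dependent allocation is a proportional split of an enhancement layer bit budget across frames. The sketch below is hypothetical: the function name `allocate_by_motion`, the per-frame motion scores, and the choice to give more bits to frames with more motion are all assumptions made for illustration, since the text states only that the allocation depends on the level of motion.

```python
def allocate_by_motion(total_bits, motion_levels):
    """Split total_bits across frames in proportion to their motion levels.

    Assumptions for this sketch: motion_levels holds one non-negative score
    per frame (for example, an average motion vector magnitude), and higher
    motion receives a larger share. If every score is zero, the budget is
    split evenly.
    """
    total_motion = sum(motion_levels)
    if total_motion == 0:
        return [total_bits / len(motion_levels)] * len(motion_levels)
    return [total_bits * m / total_motion for m in motion_levels]


# Three frames; the middle frame has three times the motion of its neighbors.
print(allocate_by_motion(1000, [1, 3, 1]))
```

The opposite policy (fewer bits for high-motion frames, where fine detail is harder to perceive) would only require inverting the weights; the text does not say which direction is intended.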
In still another embodiment of the present invention, the video encoder comprises a quantization circuit capable of receiving and quantizing transform data associated with the input stream of video frames to thereby reduce a size of the transform data and further capable of determining a base layer quantization error parameter associated with the quantized transform data.
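As a rough illustration of what a quantization error parameter might measure, the sketch below quantizes a list of transform coefficients with a uniform step size and reports the mean squared error of the reconstruction. The uniform quantizer, the mean-squared-error metric, and the names `quantize`, `dequantize`, and `quantization_error` are assumptions for this example; the text does not define how the parameter is computed.

```python
def quantize(coeffs, step):
    """Map each coefficient to the nearest integer multiple of step."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from quantization levels."""
    return [level * step for level in levels]


def quantization_error(coeffs, step):
    """Mean squared difference between original and reconstructed values."""
    recon = dequantize(quantize(coeffs, step), step)
    return sum((c - r) ** 2 for c, r in zip(coeffs, recon)) / len(coeffs)


coeffs = [10, 23, -7, 5]
print(quantize(coeffs, 8))            # smaller representation of the data
print(quantization_error(coeffs, 8))  # information lost by quantizing
```

A larger step shrinks the transform data further but raises the error, which is the quantity a monitor could use to decide where enhancement data is most needed.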
In a further embodiment of the present invention, the base layer parameter monitor receives the base layer quantization error parameter and, in response thereto, modifies the allocation of the enhancement layer video data according to a quantization error indicated by the base layer quantization error parameter.
In a still further embodiment of the present invention, the video encoder comprises a base layer rate allocation circuit capable of determining the base layer bit rate, wherein the base layer bit rate is set at a pre-determined minimum rate at which the compressed base layer video frames are transmitted to the streaming video receiver, and generating therefrom a base layer bit rate parameter associated with the base layer bit rate.
In a yet further embodiment of the present invention, the base layer parameter monitor receives the base layer bit rate parameter and, in response thereto, modifies the allocation of the enhancement layer video data according to an estimated difference between the compressed base layer video frames and estimated compressed base layer video frames associated with a second base layer bit rate greater than the pre-determined minimum rate.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the Detailed Description, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art should understand that in many, if not most, instances such definitions apply to prior as well as future uses of such defined words and phrases.