Digital video signals are typically compressed for transmission from a source to a destination. One common type of compression is “interframe” coding, such as is described in the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendations H.261, H.262, and H.263. Interframe coding exploits the temporal similarities of successive video frames by using previously coded and reconstructed video frames to predict the current video signal. By employing a differential pulse code modulation (DPCM) loop, only the difference between the prediction signal and the actual video signal amplitude (i.e. the “prediction error”) is coded and transmitted.
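The DPCM loop described above can be illustrated with a minimal sketch. It assumes each frame is a short list of integer sample amplitudes and uses a simple uniform quantizer; the function names and the `step` parameter are hypothetical and are not taken from any of the cited Recommendations.

```python
def quantize(value, step):
    """Uniform quantizer (an assumption): nearest multiple of `step`."""
    return step * round(value / step)


def dpcm_encode(frames, step=4):
    """Return the quantized prediction error for each frame."""
    prediction = [0] * len(frames[0])  # both ends start from the same state
    residuals = []
    for frame in frames:
        residual = [quantize(x - p, step) for x, p in zip(frame, prediction)]
        residuals.append(residual)
        # Update the prediction exactly as the decoder will, so that the
        # two predictions stay identical in the absence of channel errors.
        prediction = [p + r for p, r in zip(prediction, residual)]
    return residuals


def dpcm_decode(residuals):
    """Rebuild the frames by accumulating the received prediction errors."""
    prediction = [0] * len(residuals[0])
    frames = []
    for residual in residuals:
        prediction = [p + r for p, r in zip(prediction, residual)]
        frames.append(prediction)
    return frames
```

Because the encoder forms its prediction from the reconstructed signal rather than from the original, the quantization error in this sketch does not accumulate from frame to frame.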
In interframe coding, the same prediction is formed at the transmitter and the receiver, and is updated frame-by-frame at both locations using the prediction error. If a transmission error causes a discrepancy to arise between the prediction signal at the transmitter and the prediction signal at the receiver, the error propagates temporally over several frames. The error propagation is terminated only when the affected region of the image is updated by an intraframe coded portion of the transmission (i.e. a frame coded without reference to a previous frame). In practice, this error propagation may result in an annoying artifact which may be visible for several seconds in the decoded, reconstructed signal.
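The temporal propagation of a transmission error can be sketched as follows. This is an illustrative toy model only: it assumes lossless prediction errors, a fixed intraframe period, and a single corrupted sample; the `("I", ...)` and `("P", ...)` tuples and all function names are hypothetical.

```python
def encode(frames, intra_period):
    """Code every `intra_period`-th frame as intra, the rest as differences."""
    stream, prev = [], None
    for i, frame in enumerate(frames):
        if i % intra_period == 0:
            stream.append(("I", list(frame)))
        else:
            stream.append(("P", [x - p for x, p in zip(frame, prev)]))
        prev = list(frame)
    return stream


def decode(stream):
    """Reconstruct frames; a P entry is added to the previous reconstruction."""
    recon, prev = [], None
    for kind, data in stream:
        if kind == "I":
            prev = list(data)
        else:
            prev = [p + d for p, d in zip(prev, data)]
        recon.append(prev)
    return recon


def frames_in_error(frames, intra_period, corrupt_index):
    """Corrupt one coded frame and list the decoded frames that are wrong."""
    stream = encode(frames, intra_period)
    kind, data = stream[corrupt_index]
    damaged = list(data)
    damaged[0] += 50  # simulated transmission error in one sample
    stream[corrupt_index] = (kind, damaged)
    decoded = decode(stream)
    return [i for i, (a, b) in enumerate(zip(frames, decoded)) if a != b]
```

In this model, a corrupted interframe coded frame leaves every following frame wrong until the next intraframe coded frame resets the prediction at the receiver.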
Shown in FIG. 1 is a schematic representation of a conventional hybrid interframe coder 10. Only the fundamental elements of the coder are shown in FIG. 1. However, this type of hybrid coder is known in the art, and the omitted elements are not germane to understanding its operation.
The coder of FIG. 1 receives an input video signal at summing node 12. The output of summing node 12 is the difference between the current frame of the input signal and a motion-compensated version of a previous frame of the input signal (discussed in more detail hereinafter). The output of summing node 12 is received by discrete cosine transform block 14 (hereinafter DCT 14). The DCT 14 is a hardware, software, or hybrid hardware/software component that performs a discrete cosine transform on the data received from the summing node 12, in a manner well-known in the art. The DCT 14 transforms the incoming signal, one block of elements at a time, into a set of coefficients, which are then input to quantizer 16. The quantizer 16 assigns one of a plurality of discrete values to each of the received coefficients. The amount of compression provided by the quantizer depends on the number of quantization levels it uses (i.e. the “coarseness” of the quantization). Since the quantizer maps each coefficient to one of a finite number of quantization levels, there is an error introduced by the quantizer, the magnitude of which increases with a decreasing number of quantization levels.
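The transform and quantization steps can be sketched as follows. For brevity the sketch uses a one-dimensional, orthonormal 8-point DCT and a uniform quantizer with a single step size; a practical coder would apply a two-dimensional transform to blocks of the prediction error. All function names and the `step` parameter are assumptions made for illustration.

```python
import math

N = 8  # block length assumed for this sketch


def _scale(k):
    """Normalization that makes the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(block):
    """Forward DCT-II of one block of N samples."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                        for n, x in enumerate(block))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of `dct` (DCT-III with matching normalization)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def quantize(coeffs, step):
    """Map each coefficient to an integer quantizer index."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Map each index back to its representative level."""
    return [i * step for i in indices]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

With this normalization, each coefficient is off by at most half a step after dequantization, so the mean squared reconstruction error is bounded by `(step / 2) ** 2`. A larger step (fewer levels) therefore permits a larger error in exchange for fewer distinct indices to code.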
In order to perform the desired interframe coding, the output of quantizer 16 is received by an inverse quantizer 17 and an inverse discrete cosine transform element (hereinafter “inverse DCT”) 18. Inverse quantizer 17 maps the quantizer index into a quantizer representative level. The inverse DCT 18 is a hardware, software, or hybrid hardware/software component that performs an inverse discrete cosine transform on the data received from inverse quantizer 17, in a manner well-known in the art. This inverse transform decodes the coded data to create a reconstruction of the prediction error. The error introduced into the signal by quantizer 16 reduces the quality of the image which is later decoded, the reduced quality being a side effect of the data compression achieved through quantization.
The decoded version of the video signal is output by summing node 19, and is used by the coder 10 to determine variations in the video signal from frame to frame for generating the interframe coded signal. However, in the coder of FIG. 1, the decoded signal from summing node 19 is first processed using some form of motion compensation means (hereinafter “motion compensator”) 20, which works together with motion estimator 21. Motion estimator 21 makes motion estimations based on the original input video signal, and passes the estimated motion vectors to both motion compensator 20 and entropy coder 23. These vectors are used by motion compensator 20 to build a prediction of the image by representing changes in groups of pixels using the obtained motion vectors. The motion compensator 20 may also include various filtering functions known in the art.
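The cooperation of the motion estimator and the motion compensator can be sketched as follows. The text above does not say how the motion vectors are obtained; this sketch assumes a full-search block-matching estimator with a sum-of-absolute-differences criterion, applied to one-dimensional frames whose length is a multiple of the block size. The function names and the `block` and `search` parameters are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_motion(current, previous, block=4, search=2):
    """Return one displacement per block, chosen to minimize the SAD."""
    vectors = []
    for start in range(0, len(current), block):
        target = current[start:start + block]
        # Start from zero displacement so the result is never worse than
        # plain frame differencing.
        best_v = 0
        best_cost = sad(target, previous[start:start + block])
        for v in range(-search, search + 1):
            src = start + v
            if src < 0 or src + block > len(previous):
                continue  # candidate block would fall outside the frame
            cost = sad(target, previous[src:src + block])
            if cost < best_cost:
                best_v, best_cost = v, cost
        vectors.append(best_v)
    return vectors


def compensate(reconstructed, vectors, block=4):
    """Build the prediction by copying displaced blocks of the prior frame."""
    predicted = []
    for i, v in enumerate(vectors):
        src = i * block + v
        predicted.extend(reconstructed[src:src + block])
    return predicted
```

In the coder of FIG. 1 the estimator operates on the original input signal while the compensator operates on the reconstructed frame, which is why the two functions take separate frame arguments.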
At summing node 12, a frame-by-frame difference is calculated, such that the output of summing node 12 is only pixel changes from one frame to the next. Thus, the data which is compressed by DCT 14 and quantizer 16 is only the interframe prediction error representing changes in the image from frame to frame. This compressed signal may then be transmitted over a network or other transmission media, or stored in its compressed form for later recall and decompression. Prior to transmission or storage, the interframe coded signal is also typically coded using entropy coder 22. The entropy coder provides still further compression of the video data by mapping the symbols output by the quantizer to variable length codes based on the probability of their occurrence. After entropy coding, the signal output from entropy coder 22 is transmitted along with the compressed motion vectors output from entropy coder 23.
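The mapping of quantizer symbols to variable length codes can be sketched as follows. The text above does not name a particular code; this sketch assumes a Huffman code built from observed symbol counts, which is one common way to assign shorter codewords to more probable symbols. The function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Each heap entry is (count, tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing one bit to each.
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]
```

For a stream dominated by one symbol, such as the zero index that a coarse quantizer produces for small coefficients, the frequent symbol receives a short codeword and the total bit count falls below that of a fixed-length code.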
In practice, if a compressed video signal such as the one output from the coder of FIG. 1 is transmitted over unreliable channels (e.g. the internet, local area networks without quality of service (QoS) guarantees, or mobile radio channels), it is particularly vulnerable to transmission errors. Certain transmission errors have the characteristic of lowering the possible maximum throughput (i.e. lowering the channel capacity or “bandwidth”) of the transmission medium for a relatively long period of time. Such situations might arise due to a high traffic volume on a store-and-forward network such as the internet, or due to an increasing distance between a transmitter and receiver of a mobile radio channel.
In order to maintain a real-time transmission of the video information in the presence of a reduced bandwidth, the transmitter must reduce the bit rate of the compressed video. Networks without QoS guarantees often provide messaging channels that allow the receiver or the network to request a lower transmission bit rate from the transmitter. For example, the Real-time Transport Protocol (RTP), designed by the Internet Engineering Task Force and now part of the ITU-T Draft International Standard H.225.0 “Media Stream Packetization and Synchronization on Non-Guaranteed Quality of Service LANs”, can be used to “throttle” the transmitter bit rate. For a point-to-point transmission with real-time coding, the video source coder can usually accommodate the request for a reduced bit rate by using a coarser quantization, by reducing the spatial resolution of the frames of the video, or by periodically dropping video frames altogether. However, if the video has been coded and stored previously, the bit rate is chosen in advance, making such a request difficult to satisfy.
To accommodate the desire for a variable bit rate in the transmission of stored video, a “scalable” video representation is used. The term “scalable” is used herein to refer to the ability of a particular bitstream to be decoded at different bit rates. With scalable video, a suitable part of the bitstream can be extracted and decoded to yield a reconstructed video sequence with a quality lower than what could be obtained by decoding a larger portion of the bitstream. Thus, scalable video supports “graceful degradation” of the picture quality with decreasing bit rate.
In a video-on-demand server, the same original motion video sequence can be coded and stored at a variety of bit rates. When a request for the sequence is made to the server, the appropriate bit rate would be selected, taking into account the current capacity of the network. A problem arises, however, if it becomes necessary to change the bit rate during the transmission. The server may switch from a first bitstream having a first bit rate to a second bitstream having a second bit rate due to a different coarseness of quantization or different spatial resolution. However, if the sequences are interframe coded, the switchover produces annoying artifacts due to the difference in the image quality of the two bitstreams. These can be avoided by the regular use of intraframe coded frames (generally referred to as “I-frames”), in which the entire image is coded, rather than just the differences from the previous frame. The Moving Picture Experts Group (MPEG) standard (i.e. ITU-T H.262) calls for the regular inclusion of I-frames, typically every few hundred milliseconds. However, the use of I-frames, requiring a significant amount of data, dramatically increases the overall bit rate. For example, an I-frame might require six times as much data as an interframe coded frame. In such a case, coding every fifth frame as an I-frame would double the bit rate.
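The arithmetic behind the last example can be checked with a short sketch. It assumes that every interframe coded frame costs one unit of data and that an I-frame is inserted at a fixed period; the function and parameter names are hypothetical.

```python
def relative_bit_rate(i_frame_cost, intra_period):
    """Bit rate relative to a stream made only of interframe coded frames.

    `i_frame_cost` is the size of an I-frame in units of one interframe
    coded frame; `intra_period` is the number of frames per I-frame.
    """
    units_per_period = i_frame_cost + (intra_period - 1)
    return units_per_period / intra_period
```

With an I-frame six times the size of an interframe coded frame and every fifth frame coded as an I-frame, each group of five frames costs 6 + 4 = 10 units instead of 5, so `relative_bit_rate(6, 5)` evaluates to 2.0.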
U.S. Pat. No. 5,253,058, to Gharavi, discloses a scalable video architecture which uses a base layer and an enhancement layer (called a contribution layer) which must be encoded by a separate encoder. The method does not support different frame rates for the video at different quality levels; rather, it supports different spatial resolutions. More importantly, in this method, the enhancement layer cannot be transmitted and decoded independently; it always requires the transmission and decompression of the base layer first. This makes bandwidth-adaptive serving a complicated task, leads to inefficient compression, and ultimately affects the performance of the whole system.
It is therefore an object of this invention to allow the coding of video sequences for storage and retrieval over networks without QoS guarantees, such that the bit rate provided by the server can be changed during the transmission of the sequence without resorting to the use of I-frames, while minimizing artifacts produced by the different degrees of quantization used in coding different bitstreams at different bit rates.