1. Field of the Invention
This invention pertains generally to data compression methods and systems, and more particularly to an efficient scalable predictive coding method and system where most or all of the information available to the enhancement-layer is exploited to improve the quality of the prediction.
2. Description of the Background Art
Many applications require data, such as video, to be simultaneously decodable at a variety of rates. Examples include applications involving broadcast over differing channels, multicast in a complex network where the channels/links dictate the feasible bit rate for each user, the co-existence of receivers of different complexity (and cost), and time-varying channels. An associated compression technique is "scalable" if it offers a variety of decoding rates using the same basic algorithm, and where the lower rate information streams are embedded in the higher rate bit-streams in a manner that minimizes redundancy.
A predictive coding system for encoding and decoding a signal without scalability is well-known in the literature of signal compression. (See for example: predictive vector quantization [6], and motion-compensated predictive transform coding of video [3]). In such predictive coding systems the encoder includes a decoder and memory so that what is actually encoded is the difference between the input signal and a predicted version of the reproduced signal, this difference signal being called the residual. The decoder contains a prediction loop whereby the current residual frame is decoded and then it is added to a prediction of the current frame obtained from the previous reproduced frame. In some cases, the predictor uses several prior frames to predict the current frame.
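The closed prediction loop described above can be illustrated with a minimal sketch for the degenerate case of single-sample frames. The uniform quantizer, the first-order predictor coefficient `a`, the zero initial state, and all function names are assumptions introduced only for illustration; they are not taken from the cited references.

```python
def quantize(value, step):
    """Uniform scalar quantizer: return the reconstruction level nearest to value."""
    return step * round(value / step)


def predict(previous_reconstruction, a=0.9):
    """First-order predictor P[.]; the coefficient a is an assumed value."""
    return a * previous_reconstruction


def encode(signal, step):
    """Closed-loop encoder: it embeds the decoder so both sides share one prediction."""
    reconstruction = 0.0  # assumed initial state, shared with the decoder
    residuals = []
    for x in signal:
        prediction = predict(reconstruction)
        r_hat = quantize(x - prediction, step)  # compressed-reconstructed residual
        residuals.append(r_hat)
        reconstruction = prediction + r_hat  # what the decoder will reproduce
    return residuals


def decode(residuals):
    """Decoder prediction loop: add each residual to the prediction from the previous frame."""
    reconstruction = 0.0
    output = []
    for r_hat in residuals:
        reconstruction = predict(reconstruction) + r_hat
        output.append(reconstruction)
    return output


signal = [1.0, 1.5, 1.2, 0.8, 1.1]
step = 0.25
decoded = decode(encode(signal, step))
max_error = max(abs(x - y) for x, y in zip(signal, decoded))
```

Because the encoder predicts from the reproduced signal rather than from the original, the reconstruction error in this sketch stays within half a quantizer step and does not accumulate from frame to frame.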
A major difficulty encountered in scalable predictive coding is how to take advantage of the additional information, available to the enhancement-layer decoder for improved prediction, without causing undesired conflicts with the information obtained from the base layer. FIG. 1 depicts a two-layer scalable coding system 10 where it is assumed that the original input signal (e.g., an audio or video signal) is segmented into frames that are sequentially encoded. Typical examples are video frames and speech frames, but "frame" here will also cover the degenerate case of a single sample as in differential pulse code modulation (DPCM). The term "frame" as used herein refers either to a group of contiguous samples of an original input signal or to a set of parameters extracted from the original group of samples (such as a set of transform coefficients obtained by a discrete-cosine transform (DCT) operation on the original group of samples). In each case the terminology "frame" or "signal" will be used to refer to this entity that is representative of the original group of samples or is itself the original group of samples.
The input frame 12, $x(n)$, is compressed by the base encoder (BE) 14, which produces the base bit-stream 16. The enhancement-layer encoder (EE) 18 has access to the input frame 12 and to any information produced by or available to BE 14. EE 18 uses this data to generate the enhancement-layer bit-stream 20. A base decoder (BD) 22 receives the base bit-stream 16 and produces a reconstruction 24, $\hat{x}_b(n)$, while the enhancement-layer decoder (ED) 26 has access to both bit-streams and produces an enhanced reconstruction 28, $\hat{x}_e(n)$. The reconstruction frames that are available at the decoder are used to predict or estimate the current frame. Note that ED 26 has access to both bit-streams and hence effectively has access to both the reconstruction frame at the base layer, $\hat{x}_b(n)$, and the previous reconstructed frame at the enhancement layer, $\hat{x}_e(n-1)$, while BD 22 has access only to the previous reconstructed frame at the base layer, $\hat{x}_b(n-1)$, which is stored in the memory within BD. In the case of a scalable coding system with multiple enhancement layers, an enhancement-layer decoder may have access to the reconstruction frames from lower enhancement layers as well as from the base layer. The prediction loop (internal to the operation of BD as in any predictive coding system, but not shown in the figure) in this configuration causes severe difficulties in the design of scalable coding. Accordingly, a number of approaches to scalable coding have been developed. These include:
(1) The standard approach: At the base layer, BE 14 compresses the residual $r_b(n) = x(n) - P[\hat{x}_b(n-1)]$, where $P$ denotes the predictor (e.g., motion compensator in the case of video coding). Note that for notational simplicity we assume first-order prediction, but in general several previous frames may be used. BD 22 produces the reconstruction $\hat{x}_b(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n)$, where $\hat{r}_b(n)$ is the compressed-reconstructed residual. At the enhancement-layer, EE 18 compresses the base layer's reconstruction error $r_e^{(1)}(n) = x(n) - \hat{x}_b(n) = x(n) - P[\hat{x}_b(n-1)] - \hat{r}_b(n)$. The enhancement-layer reconstruction is $\hat{x}_e(n) = \hat{x}_b(n) + \hat{r}_e^{(1)}(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n) + \hat{r}_e^{(1)}(n)$. See, e.g., [1]. A deficiency of this approach is that no advantage is taken of the potentially superior prediction due to the availability of $\hat{x}_e(n-1)$ at the ED 26.
(2) The separate coding approach: BE 14 compresses $r_b(n)$ as above, but EE 18 compresses the "enhancement-only" prediction error $r_e^{(2)}(n) = x(n) - P[\hat{x}_e(n-1)]$ directly. The enhancement-layer reconstruction is $\hat{x}_e(n) = P[\hat{x}_e(n-1)] + \hat{r}_e^{(2)}(n)$. A deficiency of this approach is that, while it takes advantage of information available only to the enhancement-layer, it does not exploit the knowledge of $\hat{r}_b(n)$, which is also available at the enhancement-layer. The two layers are, in fact, separately encoded except for savings on overhead information that need not be repeated (such as motion vectors in video coding) [2].
(3) Layer-specific prediction at the decoder approach: BD 22 reconstructs the frame as $\hat{x}_b(n) = P[\hat{x}_b(n-1)] + \hat{r}_b(n)$, and ED 26 reconstructs as $\hat{x}_e(n) = P[\hat{x}_e(n-1)] + \hat{r}_b(n) + \hat{r}_e(n)$. However, the encoders BE 14 and EE 18 use the same prediction [3], and the options are:
(a) Both encoders use base-layer prediction $P[\hat{x}_b(n-1)]$. This results in drift of the enhancement-layer decoder. (The term "drift" refers to a form of mismatch where the decoder uses a different prediction than the one assumed by the encoder. This mismatch tends to grow as the "corrections" provided by the encoder are misguiding; hence, the decoder "drifts away".)
(b) Both encoders use enhancement-layer prediction $P[\hat{x}_e(n-1)]$. This results in drift of the base-layer decoder.
(4) Switch between approaches (1) and (2) on a per frame or per block basis [4], or per sample [5]. This approach has the deficiencies of either approach (1) or (2) as described above, at each time depending on the switching decision.
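Approaches (1) and (3)(a) above can be contrasted with a small sketch using single-sample frames. It assumes that, under option (3)(a), EE 18 compresses the same refinement residual as in approach (1), so that the two approaches differ only in the prediction used by ED 26. The uniform quantizers, step sizes, predictor coefficient, and function names are likewise illustrative assumptions.

```python
def quantize(value, step):
    """Uniform scalar quantizer: return the reconstruction level nearest to value."""
    return step * round(value / step)


def predict(previous_reconstruction, a=0.9):
    """First-order predictor P[.]; the coefficient a is an assumed value."""
    return a * previous_reconstruction


def encode_with_base_prediction(signal, base_step, enh_step):
    """Both layers are encoded relative to the base-layer prediction."""
    xb_prev = 0.0
    base_stream, enh_stream = [], []
    for x in signal:
        prediction = predict(xb_prev)
        rb_hat = quantize(x - prediction, base_step)           # base-layer residual
        re_hat = quantize(x - prediction - rb_hat, enh_step)   # refinement residual
        base_stream.append(rb_hat)
        enh_stream.append(re_hat)
        xb_prev = prediction + rb_hat
    return base_stream, enh_stream


def decode_standard(base_stream, enh_stream):
    """ED of approach (1): predicts from the previous base-layer reconstruction."""
    xb_prev = 0.0
    output = []
    for rb_hat, re_hat in zip(base_stream, enh_stream):
        xb = predict(xb_prev) + rb_hat
        output.append(xb + re_hat)
        xb_prev = xb
    return output


def decode_layer_specific(base_stream, enh_stream):
    """ED of approach (3): predicts from its own previous enhancement-layer frame."""
    xe_prev = 0.0
    output = []
    for rb_hat, re_hat in zip(base_stream, enh_stream):
        xe = predict(xe_prev) + rb_hat + re_hat
        output.append(xe)
        xe_prev = xe
    return output


signal = [1.3, 1.1, 0.7, 1.6, 1.2]
base_step, enh_step = 1.0, 0.25
base_stream, enh_stream = encode_with_base_prediction(signal, base_step, enh_step)
matched = decode_standard(base_stream, enh_stream)
drifting = decode_layer_specific(base_stream, enh_stream)
# Difference between the layer-specific decoder and the reconstruction the encoder assumed.
mismatch = [d - m for d, m in zip(drifting, matched)]
```

In this sketch the matched decoder stays within half an enhancement-layer step of the input, while the layer-specific decoder departs from the encoder's assumed reconstruction from the second frame onward, since its prediction is formed from a frame the encoder never used.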
Therefore, a need exists for a scalable predictive coding system and method that exploits the information available to the enhancement layer to improve quality without causing undesired conflicts as outlined above. The present invention satisfies that need, as well as others, and overcomes the deficiencies of previously developed predictive coding systems and methods.
The present invention addresses the prediction loop deficiencies in conventional scalable coding methods and systems in a way that achieves efficient scalability of predictive coding. The approach is generally applicable and may, in particular, be applied to standard video and audio compression. In the present invention, most or all of the information available at an enhancement-layer may be exploited to improve the quality of the prediction.
By way of example, and not of limitation, in the present invention the current frame is predicted at the enhancement-layer by processing and combining the reconstructed signals representing: (i) the current base-layer (or lower layers) frame; and (ii) the previous enhancement-layer frame. The combining rule takes into account the compressed prediction error of the base-layer, and the parameters used for its compression. The main difficulty overcome by this invention lies in the apparent conflicts between these two sources of information and their impact as described in the Background of the Invention. This difficulty may explain why existing known methods exclusively use one of these information sources at any given time. These methods will be generally referred to here as switching techniques (which include as a special case the exclusive use of one of the information sources at all times). Additionally, the invention optionally includes a special enhancement-layer synchronization mode for the case where the communication rate for a given receiver is time varying (e.g., in mobile communications). This mode may be applied periodically to allow the receiver to upgrade to enhancement-layer performance even though it does not have prior enhancement-layer reconstructed frames.
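One way to picture a prediction that draws on both information sources is sketched below for single-sample frames. It is an illustrative assumption and not the combining rule of the invention: the enhancement-layer temporal prediction is simply clipped to the interval in which the base-layer quantizer (a uniform quantizer with a known step size, also assumed) has already located the current sample. All function names and parameter values are hypothetical.

```python
def quantize(value, step):
    """Uniform scalar quantizer: return the reconstruction level nearest to value."""
    return step * round(value / step)


def predict(previous_reconstruction, a=0.9):
    """First-order predictor P[.]; the coefficient a is an assumed value."""
    return a * previous_reconstruction


def combined_prediction(xb_current, xe_previous, base_step):
    """Clip the enhancement-layer prediction to the base-layer quantization interval."""
    lower = xb_current - base_step / 2
    upper = xb_current + base_step / 2
    return min(max(predict(xe_previous), lower), upper)


def encode(signal, base_step, enh_step):
    """Two-layer encoder; each layer tracks the state of its own decoder."""
    xb_prev = xe_prev = 0.0
    base_stream, enh_stream = [], []
    for x in signal:
        base_prediction = predict(xb_prev)
        rb_hat = quantize(x - base_prediction, base_step)
        xb = base_prediction + rb_hat
        enh_prediction = combined_prediction(xb, xe_prev, base_step)
        re_hat = quantize(x - enh_prediction, enh_step)
        base_stream.append(rb_hat)
        enh_stream.append(re_hat)
        xb_prev, xe_prev = xb, enh_prediction + re_hat
    return base_stream, enh_stream


def decode(base_stream, enh_stream, base_step):
    """Mirror of the encoder loops; returns base and enhancement reconstructions."""
    xb_prev = xe_prev = 0.0
    base_out, enh_out = [], []
    for rb_hat, re_hat in zip(base_stream, enh_stream):
        xb = predict(xb_prev) + rb_hat
        xe = combined_prediction(xb, xe_prev, base_step) + re_hat
        base_out.append(xb)
        enh_out.append(xe)
        xb_prev, xe_prev = xb, xe
    return base_out, enh_out


signal = [1.3, 1.1, 0.7, 1.6, 1.2]
base_step, enh_step = 1.0, 0.25
base_stream, enh_stream = encode(signal, base_step, enh_step)
base_out, enh_out = decode(base_stream, enh_stream, base_step)
```

Since each decoder in this sketch repeats exactly the prediction its encoder used, neither layer drifts, while the enhancement-layer prediction still depends on both the previous enhancement-layer frame and the base-layer quantization information.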
An object of the invention is to achieve efficient scalability of predictive coding.
Another object of the invention is to provide a method and system for scalable predictive coding that is applicable to typical or standard video and audio compression.
Another object of the invention is to provide a scalable predictive coding method and system in which all or most of the information available at an enhancement-layer is exploited to improve the quality of the prediction.
Further objects and advantages of the invention will be brought out in the following portions of the specification, wherein the detailed description is for the purpose of fully disclosing preferred embodiments of the invention without placing limitations thereon.