An audio signal may have a base layer and an enhancement layer, collectively referred to as dual-layer, wherein the base layer represents a limited-quality version of encoded audio content and the enhancement layer represents encoded additional information for enhancing the quality of the audio content. For example, a bit stream may be composed of a low-bit-rate layer, e.g. an mp3 (MPEG-1 Layer III) bit stream, plus an additional layer that extends the base quality to an enhanced quality. In principle, more than one additional layer may be used, the highest of which may even enable bit-exact representation of the original PCM (pulse-code modulated) samples.
Encoding of such dual-layer signals is usually performed by encoding a base layer, thereby omitting certain information of the input signal, and then at least partly reconstructing the encoded base layer to obtain a prediction signal. Further, a difference signal between the prediction signal and the full-quality input signal is determined and encoded. The encoded difference signal then serves as the enhancement layer.
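The encoding principle described above can be illustrated by a minimal sketch. It is not an actual codec: the lossy base layer is replaced here by coarse uniform quantization (an assumption made purely for illustration, standing in for e.g. mp3), and the names `encode_base`, `reconstruct_base`, `encode_dual_layer` and the parameter `step` are hypothetical.

```python
def encode_base(samples, step=16):
    """Toy lossy base layer: coarse uniform quantization.

    Assumption: this stands in for a real lossy codec such as mp3.
    Information finer than `step` is omitted.
    """
    return [round(x / step) for x in samples]


def reconstruct_base(base, step=16):
    """At least partly reconstruct the base layer to get the prediction signal."""
    return [q * step for q in base]


def encode_dual_layer(samples, step=16):
    """Return (base layer, enhancement layer) for integer PCM samples."""
    base = encode_base(samples, step)
    prediction = reconstruct_base(base, step)
    # Difference between the full-quality input and the prediction signal.
    enhancement = [x - p for x, p in zip(samples, prediction)]
    return base, enhancement


if __name__ == "__main__":
    pcm = [100, -37, 5, 1023, 0]
    base, enhancement = encode_dual_layer(pcm)
    print("base layer:       ", base)
    print("enhancement layer:", enhancement)
```

In this toy setting the enhancement values are small compared with the input samples, which is what makes the difference signal cheaper to encode than the full-quality input itself.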
FIG. 1 shows the encoder of an embedded lossless audio codec. In the upper signal path, the input signal is used to encode the base layer bit stream. The base layer encoder can e.g. be compliant with mp3. The base layer codec applies a filter bank 11 for time-frequency decomposition that differs from the MDCT filter bank 13 applied in the extension layer signal path. In the exemplary case of mp3, the base layer filter bank 11 is a hybrid filter bank, composed of a 32-band polyphase filter bank followed by independent MDCT analysis blocks in each sub-band. In the second signal path, the input signal is fed into an Integer MDCT block 13, which implements a perfectly reversible MDCT decomposition of the signal. The integer-valued MDCT frequency bins are the basis for lossless encoding of the extension layer information.
Since the hybrid base layer filter bank 11 is different from the Integer MDCT filter bank 13 of the enhancement layer, a mapping operation is required for obtaining the prediction signal. For this purpose, the base layer frequency bins (in the domain of the hybrid filter bank 11) are restored 16 by partial decoding, and then mapped to the MDCT domain. The mapping 17 can be performed in an efficient way, e.g. as described in EP 2 064 700 A1 (PD060080). The mapped base layer information is then subtracted 14 from the integer-valued MDCT coefficients. The residual coefficients s14 are fed into an entropy encoder 15 in order to minimize the bit rate that is required to transmit the lossless extension layer.
Decoding of such dual-layer signals usually uses a procedure as shown in FIG. 2. In the upper signal path, the base layer information is partially decoded 21 in order to recover the frequency bin information. Synthesis filtering to the time domain is not performed at this point, since this would only be required for decoding a base layer signal. Then precisely the same operations are conducted as in the encoder, that is, the frequency bins of the base layer information are restored (decoded) 22, and a mapping 23 of the restored frequency bins to the MDCT domain is performed. In parallel, the lower signal path decodes the extension bit stream. The output s24 of the entropy decoder 24 is identical to the error residual s14 of the base layer in the MDCT domain, as computed by the encoder's subtraction block 14. The error residual s24 is added 25 to the coefficients s23 mapped from the base layer information, and the sum is fed into an inverse Integer MDCT block 26. The output signal of the inverse Integer MDCT is bit-exactly identical to the original input signal that was fed into the encoder.
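The reason the decoder must repeat precisely the same restoring and mapping operations as the encoder can be seen in a small sketch. Everything in it is an illustrative assumption: `map_to_mdct` is a hypothetical placeholder for mapping blocks 17 and 23 (not the mapping of the cited reference), the entropy coding stage is omitted so that s24 is simply taken to equal s14, and the Integer MDCT itself is not modelled.

```python
def map_to_mdct(base_bins):
    """Hypothetical placeholder for the mapping blocks 17 (encoder) and 23 (decoder).

    Any mapping works for this sketch as long as it is deterministic and
    yields integers; here it is a two-tap weighted sum followed by rounding.
    """
    mapped = []
    previous = 0.0
    for b in base_bins:
        mapped.append(round(0.75 * b + 0.25 * previous))
        previous = b
    return mapped


def encoder_residual(int_mdct_bins, base_bins):
    """Subtraction block 14: residual s14 = integer MDCT bins - mapped base layer."""
    return [x - m for x, m in zip(int_mdct_bins, map_to_mdct(base_bins))]


def decoder_sum(residual, base_bins):
    """Adder 25: s23 + s24, the input to the inverse Integer MDCT block 26."""
    return [m + r for m, r in zip(map_to_mdct(base_bins), residual)]


if __name__ == "__main__":
    int_mdct_bins = [12, -7, 3, 40]        # integer-valued MDCT bins (toy data)
    base_bins = [11.6, -6.2, 2.9, 38.5]    # restored base layer bins (toy data)
    s14 = encoder_residual(int_mdct_bins, base_bins)
    s24 = s14                              # lossless entropy coding assumed
    restored = decoder_sum(s24, base_bins)
    print("recovered bit-exactly:", restored == int_mdct_bins)
```

Because both sides derive the mapped coefficients from the same restored base layer bins using the same deterministic function, the sum at the decoder reproduces the integer MDCT bins exactly, regardless of how accurate the prediction is.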
A similar example is given in FIG. 4 of “IntMDCT—A Link Between Perceptual and Lossless Audio Coding”, 2002, IEEE by R. Geiger, J. Herre, J. Koller and K.-H. Brandenburg.
Audio decoders are often implemented within small, portable, battery-driven devices. It is therefore generally desirable to perform the decoding of encoded audio signals in a manner that saves power. In processor-based decoder implementations, this is equivalent to reducing the number of processing cycles that the processor has to execute.