In recent times, multi-channel audio reproduction techniques have become increasingly important. This may be because audio compression/encoding techniques such as the well-known mp3 technique have made it possible to distribute audio records via the Internet or other transmission channels with limited bandwidth. The mp3 coding technique has become so popular because it allows all records to be distributed in a stereo format, i.e., as a digital representation of the audio record including a first or left stereo channel and a second or right stereo channel.
Nevertheless, conventional two-channel sound systems have basic shortcomings. Therefore, the surround technique has been developed. A recommended multi-channel surround representation includes, in addition to the two stereo channels L and R, an additional center channel C and two surround channels Ls, Rs. This reference sound format is also referred to as three/two-stereo, which means three front channels and two surround channels. Generally, five transmission channels are required. In a playback environment, at least five loudspeakers at five appropriate positions are needed to obtain an optimum sweet spot at a certain distance from the five well-placed loudspeakers.
Several techniques are known in the art for reducing the amount of data required for transmitting a multi-channel audio signal. Such techniques are called joint stereo techniques. To this end, reference is made to FIG. 9, which shows a joint stereo device 60. This device can implement, e.g., intensity stereo (IS) or binaural cue coding (BCC). Such a device generally receives at least two channels (CH1, CH2, . . . CHn) as input and outputs at least a single carrier channel and parametric data. The parametric data are defined such that, in a decoder, an approximation of an original channel (CH1, CH2, . . . CHn) can be calculated.
Normally, the carrier channel will include subband samples, spectral coefficients, time domain samples etc., which provide a comparatively fine representation of the underlying signal, while the parametric data do not include such samples or spectral coefficients but include control parameters for controlling a certain reconstruction algorithm such as weighting by multiplication, time shifting, frequency shifting, phase shifting, etc. The parametric data, therefore, include only a comparatively coarse representation of the signal or the associated channel. Stated in numbers, the amount of data required by a carrier channel will be in the range of 60-70 kbit/s, while the amount of data required by parametric side information for one channel will typically be in the range of 1.5-2.5 kbit/s. Examples of parametric data are the well-known scale factors, intensity stereo information or binaural cue parameters, as will be described below.
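As an illustration of the division between a carrier channel and parametric data, the following Python sketch follows the intensity stereo idea for a single band: the encoder transmits the sum of two channels' subband samples plus one multiplicative weight per channel, and the decoder reconstructs each channel by weighting the carrier. The function names, the plain-list representation of real-valued subband samples and the energy-matching rule for the weights are assumptions made for this example; it is not the joint stereo device 60 of FIG. 9.

```python
import math

def encode_band(ch1, ch2):
    """Return (carrier, weights) for one band of two channels' subband samples.

    The carrier is the plain sum of both channels (the finely coded part);
    the parametric data are two weights chosen so that each reconstructed
    channel has the band energy of the corresponding original channel.
    """
    carrier = [a + b for a, b in zip(ch1, ch2)]
    e_carrier = sum(c * c for c in carrier)
    e1 = sum(a * a for a in ch1)
    e2 = sum(b * b for b in ch2)
    if e_carrier == 0.0:
        # Channels cancel in the sum: nothing can be reconstructed by weighting.
        return carrier, (0.0, 0.0)
    return carrier, (math.sqrt(e1 / e_carrier), math.sqrt(e2 / e_carrier))

def decode_band(carrier, weights):
    """Approximate both channels by weighting the carrier (multiplication only)."""
    w1, w2 = weights
    return [w1 * c for c in carrier], [w2 * c for c in carrier]
```

For two in-phase channels that differ only in level, the reconstruction is exact; for any other pair (as long as the channels do not cancel in the sum), only the band energies are preserved, which reflects the comparatively coarse nature of the parametric data.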
The BCC technique is described, for example, in the AES convention paper 5574, “Binaural Cue Coding applied to Stereo and Multi-Channel Audio Compression”, C. Faller, F. Baumgarte, May 2002, Munich; in the IEEE WASPAA paper “Efficient representation of spatial audio using perceptual parametrization”, October 2001, Mohonk, N.Y.; in “Binaural cue coding applied to audio compression with flexible rendering”, C. Faller and F. Baumgarte, AES 113th Convention, Los Angeles, Preprint 5686, October 2002; and in “Binaural cue coding—Part II: Schemes and applications”, C. Faller and F. Baumgarte, IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, November 2003.
In BCC encoding, a number of audio input channels are converted to a spectral representation using a DFT (Discrete Fourier Transform) based transform with overlapping windows. The resulting uniform spectrum is divided into non-overlapping partitions, each having a bandwidth approximately proportional to the equivalent rectangular bandwidth (ERB). The BCC parameters are then estimated between two channels for each partition. These BCC parameters are normally given for each channel with respect to a reference channel and are furthermore quantized. Finally, the transmitted parameters are calculated (encoded) in accordance with prescribed formulas, which may also depend on the specific partitions of the signal to be processed.
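The partitioning step can be sketched in Python as follows. The ERB-rate formula E(f) = 21.4 log10(1 + 0.00437 f) is the Glasberg and Moore approximation; the choice of one ERB per partition, the function names and the rule that every partition holds at least one DFT bin are assumptions for this example, not prescribed by BCC.

```python
import math

def erb_rate(f_hz):
    """Number of ERBs below f_hz (Glasberg and Moore approximation)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_partitions(n_fft, fs, erbs_per_partition=1.0):
    """Split DFT bins 0..n_fft//2 into contiguous, non-overlapping partitions
    that are roughly erbs_per_partition wide on the ERB-rate scale.

    Returns a list of (first_bin, last_bin_exclusive) tuples.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = fs / n_fft
    partitions = []
    start = 0
    while start < n_bins:
        # Include bins until the next bin lies at least erbs_per_partition
        # above the partition's lower edge on the ERB-rate scale.
        target = erb_rate(start * bin_hz) + erbs_per_partition
        stop = start + 1  # every partition holds at least one bin
        while stop < n_bins and erb_rate(stop * bin_hz) < target:
            stop += 1
        partitions.append((start, stop))
        start = stop
    return partitions
```

With erb_partitions(1024, 48000), the lowest partitions hold a single bin each, since one bin already spans more than one ERB there, while partitions in the upper frequency range span dozens of bins.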
A number of BCC parameters exist. The ICLD parameter (inter-channel level difference), for example, describes the difference (ratio) of the energies contained in two compared channels. The ICC parameter (inter-channel coherence/correlation) describes the correlation between the two channels, which can be understood as the similarity of the waveforms of the two channels. The ICTD parameter (inter-channel time difference) describes a global time shift between the two channels, whereas the IPD parameter (inter-channel phase difference) describes the same with respect to the phases of the signals.
Note that, when an audio signal is processed frame-wise, the BCC analysis is also performed frame-wise, i.e. in a time-varying manner, as well as frequency-wise. This means that the BCC parameters are obtained individually for each spectral band. If, for example, an audio filter bank decomposes the input signal into 32 band-pass signals, a BCC analysis block obtains a set of BCC parameters for each of the 32 bands.
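For one frame, the per-band estimation of ICLD, ICC and IPD could look like the Python sketch below; calling it once per frame yields one set of cues per band and frame. It assumes the complex DFT coefficients of two channels and a list of (first_bin, last_bin_exclusive) partitions as input. The ICC is simplified here to the zero-lag normalized cross-correlation, ICTD estimation is omitted, and the eps guard for silent bands is an implementation choice for this example.

```python
import cmath
import math

def bcc_cues(x1, x2, partitions):
    """Estimate ICLD (dB), zero-lag ICC and IPD (rad) for each partition.

    x1, x2: complex DFT coefficients of the two channels for one frame.
    partitions: list of (first_bin, last_bin_exclusive) tuples.
    """
    eps = 1e-12  # guards against division by zero in silent bands
    cues = []
    for lo, hi in partitions:
        p1 = sum(abs(x1[k]) ** 2 for k in range(lo, hi))  # band energy, channel 1
        p2 = sum(abs(x2[k]) ** 2 for k in range(lo, hi))  # band energy, channel 2
        cross = sum(x1[k] * x2[k].conjugate() for k in range(lo, hi))
        icld_db = 10.0 * math.log10((p1 + eps) / (p2 + eps))
        icc = abs(cross) / math.sqrt((p1 + eps) * (p2 + eps))
        ipd = cmath.phase(cross)  # phase of channel 1 relative to channel 2
        cues.append((icld_db, icc, ipd))
    return cues
```

If channel 2 is channel 1 scaled by 0.5 and rotated by 0.3 rad, every band reports an ICLD of about 6.02 dB, an ICC of 1 and an IPD of -0.3 rad.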
A related technique, also known as parametric stereo, is described in J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, “High-Quality Parametric Spatial Audio Coding at Low Bitrates”, AES 116th Convention, Berlin, Preprint 6072, May 2004, and E. Schuijers, J. Breebaart, H. Purnhagen, J. Engdegard, “Low Complexity Parametric Stereo Coding”, AES 116th Convention, Berlin, Preprint 6073, May 2004.
Summarizing, recent approaches for parametric coding of multi-channel audio signals (“Spatial Audio Coding”, “Binaural Cue Coding” (BCC) etc.) represent a multi-channel audio signal by means of a downmix signal (which may be monophonic or comprise several channels) and parametric side information (“spatial cues”) characterizing its perceived spatial sound stage. It is desirable to keep the rate of side information as low as possible in order to minimize overhead and leave as much of the available transmission capacity as possible for coding the downmix signal.
One way to keep the bit rate of the side information low is to losslessly encode the side information of a spatial audio scheme by applying, for example, entropy coding algorithms to the side information.
Lossless coding has been extensively applied in general audio coding in order to ensure an optimally compact representation of quantized spectral coefficients and other side information. Examples of appropriate encoding schemes and methods are given in the ISO/IEC standards MPEG1 part 3, MPEG2 part 7 and MPEG4 part 3.
These standards and, for example, the IEEE paper “Noiseless Coding of Quantized Spectral Coefficients in MPEG-2 Advanced Audio Coding”, S. R. Quackenbush, J. D. Johnston, IEEE WASPAA, Mohonk, N.Y., October 1997 describe state-of-the-art techniques that include the following measures for losslessly encoding quantized parameters:
Multi-dimensional Huffman Coding of quantized spectral coefficients
Using a common (multi-dimensional) Huffman Codebook for sets of coefficients
Coding each value either as a whole or by coding sign information and magnitude information separately (i.e. having Huffman codebook entries only for absolute values, which reduces the necessary codebook size: “signed” vs. “unsigned” codebooks)
Using alternative codebooks of different largest absolute values (LAVs), i.e. different maximum absolute values within the parameters to be encoded
Using alternative codebooks of different statistical distribution for each LAV
Transmitting the choice of Huffman codebook as side information to the decoder
Using “sections” to define the range of application of each selected Huffman codebook
Differential encoding of scalefactors over frequency and subsequent Huffman coding of the result
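Two of the measures listed above, differential encoding of scalefactors over frequency and subsequent one-dimensional Huffman coding, can be sketched in Python as follows. The scalefactor values are invented for illustration, and the Huffman code is built from the counts of the frame itself, which is a simplification; the codecs cited above use predefined codebooks stored in encoder and decoder instead. All function names are hypothetical.

```python
import heapq
from collections import Counter

def diff_over_frequency(scalefactors):
    """Keep the first value; replace every other value by the difference
    to its lower-frequency neighbour."""
    return [scalefactors[0]] + [b - a for a, b in zip(scalefactors, scalefactors[1:])]

def undo_diff(diffs):
    """Decoder side: accumulate the differences again."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

def huffman_code(symbols):
    """Build a one-dimensional Huffman code {symbol: bitstring} from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie breaker, {symbol: partial codeword}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)  # two least frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]
```

For the scalefactors [40, 41, 41, 42, 42, 42, 41, 40], the differences are [40, 1, 0, 1, 0, 0, -1, -1]; because neighbouring scalefactors tend to be similar, the differences cluster around zero, which is what makes short codewords for small differences pay off.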
Another technique, for the lossless encoding of coarsely quantized values into a single PCM code, is proposed in the MPEG1 audio standard (where it is called grouping and used for Layer 2). This is explained in more detail in the standard ISO/IEC 11172-3:93.
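The grouping idea can be sketched as follows: three consecutive values of a quantizer with nlevels steps are combined into one integer in base nlevels, which needs fewer bits than three separate codes. The function names are hypothetical, and the digit order shown is an illustrative choice rather than a restatement of ISO/IEC 11172-3.

```python
def group_triplet(v1, v2, v3, nlevels):
    """Combine three quantized values in 0..nlevels-1 into one integer code."""
    return v1 + nlevels * v2 + nlevels * nlevels * v3

def ungroup(code, nlevels):
    """Recover the three values by repeated division by nlevels."""
    values = []
    for _ in range(3):
        values.append(code % nlevels)
        code //= nlevels
    return tuple(values)

def group_bits(nlevels):
    """Bits for one grouped code vs. three separately coded values."""
    grouped = (nlevels ** 3 - 1).bit_length()
    separate = 3 * (nlevels - 1).bit_length()
    return grouped, separate
```

For quantizers with 3, 5 and 9 levels, group_bits returns 5, 7 and 10 bits per group versus 6, 9 and 12 bits for separate coding.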
The publication “Binaural cue coding—Part II: Schemes and applications”, C. Faller and F. Baumgarte, IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, November 2003 gives some information on the coding of BCC parameters. It is proposed that quantized ICLD parameters are differentially encoded
over frequency and the result is subsequently Huffman encoded (with a one-dimensional Huffman code)
over time and the result is subsequently Huffman encoded (with a one-dimensional Huffman code),
and that finally, the more efficient variant is selected as the representation of an original audio signal.
As mentioned above, it has been proposed to optimize compression performance by applying differential coding over frequency and, alternatively, over time and to select the more efficient variant. The selected variant is then signaled to a decoder by means of side information.
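The selection between the two differential variants, together with the one-bit signal to the decoder, might be sketched as follows. The quantized ICLD indices of the current and the previous frame are assumed as input, and a signed Exp-Golomb code length stands in for the one-dimensional Huffman codebooks, whose actual tables are not given here; all names are hypothetical.

```python
def code_length(v):
    """Bit length of a signed Exp-Golomb code for integer v (a stand-in
    for a trained one-dimensional Huffman codebook)."""
    u = 2 * v - 1 if v > 0 else -2 * v  # 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ...
    return 2 * (u + 1).bit_length() - 1

def encode_icld(current, previous):
    """Return (flag, symbols): flag 0 = differential over frequency,
    flag 1 = differential over time; the cheaper variant is chosen."""
    df = [current[0]] + [b - a for a, b in zip(current, current[1:])]
    dt = [c - p for c, p in zip(current, previous)]
    cost_f = sum(code_length(v) for v in df)
    cost_t = sum(code_length(v) for v in dt)
    return (0, df) if cost_f <= cost_t else (1, dt)

def decode_icld(flag, symbols, previous):
    """Undo whichever differential variant the flag signals."""
    if flag == 1:
        return [p + d for p, d in zip(previous, symbols)]
    out = [symbols[0]]
    for d in symbols[1:]:
        out.append(out[-1] + d)
    return out
```

For a stationary signal whose ICLDs barely change between frames, coding over time wins; after a sudden change, a spectrally smooth frame is cheaper to code over frequency. The flag is the only extra side information needed.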
The prior-art techniques described above are useful for reducing the amount of data that has to be transmitted, for example, within an audio or video stream. Using the described techniques of lossless encoding based on entropy-coding schemes generally results in bit streams with a non-constant bit rate. The AAC (Advanced Audio Coding) standard proposes to reduce both the size of the code words and the size of the underlying codebook by using “unsigned” codebooks, assuming that the probability distribution of the information values to be encoded depends only on the magnitudes of the values rather than on their signs. The sign bits are then transmitted separately and can be considered as a postfix code that maps the coded magnitude information back into the actual value (sign×magnitude). Assuming, for example, a four-dimensional Huffman codebook, this reduces the size of the codebook by a factor of 2^4 = 16 (assuming that all values carry signs).
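The unsigned-codebook idea can be sketched as follows: a four-tuple of quantized values is split into a magnitude tuple, which would be looked up in an unsigned codebook, and sign bits sent afterwards for the non-zero values only. The function names are hypothetical, and the codebook lookup itself is omitted.

```python
def split_signs(quad):
    """Split a tuple of quantized values into its magnitudes (coded with an
    unsigned codebook) and postfix sign bits for the non-zero values only."""
    magnitudes = tuple(abs(v) for v in quad)
    sign_bits = [1 if v < 0 else 0 for v in quad if v != 0]
    return magnitudes, sign_bits

def merge_signs(magnitudes, sign_bits):
    """Decoder side: apply the postfix sign bits; zeros consume no sign bit."""
    bits = iter(sign_bits)
    return tuple(-m if m and next(bits) else m for m in magnitudes)

def codebook_sizes(lav, dim=4):
    """Entries of a signed vs. unsigned dim-dimensional codebook for |v| <= lav."""
    return (2 * lav + 1) ** dim, (lav + 1) ** dim
```

codebook_sizes(1) gives (81, 16) and codebook_sizes(7) gives (50625, 4096): because zero carries no sign, the reduction stays below 16 and only approaches it for large LAVs, which is why the factor of 16 assumes that all values carry signs.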
Considerable effort has already been made to reduce code size by entropy coding. Nonetheless, prior-art techniques still suffer from some major disadvantages. For example, multi-dimensional Huffman codebooks make it possible to decrease the bit rate needed to transmit encoded information. This is achieved, however, at the cost of a larger Huffman codebook, since each additional dimension multiplies the codebook size by the number of possible values per dimension, i.e. by at least a factor of two. This is especially disadvantageous in applications where the Huffman codebook is transmitted together with the encoded information, as is the case, for example, with some computer compression programs. Even if the Huffman codebook does not have to be transmitted with the data, it has to be stored in the encoder and in the decoder, requiring expensive storage space, which is available only in limited quantities, especially in mobile applications for video or audio streaming or playback.
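The trade-off between bit rate and codebook size can be made concrete with a small Python experiment: for independent values drawn from an assumed three-symbol distribution, it builds d-dimensional Huffman codes and reports the number of codebook entries and the average number of bits per value. The distribution and the function names are assumptions for illustration.

```python
import heapq
from itertools import product

def huffman_avg_length(probs):
    """Expected Huffman codeword length in bits: the sum of the
    probabilities of all merged (internal) nodes."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def tradeoff(symbol_probs, max_dim=3):
    """For d = 1..max_dim, return (d, codebook entries, bits per value)
    of a d-dimensional Huffman code for i.i.d. values."""
    rows = []
    for d in range(1, max_dim + 1):
        joint = []
        for combo in product(symbol_probs, repeat=d):
            p = 1.0
            for q in combo:
                p *= q
            joint.append(p)
        rows.append((d, len(joint), huffman_avg_length(joint) / d))
    return rows
```

For the distribution [0.8, 0.1, 0.1], the one-dimensional code needs 1.2 bits per value with 3 entries, while the two-dimensional code needs 0.96 bits per value with 9 entries; each further dimension multiplies the codebook size by 3 again.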