1. Field of the Invention
The present invention relates to compression algorithms for discrete values carrying audio and/or image information, and particularly to transformation algorithms for use in transformation-based encoders, i.e. encoders that do not quantize/code the original audio and/or image signals directly but first transform them into a spectral range and then quantize/code the result.
2. Description of the Related Art
Modern audio encoding methods, such as MPEG Layer3 (MP3) or MPEG AAC, use transformations, such as the so-called modified discrete cosine transformation (MDCT), to obtain a block-wise frequency representation of an audio signal. Normally, such an audio encoder receives a stream of time-discrete audio samples. The stream of audio samples is windowed to obtain a windowed block of, for example, 1024 or 2048 windowed audio samples. For windowing, different window functions are used, such as a sine window.
The windowed time-discrete audio samples are then converted into a spectral representation via a filter bank. In principle, a Fourier transformation, or for specific reasons a variation of the Fourier transformation, such as FFT or, as explained above, MDCT, can be used. The block of audio spectral values at the output of the filter bank can then be further processed, if required. In the above-mentioned audio encoders, quantization of the audio spectral values follows, the quantization levels being typically chosen such that the quantization noise introduced by quantization lies below the psycho-acoustic masking threshold, i.e. is “masked away”. The quantization is a lossy encoding. To obtain a further data amount reduction, the quantized spectral values are then entropy encoded, for example via Huffman encoding. By adding side information, such as scale factors, etc., a bit stream multiplexer forms a bit stream from the entropy encoded quantized spectral values, which can be stored or transmitted.
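The lossy nature of the quantization step can be illustrated with a minimal sketch. It assumes a plain uniform quantizer with a fixed step size, which is a simplification chosen only for illustration; the function names `quantize` and `dequantize` and the sample values are hypothetical, and the encoders named above select their quantization levels with a psycho-acoustic model instead.

```python
def quantize(value, step):
    """Map a spectral value to an integer quantization index (uniform quantizer, assumed)."""
    return round(value / step)

def dequantize(index, step):
    """Map a quantization index back to a reconstructed spectral value."""
    return index * step

# Hypothetical spectral values and step size, chosen only for illustration.
spectral_values = [12.7, -3.2, 0.4, 7.95]
step = 0.5

indices = [quantize(v, step) for v in spectral_values]
decoded = [dequantize(i, step) for i in indices]

# The reconstruction error of a uniform quantizer is at most half a step.
errors = [abs(v - d) for v, d in zip(spectral_values, decoded)]
```

The integer indices are what an entropy encoder such as a Huffman encoder would operate on; the differences collected in `errors` correspond to the quantization noise that cannot be recovered in the decoder.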
In the audio decoder, the bit stream is divided into encoded quantized spectral values and side information via a bit stream demultiplexer. The entropy encoded quantized spectral values are first entropy decoded to obtain the quantized spectral values. The quantized spectral values are then inversely quantized to obtain decoded spectral values whose quantization noise lies below the psycho-acoustic masking threshold and will thus be inaudible. These spectral values are then converted into a time representation via a synthesis filter bank to obtain time-discrete decoded audio samples. A transformation algorithm inverse to the one used in the encoder has to be used in the synthesis filter bank. Additionally, windowing has to be cancelled after the frequency-to-time inverse transformation.
To obtain good frequency selectivity, modern audio encoders typically use block overlapping. One such case is illustrated in FIG. 12a. First, for example, 2048 time-discrete audio samples are taken and windowed via a means 402. Means 402, which represents the window, has a window length of 2N samples and provides a block of 2N windowed samples on the output side. In order to achieve window overlapping, a second block of 2N windowed samples is formed via a means 404, which is illustrated separately from means 402 in FIG. 12a merely for reasons of clarity. The 2048 samples fed into means 404 are, however, not the time-discrete audio samples immediately following the first window; instead, they comprise the second half of the samples windowed by means 402 and merely 1024 “new” samples. The overlapping is symbolically illustrated by means 406 in FIG. 12a, which effects a degree of overlapping of 50%. Both the 2N windowed samples output by means 402 and the 2N windowed samples output by means 404 are then subjected to the MDCT algorithm via means 408 and 410, respectively. According to the known MDCT algorithm, means 408 provides N spectral values for the first window, while means 410 also provides N spectral values, but for the second window, wherein an overlapping of 50% exists between the first window and the second window.
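The block formation with 50% overlap described above can be sketched as follows. The helper name `overlapping_blocks` and the tiny block size are assumptions made for illustration only; in the example of FIG. 12a, N would be 1024.

```python
def overlapping_blocks(samples, n):
    """Split samples into blocks of length 2*n that advance by n samples (50% overlap)."""
    return [samples[start:start + 2 * n]
            for start in range(0, len(samples) - 2 * n + 1, n)]

# Hypothetical input: 12 sample indices with N = 4, so each block holds 2N = 8 samples.
blocks = overlapping_blocks(list(range(12)), n=4)

first_block, second_block = blocks
# The second block reuses the second half of the first block and adds N new samples.
shared = first_block[4:]
new_samples = second_block[4:]
```

Each block would then be windowed and transformed separately, so that every input sample contributes to two consecutive blocks.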
As illustrated in FIG. 12b, in the decoder, the N spectral values of the first window are supplied to means 412, which performs an inverse modified discrete cosine transformation. The same applies for the N spectral values of the second window. These are supplied to means 414, which also performs an inverse modified discrete cosine transformation. Both means 412 and means 414 each provide 2N samples for the first window and 2N samples for the second window, respectively.
In means 416, which is designated by TDAC (TDAC=time domain aliasing cancellation) in FIG. 12b, the fact that the two windows overlap is taken into account.
Particularly, a sample y1 of the second half of the first window, i.e. a sample with an index N+k, is summed with a sample y2 of the first half of the second window, i.e. a sample with an index k, so that N decoded time samples result at the output of the decoder.
It should be noted that the function of means 416, which is also referred to as an add function, automatically accounts for the windowing performed in the encoder illustrated schematically in FIG. 12a, so that no explicit “inverse windowing” has to be performed in the decoder illustrated in FIG. 12b.
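The interplay of windowing, MDCT, inverse MDCT and the add function can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the figures: a sine window sampled at half-sample offsets, one common textbook definition of the MDCT, a scaling factor of 2/N in the inverse transformation, and the same window being applied a second time after the inverse transformation. All function names and the sample values are hypothetical.

```python
import math

def sine_window(n):
    """Sine window of length 2n, sampled at half-sample offsets (assumed convention)."""
    return [math.sin(math.pi * (k + 0.5) / (2 * n)) for k in range(2 * n)]

def mdct(block):
    """Map 2n windowed samples to n spectral values (direct O(n^2) evaluation)."""
    n = len(block) // 2
    return [sum(block[k] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                for k in range(2 * n))
            for m in range(n)]

def imdct(spectrum):
    """Map n spectral values back to 2n time-aliased samples (assumed scaling 2/n)."""
    n = len(spectrum)
    return [2.0 / n * sum(spectrum[m] * math.cos(math.pi / n * (k + 0.5 + n / 2) * (m + 0.5))
                          for m in range(n))
            for k in range(2 * n)]

def encode_decode(block, window):
    """Window, transform, inverse transform, and window again (no quantization)."""
    windowed = [w * s for w, s in zip(window, block)]
    aliased = imdct(mdct(windowed))
    return [w * y for w, y in zip(window, aliased)]

N = 4
window = sine_window(N)
signal = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8]  # 3N hypothetical integer samples

y1 = encode_decode(signal[0:2 * N], window)   # first window
y2 = encode_decode(signal[N:3 * N], window)   # second window, 50% overlap

# Add function: second half of the first window plus first half of the second window.
decoded = [y1[N + k] + y2[k] for k in range(N)]
```

Under these assumptions, `decoded` reproduces `signal[N:2 * N]` up to floating-point precision, although neither `y1` nor `y2` alone matches the input, because the aliasing terms of the two windows cancel in the sum.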
When the window function implemented by means 402 or 404 is referred to as w(k), wherein the index k represents the time index, the condition w(k)^2 + w(N+k)^2 = 1 has to be fulfilled for every k from 0 to N−1, i.e. the squared window weight w(k) and the squared window weight w(N+k) have to add up to 1. If a sine window is used, the window weightings of which follow the first half wave of the sine function, this condition is always fulfilled, since the square of the sine and the square of the cosine of any angle add up to 1.
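The window condition can be checked numerically with a short sketch. The sampling of the sine window at half-sample offsets and the helper names `sine_window` and `satisfies_window_condition` are assumptions made for illustration; a rectangular window of all ones is included only as a counterexample.

```python
import math

def sine_window(n):
    """Sine window of length 2n following the first half wave of the sine (assumed sampling)."""
    return [math.sin(math.pi * (k + 0.5) / (2 * n)) for k in range(2 * n)]

def satisfies_window_condition(window, tol=1e-12):
    """Check w(k)^2 + w(N+k)^2 == 1 for k = 0..N-1, where len(window) == 2N."""
    n = len(window) // 2
    return all(abs(window[k] ** 2 + window[n + k] ** 2 - 1.0) <= tol
               for k in range(n))

sine_ok = satisfies_window_condition(sine_window(1024))
rect_ok = satisfies_window_condition([1.0] * 2048)  # fails: 1 + 1 = 2
```

For the sine window, w(N+k) is the cosine of the angle whose sine is w(k), which is why the check passes for every k.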
It is a disadvantage of the windowing method with subsequent MDCT function described in FIG. 12a that windowing is achieved, when considering a sine window, by multiplying a time-discrete sample value by a floating-point number, since the sine of an angle between 0 and 180 degrees is not an integer, except for the angle of 90 degrees. Even when integer time-discrete samples are windowed, floating-point numbers result after windowing.
Thus, even when no psycho-acoustic encoder is used, which means when a lossless encoding is to be obtained, quantization is required at the output of means 408 and 410, respectively, in order to be able to perform a reasonably manageable entropy encoding.
Generally, currently known integer transformations for lossless audio and/or video encoding are obtained by decomposing the transformations used there into Givens rotations and by applying the lifting scheme to every Givens rotation. A rounding error is thereby introduced in every step, and the rounding error keeps accumulating over subsequent stages of Givens rotations. The resulting approximation error becomes problematic for lossless audio encoder approaches, particularly when long transformations are used which provide, for example, 1,024 spectral values, as is the case in the known MDCT (MDCT=modified discrete cosine transformation) with overlap and add. Particularly in the higher frequency range, where the audio signal typically has very little energy anyway, the approximation error can quickly become larger than the actual signal, so that this approach is problematic with regard to lossless encoding and particularly with regard to the encoding efficiency obtainable thereby.
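How the lifting scheme turns a single Givens rotation into a reversible integer mapping can be sketched as follows. The sketch uses the well-known factorization of a rotation into three lifting steps and assumes an angle that is not a multiple of 180 degrees, since the lifting coefficient is otherwise undefined; the function names and the sample values are hypothetical.

```python
import math

def lifting_rotate(x, y, angle):
    """Integer approximation of a Givens rotation using three rounded lifting steps."""
    s = math.sin(angle)
    t = (math.cos(angle) - 1.0) / s  # equals -tan(angle / 2); requires sin(angle) != 0
    x += round(t * y)   # first lifting step, rounded
    y += round(s * x)   # second lifting step, rounded
    x += round(t * y)   # third lifting step, rounded
    return x, y

def lifting_rotate_inverse(x, y, angle):
    """Undo lifting_rotate exactly by subtracting the same rounded terms in reverse order."""
    s = math.sin(angle)
    t = (math.cos(angle) - 1.0) / s
    x -= round(t * y)
    y -= round(s * x)
    x -= round(t * y)
    return x, y

angle = math.pi / 5          # hypothetical rotation angle
original = (1234, -567)      # hypothetical integer input pair

rotated = lifting_rotate(*original, angle)
restored = lifting_rotate_inverse(*rotated, angle)

# Exact floating-point rotation, for comparison with the integer result.
exact = (math.cos(angle) * original[0] - math.sin(angle) * original[1],
         math.sin(angle) * original[0] + math.cos(angle) * original[1])
```

Each lifting step changes only one value by a rounded function of the other, unchanged value, so the inverse recovers the input exactly. The price is that `rotated` generally deviates slightly from `exact`, which is the rounding error discussed above.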
With regard to audio encoding, integer transformations, i.e. transformation algorithms generating integer output values, are particularly based on the known DCT-IV, which considers no constant component, while integer transformations for image applications are rather based on the DCT-II, which particularly contains the provisions for the constant component. Such integer transformations are described, for example, in Y. Zeng, G. Bi and Z. Lin, “Integer sinusoidal transforms based on lifting factorization”, in Proc. ICASSP'01, May 2001, pp. 1,181-1,184, K. Komatsu and K. Sezaki, “Reversible Discrete Cosine Transform”, in Proc. ICASSP, 1998, Vol. 3, pp. 1,769-1,772, P. Hao and Q. Shi, “Matrix factorizations for reversible integer mapping”, IEEE Trans. Signal Processing, Vol. 49, pp. 2,314-2,324, and J. Wang, J. Sun and S. Yu, “1-d and 2-d transforms from integers to integers”, in Proc. ICASSP'03, Hong Kong, April 2003.
As has been explained above, the integer transformations described there are based on decomposing the transformation into Givens rotations and applying the known lifting scheme to the Givens rotations, which involves the problem of accumulating rounding errors. The reason is that rounding has to be performed several times within one transformation, namely after every lifting step, so that long transformations, which involve correspondingly many lifting steps, require rounding particularly often. As has been explained, this results in an accumulated error and also in relatively expensive processing, since every lifting step has to be rounded before the next lifting step can be performed.
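The accumulation of rounding errors over a cascade of rotation stages can be measured with a small experiment. The sketch assumes the three-step lifting factorization of a Givens rotation with rounding after every lifting step; the function names, the rotation angles and the input pair are hypothetical and are not taken from any of the cited transformations.

```python
import math

def lifting_rotate(x, y, angle):
    """One Givens rotation as three lifting steps, each followed by rounding."""
    s = math.sin(angle)
    t = (math.cos(angle) - 1.0) / s  # requires sin(angle) != 0
    x += round(t * y)
    y += round(s * x)
    x += round(t * y)
    return x, y

def cascade_errors(x, y, angles):
    """Track the deviation between the integer cascade and the exact rotations."""
    xi, yi = x, y                  # integer path, rounded after every lifting step
    xf, yf = float(x), float(y)    # floating-point reference path without rounding
    errors = []
    for angle in angles:
        xi, yi = lifting_rotate(xi, yi, angle)
        c, s = math.cos(angle), math.sin(angle)
        xf, yf = c * xf - s * yf, s * xf + c * yf
        errors.append(math.hypot(xi - xf, yi - yf))
    return errors

# Hypothetical cascade of 15 stages with angles between 0.1 and 1.5 radians.
angles = [0.1 * (i + 1) for i in range(15)]
errors = cascade_errors(1000, -250, angles)
rounding_operations = 3 * len(angles)  # one rounding per lifting step
```

Each stage contributes three rounding operations and carries forward the deviation left by the stages before it, so `errors` records how far the integer result has drifted from the exact result after each stage. A long transformation consists of many such stages, which is why the number of roundings and the possible drift both grow with the transformation length.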