1. Field of the Invention
The present invention relates to time-frequency conversion algorithms and, in particular, to such algorithms in connection with audio compression concepts.
2. Description of the Related Art
A representation of real-valued discrete-time signals in the form of complex-valued spectral components is required for some applications in coding for data compression and, in particular, in audio coding. A complex spectral coefficient can be represented by a first and a second partial spectral coefficient, wherein, as desired, the first partial spectral coefficient is the real part and the second partial spectral coefficient is the imaginary part. Alternatively, the complex spectral coefficient can also be represented by the magnitude as the first partial spectral coefficient and the phase as the second partial spectral coefficient.
In audio coding in particular, real-valued transform methods are frequently employed, such as, for example, the well-known MDCT described in “Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation”, J. Princen, A. Bradley, IEEE Trans. Acoust., Speech, and Signal Processing 34, pp. 1153-1161, 1986. There is, for example, demand for a complex spectrum in a psycho-acoustic model. Here, reference is made to the psycho-acoustic model in Annex D.2.4 of the standard ISO/IEC 11172-3, which is also referred to as the MPEG-1 standard. In certain applications, a complex discrete Fourier transform is performed in parallel to the actual MDCT (MDCT=modified discrete cosine transform) to calculate psycho-acoustic parameters, such as, for example, the psycho-acoustic masking threshold.
In this discrete Fourier transform (DFT), the input signal is first divided into blocks of a predetermined length by multiplying it by temporally offset window functions. Each of these blocks is subsequently transformed into a spectral representation by applying the DFT. If the blocks used each contain L samples, i.e. if the window length is L, the output of the DFT can in turn be described completely by L values altogether (real and imaginary parts, or magnitude and phase values). If, for example, the input signal is real, the result will be L/2 complex values. With suitable window functions, the input signal can be reconstructed from this representation using an inverse DFT.
This approach, however, is subject to some limitations. Critical sampling, for example, is only possible if successive windows do not overlap. Otherwise, with a temporal offset of N<L samples between successive windows, L values of the spectral representation would have to be transmitted for every N new input values of the DFT, which is particularly undesirable in data compression methods.
The usage of non-overlapping window functions, however, severely limits the achievable quality of the spectral decomposition, in particular the separation of different frequency bands.
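The block-wise DFT described above can be sketched as follows. This is only a minimal illustration, not part of the cited prior art: the window length `L = 8`, the hop size `HOP = L // 2`, the periodic Hann window, and the test signal are all assumptions chosen for brevity. The sketch shows that overlapping windows allow reconstruction by an inverse DFT with overlap-add, while each block of `L` spectral values is produced for only `HOP` new input samples, so the representation is not critically sampled.

```python
import cmath
import math

L = 8          # window length (illustrative assumption)
HOP = L // 2   # N < L new input samples per block, i.e. 50% overlap

def hann(n):
    # Periodic Hann window; copies shifted by L/2 sum to one.
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / L)

def dft(block):
    # Direct O(L^2) DFT, sufficient for a small illustration.
    return [sum(v * cmath.exp(-2j * math.pi * k * n / L)
                for n, v in enumerate(block))
            for k in range(L)]

def idft(spec):
    # Inverse DFT; the input was real, so only the real part is kept.
    return [sum(c * cmath.exp(2j * math.pi * k * n / L)
                for k, c in enumerate(spec)).real / L
            for n in range(L)]

# Hypothetical real-valued test signal.
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(40)]
num_blocks = (len(x) - L) // HOP + 1

# Analysis: window each block and transform it.
spectra = [dft([hann(n) * x[m * HOP + n] for n in range(L)])
           for m in range(num_blocks)]

# Synthesis: inverse transform and overlap-add.
y = [0.0] * len(x)
for m, spec in enumerate(spectra):
    for n, v in enumerate(idft(spec)):
        y[m * HOP + n] += v

# Samples covered by two overlapping windows are reconstructed.
interior = range(HOP, num_blocks * HOP)
max_error = max(abs(x[i] - y[i]) for i in interior)
```

Because the input is real, bins `k` and `L - k` of each block are complex conjugates, so each block carries `L` independent real numbers although only `HOP` new samples entered it.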
An improved band separation, however, can be achieved with real-valued transforms having overlapping window functions. A special class of these transforms is formed by the so-called modulated filter banks, which allow an efficient implementation. Among these modulated filter banks, the modified discrete cosine transform (MDCT) has become predominant as a special form, where the window length L can take values between N and 2N−1 due to different degrees of overlapping.
FIG. 6 shows the separation of a discrete-time input signal x(n) into the spectral components u_{k,m}, where m is the temporal block index, i.e. the time index after the sampling rate reduction, and k is the frequency index or sub-band index. The sampling frequencies are the same in all the sub-bands, i.e. the original sampling frequency is reduced by the factor N. The filter bank illustrated in FIG. 6 having filters 60 and downstream down-sampling elements 62 provides a uniform band separation.
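The structure of filters followed by down-sampling elements can be sketched as below. The function name `analysis_filter_bank` and the two-tap example filters are hypothetical and serve only to illustrate the indexing; they are not the filters of FIG. 6.

```python
def analysis_filter_bank(x, filters, N):
    """Filter x with every sub-band filter and keep every N-th output sample.

    Returns u with u[k][m] = sum over n of filters[k][n] * x[m*N - n],
    i.e. k is the sub-band index and m the block index after down-sampling.
    """
    def filtered(h, t):
        # One output sample of the convolution; samples outside x count as zero.
        return sum(h[n] * x[t - n] for n in range(len(h)) if 0 <= t - n < len(x))

    num_blocks = len(x) // N
    return [[filtered(h, m * N) for m in range(num_blocks)] for h in filters]

# Hypothetical two-band example: averaging and differencing filters, N = 2.
x = [float(n) for n in range(8)]
filters = [[0.5, 0.5], [0.5, -0.5]]
u = analysis_filter_bank(x, filters, N=2)
```

Every sub-band signal `u[k]` has `len(x) // N` samples, reflecting the reduction of the sampling frequency by the factor N in all sub-bands.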
In a modulated filter bank, the individual sub-band filters are formed by multiplying a prototype impulse response h_p(n) by a sub-band-specific modulation function, wherein the following rule is used for the MDCT and similar transforms:
$$h_k(n) = h_p(n)\,\cos\left(\frac{\pi}{N}\left(n - \frac{N}{2} + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right)$$
The transform rule may also deviate from the above equation, e.g. when a sine function is used instead of the cosine function or when “+N/2” is used instead of “−N/2”. Usage in an alternating MDCT/MDST, which will be explained hereinafter (when using k instead of k+1/2), is also feasible.
In the above equation, h_p(n) is the prototype impulse response and h_k(n) is the impulse response of the filter associated with the sub-band k. n is the count index of the discrete-time input signal x(n), whereas N indicates the number of spectral coefficients.
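The modulation rule can be sketched as follows. The helper names `modulated_filters` and `sine_window`, the choice of a sine window as prototype, and the values N = 4 and L = 2N are assumptions made for illustration; the text above does not prescribe a particular prototype.

```python
import math

def modulated_filters(prototype, N):
    """Return the N sub-band impulse responses
    h_k(n) = h_p(n) * cos(pi/N * (n - N/2 + 1/2) * (k + 1/2))."""
    return [
        [prototype[n] * math.cos(math.pi / N * (n - N / 2 + 0.5) * (k + 0.5))
         for n in range(len(prototype))]
        for k in range(N)
    ]

def sine_window(L):
    # Assumed prototype impulse response h_p(n); any suitable window could be used.
    return [math.sin(math.pi * (n + 0.5) / L) for n in range(L)]

N = 4                                       # number of spectral coefficients
h = modulated_filters(sine_window(2 * N), N)  # h[k][n], window length L = 2N
```

All N filters share the same prototype and differ only in the cosine modulation term, which is what makes an efficient joint implementation possible.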
The output values of a real-valued transform, such as, for example, the MDCT, which, as is well-known, is not energy-conserving, can only be employed under certain circumstances for applications requiring complex-valued spectral components. If, for example, the magnitudes of the real output values are used as an approximation for the magnitudes of complex-valued spectral components in the corresponding frequency domains, the result will be strong fluctuations even for sinusoidal input signals of constant amplitude. Such a procedure correspondingly provides poor approximations of short-term magnitude spectra of the input signal.
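The fluctuation described above can be reproduced with a small numerical sketch. All parameters are illustrative assumptions: N = 16 sub-bands, a sine window of length 2N, a sinusoid centered on sub-band `K0 = 5`, and the modulation rule given earlier. As a reference for a complex-valued magnitude, the sketch uses the sine-modulated counterpart of the same block as the imaginary part; this reference is an assumption made here for comparison only.

```python
import math

N = 16        # number of sub-bands and block advance (illustrative)
L = 2 * N     # window length, assuming 50% overlap
K0 = 5        # sub-band under observation (illustrative)

def window(n):
    # Assumed sine window as prototype.
    return math.sin(math.pi * (n + 0.5) / L)

def block_coeff(x, m, k, trig):
    """Coefficient k of block m with a cosine (MDCT) or sine (MDST) kernel."""
    return sum(
        window(n) * x[m * N + n]
        * trig(math.pi / N * (n - N / 2 + 0.5) * (k + 0.5))
        for n in range(L)
    )

# Sinusoid of constant amplitude at the center frequency of sub-band K0.
omega = math.pi / N * (K0 + 0.5)
x = [math.sin(omega * n) for n in range(10 * N)]

mdct = [block_coeff(x, m, K0, math.cos) for m in range(8)]
mdst = [block_coeff(x, m, K0, math.sin) for m in range(8)]

mdct_mag = [abs(c) for c in mdct]                              # real-valued only
complex_mag = [math.hypot(c, s) for c, s in zip(mdct, mdst)]   # reference

mdct_ratio = min(mdct_mag) / max(mdct_mag)
complex_ratio = min(complex_mag) / max(complex_mag)
```

For this test tone, `mdct_ratio` is well below one, i.e. the MDCT magnitude changes strongly from block to block although the input amplitude is constant, whereas `complex_ratio` stays close to one.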
In the publication “A Scalable and Progressive Audio Codec”, Vinton and Atlas, IEEE ICASSP 2001, 7-11 May 2001, Salt Lake City, an audio coder having a transform algorithm including a base transform and a second transform is illustrated. The input signal is windowed by a Kaiser-Bessel window function to generate temporally successive blocks of sample values. The blocks of input values are then transformed either by means of a modified discrete cosine transform (MDCT) or by means of a modified discrete sine transform (MDST), depending on a shift index. This base transform process basically corresponds to the TDAC filter bank described in the cited publication by Princen and Bradley. Two temporally successive blocks of spectral coefficients are combined into a single complex transform such that the MDCT block represents the real parts of complex spectral coefficients, whereas the temporally successive MDST block represents the pertaining imaginary parts of the complex spectral coefficients. A time-frequency distribution of the magnitude of the complex spectrum is generated from this, wherein a two-dimensional magnitude distribution over time in each frequency band is windowed by means of window functions overlapping by 50%. Subsequently, a magnitude matrix is calculated by means of the second transform. The phase information is not subjected to the second transform.
The alternating usage of the output values of an MDCT as the real part and the imaginary part is also introduced as “MDFT” in the publication “MDCT Filter Banks with Perfect Reconstruction”, Karp and Fliege, Proc. IEEE ISCAS 1995, Seattle.
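The combination of temporally successive MDCT and MDST blocks into complex coefficients can be sketched as follows. This is one possible reading of the description above, not the implementation of the cited publications: the pairing of block 2i with block 2i+1, the function names, the sine window (the cited coder uses a Kaiser-Bessel window), and the test signal are assumptions.

```python
import math

N = 8         # spectral coefficients per block (illustrative)
L = 2 * N     # block length, assuming 50% overlap

def window(n):
    # Assumption: sine window for brevity instead of a Kaiser-Bessel window.
    return math.sin(math.pi * (n + 0.5) / L)

def transform(x, m, trig):
    """All N coefficients of block m with a cosine (MDCT) or sine (MDST) kernel."""
    return [
        sum(window(n) * x[m * N + n]
            * trig(math.pi / N * (n - N / 2 + 0.5) * (k + 0.5))
            for n in range(L))
        for k in range(N)
    ]

def alternating_complex_spectrum(x):
    """Pair block 2i (MDCT, real part) with block 2i+1 (MDST, imaginary part)."""
    num_blocks = (len(x) - L) // N + 1
    frames = []
    for m in range(0, num_blocks - 1, 2):
        real = transform(x, m, math.cos)
        imag = transform(x, m + 1, math.sin)
        frames.append([complex(r, i) for r, i in zip(real, imag)])
    return frames

# Hypothetical test signal.
x = [math.sin(0.7 * n) for n in range(6 * N)]
frames = alternating_complex_spectrum(x)
magnitudes = [[abs(c) for c in frame] for frame in frames]
```

Note that the real and imaginary parts of each complex coefficient stem from different time segments of the input signal, so the magnitude computed from them is only an approximation of a short-term magnitude spectrum.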
It has been found that even this approximation of a complex spectrum from a real-valued spectral representation of the discrete-time input signal is problematic in that an adequate magnitude representation cannot be obtained for sounds of certain frequencies. Determining short-term magnitude spectra is thus possible with this transform only to a limited extent.