The present invention relates to audio signal processing and, in particular, to a device, a method and a computer program for arbitrary frequency shifts in the subband domain.
Computer-aided data processing systems are an integral part of everyday life in today's society, which is characterized by new media. Systems for consuming new media have been present in nearly every household for quite some time. Examples of such systems which transmit and reproduce data in digital form are players for video and audio data, for example for DVD and BluRay, CD and the mp3 file format. These reproduction systems are characterized by a nearly lossless reproduction of media contents. Apart from classical telecommunications, the Internet is an important portal for communications, for example by means of VoIP. The underlying digital signal processing is common to all the technologies mentioned. It is of decisive importance for the quality of reproduction and the efficiency of the digital technologies.
Audio signal processing is of increasing importance here. At present, a plurality of audio encoders are available on the market, which are realized, for example, by algorithms for digitally rendering audio material for storage or transmission. The goal of every encoding method is compressing the information contents of a signal such that it necessitates minimal storage space while at the same time maintaining the best reproduction quality possible. The efficiency of modern audio encoders is mainly dependent on the storage needed and, among other things, the calculating complexity needed for the algorithm.
Basically, a digital audio encoder is an instrument for transferring audio signals to a format suitable for storage or transmission. This takes place on the transmitter side of the audio encoder (encoder). The data produced in this way are then returned to the original form in the receiver (decoder) and, in the ideal case, correspond to the original data, except for a constant delay. The general goal of audio encoders is minimizing the amount of data necessitated for representing the audio signal while at the same time maximizing the reproduction quality perceived. When developing audio encoders, a number of factors must be kept in mind, for example fidelity of reproduction, data rate and complexity. Apart from that, the delay added by processing the signal also has an important role (Bosi and Goldberg, 2003).
In particular in the beginning of audio encoding, the efficiency of the methods was of high importance since storage and computing performance were available only to a very limited extent. Nowadays, this demand seems to be of less importance. Even home PCs or laptops are able to calculate complicated algorithms easily in real time, and broad-band Internet links provide sufficient bandwidth for transmitting encoded audio material. Nevertheless, refining audio encoding methods is of particular importance. In the field of mobile communications and satellite transmission, the bandwidth is limited strongly. Reducing the amount of data to be transmitted is important. Additionally, in this field importance is attached to the efficiency of the encoding technology used. The underlying algorithms have to exhibit a simple structure in order to minimize the computing performance and current consumption.
Another aspect is the quality of the reproduced encoded audio signals. Many audio encoders reduce the amount of data using a reduction of irrelevance. Signal portions are lost here, depending on the data rate. With low data rates, the quality of the audio signals reproduced decreases.
Generally, two types of audio encoding can be distinguished, namely lossless and lossy audio encoding. Lossless audio encoding allows precise reconstruction of the original signal on the receiver side. The lossy method, in contrast, causes irreversible deviations from the original signal via a model of subjective perception (Zölzer, 2005).
Lossless audio encoding is based on reducing the redundancy contained in the signal to be encoded. A common method here is, for example, linear predictive coding (LPC) in connection with subsequent entropy encoding. Such audio encoding methods allow the input signal to be reconstructed precisely bit by bit from the encoded bit stream.
Linear prediction uses statistical dependencies between successive samples of the signal in order to be able to predict future values. This is based on the fact that successive samples are more similar to one another than samples of a greater distance to one another. The prediction is realized by a linear prediction filter which estimates the current sample using a number of previous samples. However, it is not this estimation itself that is processed further, but the difference between this value and the actual sample at this place. The goal of linear prediction is minimizing the energy of this error signal by optimized filters and transmitting said error signal which necessitates only a small bandwidth (Weinzierl, 2008).
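The principle of prediction and error-signal transmission described above may be illustrated by the following minimal sketch. It assumes a first-order predictor whose single coefficient is obtained by least squares; the function names and the test signal are hypothetical and serve illustration only. Practical lossless encoders use higher predictor orders and integer arithmetic so that reconstruction is exact bit by bit, whereas this floating-point sketch reconstructs the signal only up to rounding error.

```python
import math

def first_order_predictor(x):
    """Least-squares coefficient a for the predictor x_hat[n] = a * x[n-1]."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n - 1] ** 2 for n in range(1, len(x)))
    return num / den if den else 0.0

def residual(x, a):
    """Error signal e[n] = x[n] - a * x[n-1]; the first sample is passed through."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def reconstruct(e, a):
    """Invert the prediction: x[n] = e[n] + a * x[n-1]."""
    x = [e[0]]
    for n in range(1, len(e)):
        x.append(e[n] + a * x[n - 1])
    return x

def energy(s):
    """Sum of squared sample values."""
    return sum(v * v for v in s)

# Hypothetical test signal: a slowly varying sine, so successive samples are similar.
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(200)]
a = first_order_predictor(signal)
err = residual(signal, a)
restored = reconstruct(err, a)
```

For this test signal, the energy of the error signal `err` is far smaller than that of `signal`, which is why the error signal can be represented with fewer bits.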
Subsequently, the error signal is entropy-encoded. Entropy is a measure of the mean information contents of a signal and indicates the theoretical minimum number of bits needed for encoding. A typical method here is Huffman encoding. Here, code words are assigned to individual samples depending on their statistical probability of occurrence. Short code words are assigned to frequently occurring samples, and rarely occurring signal values are represented by long code words. On average, the encoded signal is thus represented by the smallest number of bits possible (Bosi and Goldberg, 2003).
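The assignment of code words according to the probability of occurrence may be sketched as follows. The function name `huffman_code` and the sample values are hypothetical; the sketch builds a Huffman prefix code from the observed frequencies of a short sequence and is not taken from any particular encoder.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from observed symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: partial code word}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees and prepend one bit to each side.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical quantized error samples: the value 0 occurs most often.
samples = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = huffman_code(samples)
bits = "".join(code[s] for s in samples)
```

In this example, the most frequent value receives a code word of one bit, and the twelve samples are represented by 21 bits instead of the 24 bits a fixed two-bit code would need.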
Both linear prediction and entropy encoding are reversible and thus do not remove any information from the signal. When combining the two methods, only redundancies are removed from the signal to be encoded. Since such lossless approaches are strongly dependent on the signal characteristic, the gain of encoding is comparably small. The compression rate achieved, i.e. the ratio of the input bit rate and the bit rate of the coded signal, is in a region between 1.5:1 and 3:1 (Weinzierl, 2008).
Lossy audio encoding is based on the principle of the reduction of irrelevance. This method necessitates a model of human perception which describes psycho-acoustic phenomena of the sense of hearing with regard to time and frequency resolution. Thus, lossy audio encoding is also referred to as encoding adapted to perception or psycho-acoustic encoding. In the field of audio encoding, all signal portions which cannot be perceived by humans and are thus inaudible are referred to as irrelevant (Zölzer, 2005). In order to understand the mode of functioning of an audio encoder adapted to perception more precisely, profound knowledge of psycho-acoustics is of great importance.
Human hearing analyzes a sound event by decomposing it into frequency groups. These frequency groups are represented in the Bark scale and are referred to as critical bands in the English literature. Each of these frequency groups comprises a frequency range which is evaluated together by the human hearing. Thus, a frequency range corresponds to a local area on the basilar membrane. All in all, 24 critical bands are associated to the basilar membrane, the bandwidth of which increases with increasing frequency (Fastl and Zwicker, 2007). Lossy audio encoders also use this model of frequency groups in order to decompose broad-band signals into subbands and encode each band individually (Zölzer, 2005). This model is frequently adapted, and a linear frequency division of more than 24 bands is often used instead of the Bark scale.
Another important characteristic of auditive perception is the frequency-dependent feeling of loudness of sounds of equal sound pressure levels. Two features of hearing result from this. On the one hand, sounds of different frequencies but an equal sound pressure level are perceived as being of different loudness, on the other hand there is a frequency-dependent threshold starting from which sounds can just still be perceived (Fastl and Zwicker, 2007). This threshold is also referred to as the absolute hearing threshold or hearing threshold in quiet and is illustrated in FIG. 22. Two conclusions may be drawn from this for audio encoding. Signals the levels of which are below the absolute hearing threshold need not be processed since they cannot be perceived anyway. Apart from that, the number of quantization steps necessitated per frequency band may also be determined from the distance between the hearing threshold in quiet and the signal level (Zölzer, 2005).
Covering or masking effects have the largest influence on audio encoding. Temporal and frequency-dependent masking may be distinguished. In both cases, a masker refers to a sound event by which another sound event is covered. Thus, the masked event is inaudible. With temporal masking, an event before or after the masker is covered. Pre-masking is independent of the duration of the masker and covers sound events up to 50 ms before perceiving the masker itself (Yost, 1994). Post-masking, in contrast, is dependent on the duration of the masker. The sound events here are covered after the masker has stopped. Depending on the duration of the masker, up to 200 ms may pass until the hearing is again responsive to signals in the range of the hearing threshold in quiet (Fastl and Zwicker, 2007).
FIG. 21 shows a schematic illustration of temporal masking. In particular, FIG. 21 schematically shows the regions of pre- and post-masking and the respective level below which signals are covered. Temporal masking may be used in audio encoding in order to conceal spurious noise caused by the encoding process, such as, for example, quantization noise, relative to high-level signal sequences (transients).
Masking effects in the frequency domain play a much more important role than temporal covering effects. Frequency-dependent masking describes the change in the hearing threshold in quiet for individual sounds and narrow-band noise. These signals distort the hearing threshold in quiet considerably due to their specific masked threshold of hearing. Signals the level of which is smaller than the masked threshold of hearing of the masker and which are located in the effective range of said threshold, cannot be perceived (Fastl and Zwicker, 2007). This context is illustrated in FIG. 22.
FIG. 22 shows a schematic illustration of the frequency-dependent masking in human hearing. As can be seen, the masked sound is below the masked threshold of hearing of the masker and is, thus, inaudible. This effect is made use of in lossy audio encoding methods. Signal portions below the frequency-dependent masked threshold of hearing are removed from the signal and are not processed further (Zölzer, 2005).
The general setup of a typical encoder adapted to perception is illustrated in FIG. 23. FIG. 23 shows a block circuit diagram of a psycho-acoustic audio encoder. At first, the PCM signal to be encoded is decomposed into frequency bands by the analysis filter bank and fed to the psycho-acoustic model. Here, a time-dependent masked threshold of hearing which regulates the precision of quantization for the different frequency bands is determined by the psycho-acoustic features of hearing described. Thus, important frequency bands, i.e. frequency bands easy to perceive, are quantized with a very high resolution and unimportant ones are represented at a resolution of a small number of bits. Subsequently, entropy encoding is performed for data reduction, as is also done in lossless audio encoding. Since additional control parameters have to be transmitted by the analysis filter bank and the psycho-acoustic model, the actual bit stream is set up by the bit stream multiplexer. The gain in encoding in lossy audio encoders here is obtained by combining quantization and entropy encoding (Zölzer, 2005). Depending on the quality to be achieved, the compression rate is between 4:1 and 50:1 (Weinzierl, 2008).
The decoder is of a comparably simple setup. At first, the bit stream received is divided again into signal data and control parameters by a demultiplexer. After that, entropy decoding and inverse quantization are performed. The control parameters here control the inverse quantization of the useful data. The subband signals obtained in this way are then fed to the synthesis filter bank for reconstructing the broad-band PCM signal (Zölzer, 2005). The respective block circuit diagram of a psycho-acoustic audio decoder is illustrated in FIG. 24.
A number of known signal transformations will be discussed below. Since quantization in many audio encoders is based on a perception model which describes the perception of humans in the frequency domain, the signal to be encoded has to be transferred to the frequency domain as well. There are a large number of transforms with different characteristics and fields of application for this. Transformations relevant for audio encoding will be presented below and the setup of a filter bank discussed.
Fourier transformation is the most important method for analyzing the harmonic structure of a signal. It is part of Fourier analysis and named after the French mathematician and physicist Jean-Baptiste-Joseph Fourier (1768 to 1830), who was the first to introduce it. The Fourier transform is a function for transferring a time signal to its representation in the frequency domain. It is used, among other things, to describe and predict the behavior of linear time-invariant (LTI) systems (Burrus and Parks, 1985). Thus, it is, for example, of great importance in acoustics and in the characterization of human hearing. The basic procedure of the Fourier transform is decomposing a time signal into a weighted sum of sine and cosine oscillations. For aperiodic continuous signals, it is calculated as follows (Bosi and Goldberg, 2003):
$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt \tag{2.1}$$
Here, x(t) is the signal to be analyzed in the time domain and X(f) the respective Fourier spectrum in the frequency domain. It must be kept in mind that the result is complex although a real signal is transformed. Using Euler's relation in equation 2.2, it can be shown that the real part of X(f) corresponds to the cosine terms of x(t) and that the imaginary part corresponds to the sine components. Using:
$$e^{-j 2\pi f t} = \cos(2\pi f t) - j \sin(2\pi f t) \tag{2.2}$$
the result of equation 2.1 is:
$$X(f) = \int_{-\infty}^{\infty} x(t) \cdot \left( \cos(2\pi f t) - j \sin(2\pi f t) \right) dt \tag{2.3}$$
$$\phantom{X(f)} = \int_{-\infty}^{\infty} x(t) \cdot \cos(2\pi f t)\, dt - j \int_{-\infty}^{\infty} x(t) \cdot \sin(2\pi f t)\, dt \tag{2.4}$$
resulting in:
$$X(f) = \mathrm{Re}\{X(f)\} + j\, \mathrm{Im}\{X(f)\} \tag{2.5}$$
Since sine and cosine differ from each other only in their phase, the phase of the signal may be concluded from the ratio of the corresponding terms. The following applies:
$$X(f) = |X(f)| \cdot e^{j\varphi(f)} \tag{2.6}$$
and:
$$|X(f)| = \sqrt{\left(\mathrm{Re}\{X(f)\}\right)^2 + \left(\mathrm{Im}\{X(f)\}\right)^2} \tag{2.7}$$
$$\varphi(f) = \arctan\left(\frac{\mathrm{Im}\{X(f)\}}{\mathrm{Re}\{X(f)\}}\right) \tag{2.8}$$
Thus, |X(f)| is referred to as absolute value frequency response and φ(f) is referred to as phase frequency response or simply as phase.
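The relation between the complex spectral value, its absolute value and its phase may be illustrated by a brief numerical sketch. The value chosen for `X` is hypothetical. The sketch also evaluates the quadrant-aware function `math.atan2`, which is not mentioned in the text above; it is shown here only as an assumption about common practice, since the plain arctangent of equation 2.8 cannot distinguish opposite quadrants.

```python
import cmath
import math

# Hypothetical spectral value X(f) at a single frequency (second quadrant).
X = complex(-1.0, 1.0)

magnitude = math.sqrt(X.real ** 2 + X.imag ** 2)   # equation 2.7
phase_arctan = math.atan(X.imag / X.real)          # equation 2.8, literal form
phase_atan2 = math.atan2(X.imag, X.real)           # quadrant-aware variant (assumption)

# Equation 2.6: polar form rebuilt from magnitude and quadrant-aware phase.
rebuilt = magnitude * cmath.exp(1j * phase_atan2)
```

For this value, the literal arctangent yields −π/4, whereas the quadrant-aware phase is 3π/4; only the latter reproduces `X` when inserted into equation 2.6.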
By means of the inverse Fourier transform (equation 2.9), the transformed signal is transferred back to its original representation in the time domain. It must be kept in mind that the Fourier transform and its inverse differ by a constant pre-factor and the sign of the exponent of the exponential function (Burrus and Parks, 1985).
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(f)\, e^{j 2\pi f t}\, df \tag{2.9}$$
The discrete Fourier transform will be discussed below in greater detail.
In practice, problems occur in digital computers when using the Fourier transform. On the one hand, this is due to the fact that only a finite number of time values can be processed, and on the other hand, the frequency variable also has to be sampled discretely, apart from the time variable. The solution of these problems is the discrete Fourier transform (DFT). Using the DFT, a finite, discrete-time signal is transferred to a discrete, periodic spectrum. This makes it one of the most important transforms in digital signal processing. The DFT originates from the Fourier transform; a precise derivation can be found in (Lochmann, 1990). The DFT of a discrete-time signal x[n] of the length N is defined as follows (Burrus and Parks, 1985):
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W^{kn}, \quad \forall k \in [0, N-1] \tag{2.10}$$
In analogy, the inverse discrete Fourier transform (IDFT) is:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W^{-kn}, \quad \forall n \in [0, N-1] \tag{2.11}$$
with the complex rotating phasor W.
$$W = e^{-j \frac{2\pi}{N}} \tag{2.12}$$
Thus, X[k] is the discrete periodic spectrum of x[n], where k and n are integer indices. The period length of the spectrum corresponds to the transform length N, and normalized frequencies are mapped to the interval [0, 2π].
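The DFT and its inverse may be evaluated directly from their defining sums, as in the following minimal sketch. The function names `dft` and `idft` are hypothetical; the inverse sums over the frequency index k for each time index n. The direct evaluation is intended for illustration only and is not an efficient implementation.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * W**(k*n), with W = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

def idft(X):
    """Direct inverse DFT: x[n] = (1/N) * sum over k of X[k] * W**(-k*n)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]

# Hypothetical input sequence of length N = 4.
x = [1.0, 2.0, 3.0, 4.0]
X = dft(x)
x_back = idft(X)
```

For this input, the spectrum is approximately [10, −2+2j, −2, −2−2j], and `x_back` equals `x` up to rounding error.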
For real input signals, the DFT has an important feature. In this case, not N independent frequency coefficients are calculated, as in the general case, but only half as many. This feature may, for example, be made use of for the storage or transmission of the data. For the inverse transform, the second N/2 values are calculated using the following relation (Rao and Yip, 2001):
$$X[N-k] = X[k]^{*} \tag{2.13}$$
The operator * in equation 2.13 denotes complex conjugation. Thus, X[k]* is the complex-conjugate sequence of values for X[k].
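The symmetry of equation 2.13 may be checked numerically with the following sketch. The input values and the helper `dft` are hypothetical. As an assumption of this sketch, for an even transform length the bins 0 to N/2 are kept, i.e. N/2 + 1 values including the two real-valued bins at k = 0 and k = N/2, and the remaining bins are restored by complex conjugation.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * W**(k*n), with W = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

# Hypothetical real input signal of even length N = 8.
x = [0.5, -1.0, 2.0, 0.25, 3.0, -0.75, 1.5, 0.0]
N = len(x)
X = dft(x)

# Keep only the bins 0 .. N/2; these suffice for a real input signal.
half = X[: N // 2 + 1]

# Restore the bins N/2+1 .. N-1 via X[N-k] = X[k]* (equation 2.13).
mirrored = [half[N - k].conjugate() for k in range(N // 2 + 1, N)]
full = half + mirrored
```

The restored sequence `full` agrees with the complete spectrum `X` up to rounding error.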
The calculating complexity of DFT and IDFT is $N^2$ complex multiplications and additions. When symmetries are made use of in the calculation, the number of calculating steps necessitated is reduced to $N \,\mathrm{ld}\, N$, with ld denoting the logarithm to base two, and the complexity corresponds to $O(N \log N)$. However, with the common radix-2 fast methods, the transform length N has to be a power of two. The fast Fourier transform is referred to as FFT (Kiencke and Jäkel, 2005).
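How symmetries reduce the number of calculating steps may be illustrated by the following sketch of a recursive radix-2 decimation-in-time FFT. The text above does not name a specific fast algorithm, so the choice of this variant is an assumption made for illustration; the function name `fft` is hypothetical. The transform of length N is split into two transforms of length N/2 over the even and odd samples, which are then combined by N/2 butterfly operations.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("transform length must be a power of two")
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        # Twiddle factor W**k with W = exp(-j*2*pi/N).
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

# Hypothetical input sequence of length N = 4.
spectrum = fft([1.0, 2.0, 3.0, 4.0])
```

Each recursion level needs a number of operations proportional to N, and there are ld N levels, which corresponds to the complexity stated above.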
The discrete Fourier transform has not gained acceptance in the field of data compression. The great disadvantages of DFT are the high calculating complexity and the redundancy contained in the spectrum. Although there are efficient methods for calculating the DFT, i.e. FFT, the result will be a complex spectrum. This means that N complex pairs of values are calculated from N transform values. In addition, only the first N/2 spectral values contain new information.
The discrete cosine and sine transforms will be discussed below.
The discrete cosine transform (DCT) is a solution for the problems of DFT mentioned before. The DCT is a real, discrete, linear and orthogonal transform. Due to these very features, it is the most frequently used transform in digital data compression (Britanak et al., 2007).
The DCT is a discrete trigonometric transform. All in all, eight forms of DCT are distinguished. Depending on their edge continuation, they are divided into even and odd transforms, and into types I, II, III and IV. However, for digital signal processing, only the even types of DCT are of importance. These are listed below (Rao and Yip, 2001):
$$X^{I}[k] = \varepsilon[k] \sum_{n=0}^{N} \varepsilon[n]\, x[n] \cos\left(\frac{\pi n k}{N}\right), \quad \forall k \in [0, N] \tag{2.14a}$$
$$X^{II}[k] = \varepsilon[k] \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi (n + 0.5)\, k}{N}\right), \quad \forall k \in [0, N-1] \tag{2.14b}$$
$$X^{III}[k] = \sum_{n=0}^{N-1} \varepsilon[n]\, x[n] \cos\left(\frac{\pi (k + 0.5)\, n}{N}\right), \quad \forall k \in [0, N-1] \tag{2.14c}$$
X^{IV}[k] = \sum_{n=0}
                              N                  -                  1                                            ⁢                                                          ⁢                                                x                  ⁡                                      [                    n                    ]                                                  ⁢                                  cos                  (                                                                                    π                        ⁡                                                  (                                                                                    n                              +                              0                                                        ,                            5                                                    )                                                                    ⁢                                                                                          ⁢                                              (                                                                              k                            +                            0                                                    ,                          5                                                )                                                              N                                    )                                                              ,                                          ⁢                      ∀                          k              ∈                              [                                  0                  ,                                      N                    -                    1                                                  ]                                                    ⁢                                  ⁢                  with          :                                    (     
             2.14          ⁢          d                )                                          ɛ          ⁡                      [            p            ]                          =                  {                                                                                                                1                                              2                                                              ⁢                                                                                  ⁢                    if                    ⁢                                                                                  ⁢                    p                                    =                                                            0                      ⁢                                                                                          ⁢                      v                      ⁢                                                                                          ⁢                      p                                        =                    N                                                                                                                        1                  ⁢                                                                          ⁢                  else                                                                                        (        2.15        )            
Each of these forms has its particular application in encoding. DCT-II is used primarily as a transform of image data. It was the first type of DCT described in the literature, which is why the term “DCT” generally refers to DCT-II (Ahmed et al., 1974). Except for a pre-factor, DCT-III is the inverse transform of DCT-II and vice versa. For audio encoding, DCT-IV is of particular importance. It is the basis of the modified discrete cosine transform.
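The inverse relationship between DCT-II and DCT-III, and the fact that DCT-IV is its own inverse up to a pre-factor, can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes that the weighting ε[k] in equation 2.14b multiplies the whole sum, and the function names `eps`, `dct2`, `dct3` and `dct4` are illustrative only. Under these unnormalized definitions the pre-factor works out to 2/N.

```python
import math

def eps(p, N):
    # Weighting from equation 2.15: 1/sqrt(2) for p = 0 or p = N, otherwise 1.
    return 1 / math.sqrt(2) if p in (0, N) else 1.0

def dct2(x):
    # DCT-II (equation 2.14b); assumption: eps[k] scales the whole sum.
    N = len(x)
    return [eps(k, N) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                            for n in range(N)) for k in range(N)]

def dct3(x):
    # DCT-III (equation 2.14c).
    N = len(x)
    return [sum(eps(n, N) * x[n] * math.cos(math.pi * (k + 0.5) * n / N)
                for n in range(N)) for k in range(N)]

def dct4(x):
    # DCT-IV (equation 2.14d).
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * (k + 0.5) / N)
                for n in range(N)) for k in range(N)]

x = [1.0, -2.0, 3.5, 0.25, 4.0, -1.5]
N = len(x)

# DCT-III undoes DCT-II, and DCT-IV undoes itself, each up to the factor 2/N.
x_via_dct3 = [2 / N * v for v in dct3(dct2(x))]
x_via_dct4 = [2 / N * v for v in dct4(dct4(x))]
```

Both `x_via_dct3` and `x_via_dct4` should agree with `x` up to floating-point rounding.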
In order to demonstrate important features of the DCT, a relationship between DFT and DCT will be pointed out below. As has been illustrated before, the DFT calculates only N/2 independent frequency coefficients from a real-valued signal of length N. Conversely, this means that 2N values in the time domain are necessitated to obtain N spectral values. However, if only N time values are available, the signal has to be continued suitably. Symmetrical extension by mirroring the entire signal is suitable here. The extended signal thus appears to repeat itself with a period length of 2N. This is of advantage in that the spurious leakage effect of the DFT with clipped signals is suppressed (Kiencke and Jäkel, 2005).
Any real signal x[n] of length N is extended symmetrically, the result being:

$$\tilde{x}[n] = \big[x[0], \ldots, x[N-1], x[N-1], \ldots, x[0]\big] \tag{2.16}$$

with $0 \le n \le 2N-1$. The length of $\tilde{x}[n]$ is thus 2N. The DFT from equation 2.10 with equation 2.12 is then applied to this signal and rearranged (Rao and Yip, 2001). A detailed derivation can be found in annex A.1. The following applies:
\begin{align}
\tilde{X}[k] &= \sum_{n=0}^{2N-1} \tilde{x}[n]\, e^{-j\frac{2\pi}{2N}kn} \tag{2.17a} \\
&= 2\, e^{j\frac{\pi}{2N}k} \sum_{n=0}^{N-1} \tilde{x}[n] \cos\left(\frac{\pi (n+0.5)\, k}{N}\right), \quad \forall k \in [0, N-1] \tag{2.17b}
\end{align}
When comparing this result to the DCT-II in equation 2.14b, one can see that these two equations only differ by the phase term $2e^{j\frac{\pi}{2N}k}$. Since this term is signal-independent and does not contain any information, it can be neglected when calculating the DCT (Rao and Yip, 2001). For DCT-I, a similar relationship can be shown, but using a different signal continuation of x[n]. DCT-IV then results from a phase rotation of the basis functions of DCT-II. A detailed derivation may be found in (Rao and Yip, 2001).
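The relation between the DFT of the mirrored signal and the cosine sum can be verified numerically. The sketch below is illustrative only: it uses a direct O(N²) DFT rather than an FFT, and the helper names `dft` and `cosine_sum` are not taken from the source. The cosine sum is evaluated without the weighting ε, exactly as it appears on the right-hand side of equation 2.17b.

```python
import cmath
import math

def dft(x):
    # Direct DFT of a sequence of length M (no FFT, for clarity only).
    M = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
            for k in range(M)]

def cosine_sum(x):
    # Cosine sum on the right-hand side of equation 2.17b (no eps weighting).
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

x = [0.5, 1.0, -3.0, 2.0, 0.0]
N = len(x)

x_ext = x + x[::-1]          # symmetric extension of length 2N (equation 2.16)
X_ext = dft(x_ext)           # left-hand side, equation 2.17a
C = cosine_sum(x)

# Right-hand side of equation 2.17b for k = 0 .. N-1.
rhs = [2 * cmath.exp(1j * math.pi * k / (2 * N)) * C[k] for k in range(N)]
max_err = max(abs(X_ext[k] - rhs[k]) for k in range(N))
```

`max_err` should be at the level of floating-point rounding, which confirms that the two expressions differ only by the signal-independent factor.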
Some conclusions may be drawn from this result. First, the DCT, in contrast to the DFT, is a purely real transform. Two advantages result from this. Firstly, no complex multiplications and additions have to be performed for the calculation and, secondly, only half of the storage space is necessitated for storing the data since there are no complex pairs of values. Furthermore, it is striking that the DCT, for calculating N independent frequency coefficients, necessitates exactly N values for the transform. The frequencies are all in the interval [0, π]. In contrast to the DFT, the redundancy contained in the spectrum for real-valued input signals has vanished and thus the frequency resolution is twice as high. However, it is of disadvantage that the DCT spectrum cannot be decomposed into absolute value (or magnitude) and phase. Additionally, the situation may arise that the signal contains frequencies which correspond to the DCT basis functions (cf. equations 2.14a to 2.14d) but are rotated in phase by 90° relative to them. These frequencies are not represented by the DCT, i.e. the respective DCT coefficient is zero. For these reasons, the DCT is well suited for effective and fast data compression, but less so for signal analysis (Malvar, 1992).
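The effect of a 90° phase rotation can be illustrated with the DCT-II basis function of equation 2.14b. In the following sketch, the values of `N` and `k0` are arbitrary choices for illustration: the input is the sine counterpart of the cosine basis function with index `k0`, and the projection onto that basis function is evaluated directly.

```python
import math

N = 16
k0 = 5  # arbitrary subband index in 1 .. N-1, chosen for illustration

# Input signal: DCT-II basis function k0, rotated in phase by 90 degrees.
x = [math.sin(math.pi * (n + 0.5) * k0 / N) for n in range(N)]

# Projection onto the matching cosine basis function (DCT-II sum at k = k0).
coeff_k0 = sum(x[n] * math.cos(math.pi * (n + 0.5) * k0 / N) for n in range(N))

# The signal itself is far from zero.
energy = sum(v * v for v in x)
```

Although the signal carries clearly nonzero energy, `coeff_k0` vanishes up to rounding, so the coefficient at the frequency of the signal gives no indication of its presence.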
Apart from the discrete cosine transform, there is the discrete sine transform (DST). In total, eight forms of the DST are distinguished. Only DST-IV is of importance here. With regard to its form and features, it corresponds to DCT-IV (Rao and Yip, 2001):
$$X_S^{IV}[k] = \sum_{n=0}^{N-1} x[n] \sin\left(\frac{\pi (n+0.5)(k+0.5)}{N}\right), \quad \forall k \in [0, N-1] \tag{2.18}$$
When a signal is transformed using both DCT-IV and DST-IV, the complex spectrum formed by the combination of the two real spectra again contains information on absolute value and phase. The frequency resolution here is still twice as high as in the DFT, which means that N frequencies are mapped in the interval [0, π] (Malvar, 1992).
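A minimal sketch of how the two real spectra may be combined is given below. The sign convention (DCT-IV as real part, negative DST-IV as imaginary part) is an assumption made here so that the result equals a sum over complex exponentials; the source does not prescribe a particular convention, and the function names are illustrative.

```python
import cmath
import math

def dct4(x):
    # DCT-IV (equation 2.14d).
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * (k + 0.5) / N)
                for n in range(N)) for k in range(N)]

def dst4(x):
    # DST-IV (equation 2.18).
    N = len(x)
    return [sum(x[n] * math.sin(math.pi * (n + 0.5) * (k + 0.5) / N)
                for n in range(N)) for k in range(N)]

x = [0.2, -1.0, 0.7, 1.5, -0.3, 0.0, 2.0, -0.8]
N = len(x)

# Assumed convention: complex spectrum = DCT-IV - j * DST-IV.
X_complex = [c - 1j * s for c, s in zip(dct4(x), dst4(x))]
magnitude = [abs(v) for v in X_complex]
phase = [cmath.phase(v) for v in X_complex]

# The same spectrum written as one sum over complex exponentials.
X_direct = [sum(x[n] * cmath.exp(-1j * math.pi * (n + 0.5) * (k + 0.5) / N)
                for n in range(N)) for k in range(N)]
```

With this convention, `magnitude` and `phase` are available for each of the N subbands, which a single real DCT-IV spectrum does not provide.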
For processing long audio signals, it is not possible to transform the signal as a whole. On the one hand, the calculating complexity increases enormously since $N^2$ calculating operations are necessitated for calculating the DCT as well. On the other hand, it is not possible to process the signal in real time since the transmission of the entire data stream has to be awaited before the signal can be reconstructed. Consequently, the signal has to be divided into blocks. In this case, the DCT is applied as a so-called block transform (Rao and Yip, 2001). Using the block index $b \in \mathbb{N}$, the following results for the DCT-IV from equation 2.14d:
$$X_b^{IV}[k] = \sum_{n=0}^{N-1} x[n+bN] \cos\left(\frac{\pi (n+0.5)(k+0.5)}{N}\right), \quad \forall k \in [0, N-1] \tag{2.19}$$
The signal length of x[n] corresponds to bN. With block transforms, block artefacts arise due to quantization. A known example where artefacts of this kind may be recognized is the JPEG compression method. The block artefacts originate from the edge continuations to be performed for periodizing. They do not correspond to the originally assumed signal continuations (cf. equation 2.16). The result is discontinuities at the block boundaries which, in the frequency domain, shift the energy towards high frequencies (Malvar, 1992). Discontinuities in an audio signal may be perceived as crackles. Human hearing is very sensitive to such artefacts. Thus, they have to be avoided by all means.
The modified discrete cosine transform will be discussed below.
The modified discrete cosine transform (MDCT) is the central transform for audio compression. It is used, among others, in mp3, AAC and Dolby Digital (AC-3). MDCT is a real, discrete, linear and orthogonal transform and a modification of DCT-IV. It is defined as follows (Rao and Yip, 2001):
$$X_b[k] = \sum_{n=0}^{2N-1} x[n+bN] \cos\left(\frac{\pi \left(n+0.5-\frac{N}{2}\right)(k+0.5)}{N}\right), \quad \forall k \in [0, N-1] \tag{2.20}$$
An advantage of MDCT compared to DCT-IV is avoiding block artefacts. This can be achieved mainly by the overlapping of several successive blocks. This kind of transform is also known as lapped orthogonal transform (LOT) (Malvar and Staelin, 1989).
The redundancy may be removed again by the overlap-add (OLA) method: the blocks forming in the inverse transform are overlapped by up to 50% and added up.
The frequency resolution of MDCT may be improved further by weighting the input sequence x[n+bN] with a window function. In equation 2.20, the window corresponds to a rectangular function clipping the current block b from the overall signal. In the frequency domain, this corresponds to a convolution with the si function. The poor stop band attenuation of the si function can be improved by adapting this window function, and thus an increased frequency selectivity can be achieved. In order for the MDCT to allow perfect reconstruction, the window function w[n] of length 2N has to fulfill the Princen-Bradley (PR) conditions (Princen et al., 1987):

$$w[n] = w[2N-1-n] \tag{2.21a}$$

$$w^2[n] + w^2[n+N] = 1 \tag{2.21b}$$
A simple window fulfilling these conditions and exhibiting sufficient stop band attenuation is the sine half wave window. It is used, among others, in mp3 and AAC and is defined as follows (Malvar, 1992):
$$w[n] = \sin\left(\frac{\pi (n+0.5)}{2N}\right), \quad \forall n \in [0, 2N-1] \tag{2.22}$$
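That the sine half-wave window fulfills both conditions can be checked directly. The sketch below evaluates equations 2.21a and 2.21b for the window of equation 2.22; the function name `sine_window` and the block size `N = 8` are chosen for illustration only.

```python
import math

def sine_window(N):
    # Sine half-wave window of length 2N (equation 2.22).
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

N = 8
w = sine_window(N)

# Equation 2.21a: symmetry about the centre of the window.
is_symmetric = all(abs(w[n] - w[2 * N - 1 - n]) < 1e-12 for n in range(2 * N))

# Equation 2.21b: squared halves of the window add up to one.
is_power_complementary = all(abs(w[n] ** 2 + w[n + N] ** 2 - 1.0) < 1e-12
                             for n in range(N))
```

The second condition holds because the second half of the sine window equals the cosine of the same argument, so the squared halves form sin² + cos² = 1.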
By inserting the window function w[2N−1−n] into equation 2.20, another important feature of MDCT can be recognized. The result corresponds to the discrete convolution of x[n+bN] with the modulated window function w[n]. Thus, $\forall k \in [0, N-1]$, the following results (Schuller and Smith, 1996):
$$X_k[b] = \sum_{n=0}^{2N-1} x[n+bN]\, w[2N-1-n] \cos\left(\frac{\pi \left(n+0.5-\frac{N}{2}\right)(k+0.5)}{N}\right) \tag{2.23}$$
Thus, the MDCT can be seen not only as a block transform, but also as a modulated filter bank (Malvar, 1992). The window function corresponds to the low-pass prototype FIR filter which is modulated by the cosine kernel and thus represents the frequency bands of the filter bank. The result of this is that the input sequence x[n+bN] is decomposed into exactly N subbands. In connection with the TDA feature, the MDCT fulfills the preconditions of a so-called “critically sampled filter bank”.
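The equivalence of the block-transform view and the filter-bank view can be sketched as follows. The helper `analysis_filter` is a hypothetical construction, not taken from the source: it builds, for subband k, an FIR impulse response from the window and the time-reversed cosine kernel, so that convolving the input with it and keeping every N-th output sample reproduces equation 2.23. All names and the test signal are illustrative.

```python
import math

def sine_window(N):
    # Sine half-wave window of length 2N (equation 2.22).
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct_coefficient(x, b, k, N, w):
    # Block-transform view: equation 2.23 for block b and subband k.
    return sum(x[n + b * N] * w[2 * N - 1 - n]
               * math.cos(math.pi * (n + 0.5 - N / 2) * (k + 0.5) / N)
               for n in range(2 * N))

def analysis_filter(k, N, w):
    # Filter-bank view (illustrative helper): window modulated by the
    # time-reversed cosine kernel of subband k.
    return [w[j] * math.cos(math.pi * (2 * N - 1 - j + 0.5 - N / 2) * (k + 0.5) / N)
            for j in range(2 * N)]

def convolve(x, h):
    # Full discrete convolution y[m] = sum_j h[j] * x[m - j].
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

N = 4
x = [math.cos(0.7 * n) + 0.05 * n for n in range(5 * N)]
w = sine_window(N)
num_blocks = len(x) // N - 1

max_diff = 0.0
for k in range(N):
    y = convolve(x, analysis_filter(k, N, w))
    for b in range(num_blocks):
        # Downsampling by N: the sample at which block b is fully covered.
        subband_sample = y[b * N + 2 * N - 1]
        max_diff = max(max_diff,
                       abs(subband_sample - mdct_coefficient(x, b, k, N, w)))
```

`max_diff` should be at rounding level, i.e. each subband signal of the filter bank, sampled at every N-th position, coincides with the corresponding MDCT coefficient sequence.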
Such a critically sampled filter bank is illustrated in FIG. 25. In particular, FIG. 25 shows an N-band critically sampled PR filter bank with a system delay of $n_d$ samples. Such filter banks are of particular importance for audio encoding since they describe a signal as precisely and completely as possible with the smallest number of samples (Rao and Yip, 2001).
The symbol ↓N corresponds to a reduction in the sample rate by the factor 1/N and ↑N to an increase by the factor N. The signal after the synthesis filter bank, $\hat{x}[n] = x[n - n_d]$, is identical to the input signal x[n] before the analysis filter bank, except for a constant delay of $n_d$ samples. In the case of MDCT, $h_k[n]$ is the modulated window function $w_k[n]$. Since w[n] fulfills the PR conditions, the analysis filters $h_k$ are identical to the synthesis filters $g_k$.
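Perfect reconstruction with identical analysis and synthesis filters can be illustrated by a block-wise sketch using the kernel of equation 2.23 and the sine window. Several points are assumptions of this sketch rather than statements of the source: the function names, the test signal, the synthesis scaling of 2/N (which follows from the unnormalized kernel used here), and the block-wise processing, which does not model the system delay explicitly. The first and last N samples are covered by only one block and are therefore excluded from the comparison.

```python
import math

def sine_window(N):
    # Sine half-wave window of length 2N (equation 2.22).
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def kernel(n, k, N):
    # Cosine kernel shared by analysis and synthesis (equation 2.23).
    return math.cos(math.pi * (n + 0.5 - N / 2) * (k + 0.5) / N)

def mdct(block, w, N):
    # Analysis: 2N windowed input samples -> N coefficients.
    return [sum(block[n] * w[2 * N - 1 - n] * kernel(n, k, N)
                for n in range(2 * N)) for k in range(N)]

def imdct(X, w, N):
    # Synthesis with the same window and kernel; 2/N is an assumed scaling
    # that matches the unnormalized analysis above.
    return [2.0 / N * w[2 * N - 1 - n] * sum(X[k] * kernel(n, k, N)
                                             for k in range(N))
            for n in range(2 * N)]

N = 4
x = [math.sin(0.3 * n) + 0.1 * n for n in range(6 * N)]
w = sine_window(N)

# Overlap-add of the inverse-transformed blocks with a hop size of N.
y = [0.0] * len(x)
for b in range(len(x) // N - 1):
    X_b = mdct(x[b * N:b * N + 2 * N], w, N)
    for n, v in enumerate(imdct(X_b, w, N)):
        y[b * N + n] += v

interior_error = max(abs(y[n] - x[n]) for n in range(N, len(x) - N))
```

Each inverse-transformed block on its own contains time-domain aliasing; only the sum of two overlapping blocks cancels it, which is why `interior_error` is at rounding level while the edge samples are not reconstructed.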
From a mathematical point of view, it is suitable to formulate linear equation systems, including all the transforms mentioned so far, in vector-matrix notation. A signal x[n] of length bN is represented as a column vector $x = [x[0], x[1], \ldots, x[bN-1]]^T$. The operator T here denotes the transposition. Forming blocks may be represented as a matrix in which every column of the matrix contains a block of x[n]:
$$\underset{\sim}{x} = \begin{bmatrix} x[0] & x[N] & \cdots & x[(b-1)N] \\ x[1] & x[N+1] & \cdots & x[(b-1)N+1] \\ \vdots & \ddots & \cdots & \vdots \\ x[N-1] & x[2N-1] & \cdots & x[bN-1] \end{bmatrix} \tag{2.24}$$
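The block matrix of equation 2.24 can be formed by a simple re-indexing of the signal vector. The following sketch uses nested lists, with the hypothetical helper `to_block_matrix` and a toy signal chosen so that the layout is easy to read; it assumes that the signal length is an integer multiple of N.

```python
def to_block_matrix(x, N):
    # Column c holds block c of the signal, i.e. samples x[c*N] .. x[c*N + N - 1]
    # (equation 2.24). Assumes len(x) is an integer multiple of N.
    b = len(x) // N
    return [[x[col * N + row] for col in range(b)] for row in range(N)]

x = list(range(12))   # toy signal x[n] = n with b = 3 blocks of length N = 4
N = 4
blocks = to_block_matrix(x, N)

first_row = blocks[0]                       # x[0], x[N], x[2N]
last_row = blocks[N - 1]                    # x[N-1], x[2N-1], x[3N-1]
second_block = [row[1] for row in blocks]   # column 1 = second block
```

Reading the matrix column by column returns the original signal, so no samples are duplicated or lost at this stage.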
The transform rule may also be represented as a matrix. The modulated window functions here form the rows of the matrix. For all k ∈ [0, N−1] and all n ∈ [0, 2N−1], the following applies:
$$
\underset{\sim}{T}_{\mathrm{MDCT}}(k,n) := w[n]\,\cos\!\left(\frac{\pi\left(n + 0.5 - \frac{N}{2}\right)\left(k + 0.5\right)}{N}\right) \in \mathbb{R}^{N \times 2N}
\tag{2.25}
$$
In order to be able to calculate the MDCT of x, the block structure of {tilde under (x)} has to be extended by a 50% overlap for the TDA. Thus, the MDCT may be written as follows:
$$
\underset{\sim}{X} = \underset{\sim}{T}_{\mathrm{MDCT}} \cdot \underset{\sim}{x}_{\mathrm{TDA}}
\tag{2.26}
$$
with:
$$
\underset{\sim}{x}_{\mathrm{TDA}} =
\begin{bmatrix}
x[0] & x[N] & \cdots & x[(b-2)N] \\
x[1] & x[N+1] & \cdots & x[(b-2)N+1] \\
\vdots & \vdots & \ddots & \vdots \\
x[2N-1] & x[3N-1] & \cdots & x[bN-1]
\end{bmatrix}
\tag{2.27}
$$
Every column of {tilde under (X)} forms the MDCT spectrum of the respective block with index b in x.
For calculating a block, this form of the MDCT necessitates 2N² multiplications and additions. However, the calculating complexity can be reduced considerably.
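The direct matrix form of equations 2.25 to 2.27 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the sine window and the helper names (`sine_window`, `mdct_matrix`, `tda_blocks`, `mdct_direct`) are assumptions introduced here, and the spectra are returned as a list of columns rather than as a matrix.

```python
import math

def sine_window(N):
    """Sine window of length 2N (an assumed window choice)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct_matrix(N, w):
    """N x 2N matrix of equation 2.25; row k is one modulated window."""
    return [[w[n] * math.cos(math.pi * (n + 0.5 - N / 2) * (k + 0.5) / N)
             for n in range(2 * N)]
            for k in range(N)]

def tda_blocks(x, N):
    """Columns of equation 2.27: blocks of 2N samples with hop size N."""
    b = len(x) // N
    return [x[j * N:(j + 2) * N] for j in range(b - 1)]

def mdct_direct(x, N):
    """Equation 2.26, returned as one N-coefficient spectrum per block."""
    T = mdct_matrix(N, sine_window(N))
    spectra = []
    for block in tda_blocks(x, N):
        # N rows times 2N columns: 2N^2 multiplications per block.
        spectra.append([sum(T[k][n] * block[n] for n in range(2 * N))
                        for k in range(N)])
    return spectra

N = 4
x = [float(n % 5) for n in range(4 * N)]  # 16 samples, so b = 4
X = mdct_direct(x, N)                     # b - 1 = 3 overlapping blocks
```

Each block requires a full N × 2N matrix-vector product, which is where the 2N² operation count comes from.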
To this end, the filter bank in FIG. 25 is converted into an equivalent polyphase filter bank (see FIG. 26). Using the polyphase representation and the z-transform, multirate systems, like the MDCT filter bank, may be analyzed more extensively.
An FIR filter h[n] can be divided into M ∈ ℕ phases when the length of the filter corresponds to an integer multiple of M. The mth phase p_m[n] of h[n] is produced by delaying h[n] by z^{−m} and reducing the sample rate by the factor M (Malvar, 1992). The following applies:
$$
p_m[n] = h[nM + m]
\tag{2.28}
$$
Using the decomposition and the z-transform, the filter h[n] may be represented as follows (Malvar, 1992):
$$
\begin{aligned}
H[z] &= \sum_{n=0}^{MN-1} h[n]\, z^{-n} && (2.29) \\
     &= \sum_{m=0}^{M-1} z^{-m} \sum_{n=0}^{N-1} h[nM+m]\, z^{-nM} && (2.30)
\end{aligned}
$$
Instead of sum notation, vector notation is of advantage here as well. Equation 2.30 may thus be represented as an N-dimensional vector:
$$
\underline{H} =
\begin{bmatrix}
H_0[z] \\
H_1[z] \\
\vdots \\
H_{N-1}[z]
\end{bmatrix}
\tag{2.31}
$$
with:
$$
H_n[z] = \sum_{m=0}^{M-1} h[nM+m]\, z^{-nM-m}
\tag{2.32}
$$
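The decomposition of equations 2.28 to 2.30 can be checked numerically with a short sketch. The function names and the example filter coefficients are hypothetical and serve only as illustration; the check evaluates both forms of H[z] at an arbitrary complex test point.

```python
def polyphase(h, M):
    """Split h into M phases, p_m[n] = h[nM + m] (equation 2.28)."""
    if len(h) % M != 0:
        raise ValueError("filter length must be an integer multiple of M")
    return [h[m::M] for m in range(M)]

def H_direct(h, z):
    """Equation 2.29: plain z-transform of the FIR filter."""
    return sum(c * z ** (-n) for n, c in enumerate(h))

def H_polyphase(h, M, z):
    """Equation 2.30: sum over the phases, each delayed by z^-m."""
    total = 0
    for m, p in enumerate(polyphase(h, M)):
        total += z ** (-m) * sum(c * z ** (-n * M) for n, c in enumerate(p))
    return total

h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # example coefficients, length M*N = 3*2
phases = polyphase(h, 3)             # [[1, 4], [2, 5], [3, 6]]
z0 = complex(0.9, 0.4)               # arbitrary test point
difference = abs(H_direct(h, z0) - H_polyphase(h, 3, z0))
```

Both evaluations agree up to rounding, since the double sum in equation 2.30 merely reorders the terms of equation 2.29.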
This polyphase decomposition may then be applied to each filter of the MDCT filter bank. The result is the equivalent polyphase representation of the filter bank, mentioned before, in FIG. 26 (Schuller and Smith, 1996). Thus, FIG. 26 represents an equivalent N-band critically sampled PR polyphase filter bank.
By making use of symmetries in the MDCT kernel and the TDA feature, the analysis and synthesis polyphase filter matrices {tilde under (P)}a and {tilde under (P)}s may each be divided into a sparsely occupied folding (convolution) matrix and a transform matrix (Schuller and Smith, 1996). The folding matrices {tilde under (F)}a and {tilde under (F)}s here exhibit a diamond structure with the coefficients of the window function w[n] as polynomials in the z-domain. They may be decomposed further into a window matrix and a delay matrix:
$$
\underset{\sim}{F}_a = \underset{\sim}{D} \cdot \underset{\sim}{F}
\tag{2.33a}
$$
$$
\underset{\sim}{F}_s = \underset{\sim}{F}^{T} \cdot \underset{\sim}{D}^{-1}
\tag{2.33b}
$$
The precise form and splitting of the folding matrices will be shown further below. The transform matrices correspond to the DCT-IV matrix:
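$$
\underset{\sim}{T}(k,n) := \cos\!\left(\frac{\pi\left(n + 0.5\right)\left(k + 0.5\right)}{N}\right) \in \mathbb{R}^{N \times N}
\tag{2.34a}
$$
$$
\underset{\sim}{T}^{-1} = \frac{2}{N} \cdot \underset{\sim}{T}
\tag{2.34b}
$$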
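The relation between the DCT-IV matrix and its inverse in equations 2.34a and 2.34b can be verified with a small sketch. The helper names and the block size N = 8 are chosen here for illustration only.

```python
import math

def dct4_matrix(N):
    """N x N DCT-IV matrix of equation 2.34a."""
    return [[math.cos(math.pi * (n + 0.5) * (k + 0.5) / N) for n in range(N)]
            for k in range(N)]

def matmul(A, B):
    """Plain matrix product for small lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

N = 8
T = dct4_matrix(N)
T_inv = [[2.0 / N * v for v in row] for row in T]   # equation 2.34b
identity = matmul(T_inv, T)

# Largest deviation of T_inv * T from the identity matrix.
max_error = max(abs(identity[i][j] - (1.0 if i == j else 0.0))
                for i in range(N) for j in range(N))
```

Because the matrix is symmetric and equal to its own inverse up to the factor 2/N, analysis and synthesis can share one transform routine.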
Using these matrices, the MDCT spectrum of the input signal divided into blocks {tilde under (x)} is calculated as follows (Schuller and Smith, 1996):
$$
\underset{\sim}{X} = \underset{\sim}{T} \cdot \underset{\sim}{D} \cdot \underset{\sim}{F} \cdot \underset{\sim}{x}
\tag{2.35}
$$
wherein the following applies for the inverse transform:
$$
\underset{\sim}{\hat{x}} = \underset{\sim}{F}^{T} \cdot \underset{\sim}{D}^{-1} \cdot \underset{\sim}{T}^{-1} \cdot \underset{\sim}{X}
\tag{2.36}
$$
This solution offers several advantages compared to calculating the MDCT in accordance with equation 2.26. First, the formation of the time domain aliasing may be recognized more easily. With the polyphase representation of the folding matrix in equation 2.33a, the process may be interpreted as folding weighted signal portions of block (b−1) over into the current block b. The TDA forms when these signal portions are added. The greatest advantage of calculating the MDCT using polyphases is the considerably reduced calculating complexity. By using the square DCT-IV matrix and the sparsely occupied folding matrix, the calculating complexity is reduced to N(N+2) multiplications and additions. By using fast implementations of the DCT, similar to the FFT, the number of operations necessitated may be decreased to N(log N+2), and the complexity is thus reduced to O(N log N) (Rao and Yip, 2001). For these reasons, the MDCT here is considered to be implemented in accordance with the polyphase approach.
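To give a feeling for the three operation counts mentioned above, the following sketch evaluates them for one block size. The base of the logarithm is not stated in the text; base 2 is assumed here, as is usual for FFT-like algorithms, and the block size N = 1024 is only an example.

```python
import math

def operation_counts(N):
    """Multiplications and additions per block for the three variants."""
    return {
        "direct_matrix": 2 * N * N,              # equation 2.26
        "polyphase": N * (N + 2),                # folding matrix plus square DCT-IV
        "fast_dct": N * (math.log2(N) + 2),      # assumes log base 2
    }

counts = operation_counts(1024)
# direct_matrix: 2097152, polyphase: 1050624, fast_dct: 12288.0
```

For this block size, the polyphase form roughly halves the cost of the direct matrix form, while a fast DCT reduces it by more than two orders of magnitude.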
In audio signal processing, it may be necessary to shift a signal of a low frequency to higher frequencies, wherein said frequency shift should be freely selectable and precise. Audio encoders which try to restore the higher frequencies of a signal have to face this problem. Modern audio encoding technologies use methods of bandwidth extension for a more efficient compression of audio data. Apart from the psycho-acoustic features of human hearing, the correlation of the low-frequency signal portions to the high-frequency portions is made use of for data reduction.
Despite the existence of various ways of reducing the data rate by audio encoding, current audio encoders reach their limits when low bit rates are desired. In this case, the psycho-acoustic methods in particular produce undesired signal corruptions. These show up as interfering artefacts, like missing high frequencies, blurred transients or artificial hissing in the reproduced audio signal. In many applications, however, only a limited transmission bandwidth is available. Bandwidth extension (BWE) offers a solution for these problems. Generally, bandwidth extension comprises a number of methods by which a band-limited audio signal may be extended spectrally in order to regain the original bandwidth. Four categories of methods for bandwidth extension are distinguished (Larsen and Aarts, 2004). These are illustrated graphically in FIG. 27.
FIG. 27 shows categories of bandwidth extension (Larsen and Aarts, 2004). In FIG. 27, low-frequency psycho-acoustic BWE is shown at the top left. In FIG. 27, high-frequency psycho-acoustic BWE is shown at the top right. FIG. 27 shows low-frequency BWE at the bottom left. In addition, high-frequency BWE is illustrated in FIG. 27 at the bottom right. The energy of the band ‘a’ (broken line) is shifted to the band ‘b’ (dotted line).
Only category III (bottom right in FIG. 27) is useful for audio encoding. With the so-called “high-frequency BWE”, the frequencies present in the band-limited signal are used in order to reconstruct the high-frequency range of the spectrum. The idea of using such a method for bandwidth extension of audio signals is based on the fact that there is a strong correlation between the high-frequency and low-frequency portions of the signal. Thus, it is possible to reconstruct the missing high frequencies from the low signal portions present (Larsen and Aarts, 2004). Current techniques and methods by which a band-limited signal may be extended to its original bandwidth by means of high-frequency BWE will be presented below.
Spectral band replication (SBR) is known from conventional technology and is employed, among others, in HE-AAC. With SBR, correlations between low-frequency and high-frequency signal portions are made use of in order to expand the low-pass signal provided by the encoder spectrally. The low frequency bands of the underlying filter bank are copied to the missing high bands and the spectral envelope is adapted. This copying process causes, in particular with low cutoff frequencies, perceivable artefacts like roughness and undesired changes in timbre. These are caused mainly by the missing harmonic continuation of the spectrum at the limit between the baseband and the algorithmically produced high frequency bands.
A known SBR audio encoder uses pQMF subband decomposition of the signal and in this way ensures high encoding efficiency (Ekstrand, 2002). This is achieved by transmitting only the lower frequency bands, whereas the higher frequency portions are reconstructed using side information and the frequency shift of the lower bands mentioned before.
Spectral band replication is at present the most widespread method for bandwidth extension. It is, among others, employed in HE-AAC and mp3PRO. SBR has been developed by Coding Technologies, with the goal of increasing the efficiency of existing audio encoders. This is achieved by processing, by an encoder, only frequencies below a certain edge frequency fg. In the examples mentioned, mp3 and AAC encoders are used as core encoders. Depending on the quality to be achieved, the edge frequency lies between 5 kHz and 13 kHz. Frequencies above the edge frequency are described only by a few parameters. The high frequency portions are then reconstructed in the receiver using said side information and the decoded band-limited signal (Ekstrand, 2002).
FIG. 28 shows the block circuit diagram of an extended SBR encoder. The sample rate of the input signal is reduced, and the signal is subsequently fed to the actual encoder. In parallel, the signal is analyzed by a complex quadrature mirror filter bank (QMF) and an energy calculation is performed. The QMF used consists of 64 subbands. The parameters necessitated for estimating the spectral envelopes may be derived from this. Further parameters allow reacting to the special characteristics of the input signal. Since the SBR encoder knows how the high frequency band is produced, it may recognize strong differences between the original and the synthesized high-frequency (HF) portion.
When, for example, strongly distinct individual sounds above the cutoff frequency are present in the signal, these are described by additional parameters and may be fed again to the reconstructed signal. The side information produced is inserted into the outgoing bit stream, apart from the actual audio data (Larsen and Aarts, 2004).
FIG. 29 shows the block circuit diagram of the respective decoder extended by SBR. The band-limited audio data are decoded by the decoder and the control parameters are extracted from the bit stream. Subsequently, the audio data are fed again to a QMF filter bank for reconstructing the high frequency portions. The baseband is copied within this filter bank and inserted above the cutoff frequency (cf. FIG. 30, left).
FIG. 30 is a schematic illustration of the absolute value frequency response. Thus, FIG. 30 is a schematic illustration of SBR-HF reconstruction. FIG. 30 shows copying and shifting the baseband on the left. FIG. 30 illustrates a spectrum after adjusting the spectral envelope on the right.
The information, produced in the SBR encoder, on the spectral envelope is used to match the envelope of the copied spectrum to the original one. This adaptation is done using the control parameter transmitted and the energy of the respective QMF band. If the features of the reconstructed spectrum differ from the original ones, additionally tonal components or noise will be added to the signal (Larsen and Aarts, 2004). FIG. 30 shows the adapted reconstructed spectrum on the right.
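The envelope adaptation described above can be illustrated by a simple energy-matching gain per band. This is only a sketch under the assumption that the transmitted control parameter is a target energy for the band; the actual SBR parameterization is more detailed, and the names `envelope_gain` and `adjust_band` are hypothetical.

```python
import math

def envelope_gain(band, target_energy):
    """Gain that scales the energy of a copied band to the target energy."""
    energy = sum(abs(v) ** 2 for v in band)
    if energy == 0.0:
        return 0.0          # nothing to scale in an empty band
    return math.sqrt(target_energy / energy)

def adjust_band(band, target_energy):
    """Apply the energy-matching gain to every sample of the band."""
    g = envelope_gain(band, target_energy)
    return [g * v for v in band]

copied_band = [3.0, 4.0]                      # energy 25
adjusted = adjust_band(copied_band, 100.0)    # target energy from side information
adjusted_energy = sum(v * v for v in adjusted)
```

Scaling by the square root of the energy ratio changes only the level of the copied band, not the position of its spectral components.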
Finally, the band-limited signal and the reconstructed high-frequency signal are merged and transferred to the time domain by the synthesis filter bank. In this way, a bandwidth-extended signal which is now ready for reproduction has formed.
In this kind of bandwidth extension, problems arise with highly tonal signals of a highly distinct harmonic structure. Even if the SBR method provides for techniques for tonal adaptation of the spectrum, these are not sufficient for restoring a destroyed harmonic structure. The result is a perceivable roughness in the signal (Wilde, 2009). These artefacts are very unpleasant for the listener. This originates from the copying process of the SBR decoder. This does not take into consideration the harmonic fine structure of the signal and simply replicates the baseband. The result is shown in FIG. 31.
FIG. 31 shows a destruction of the harmonic structure with SBR. FIG. 31 shows an original broad-band spectrum on the left. FIG. 31 shows a spectrum after SBR HF reconstruction on the right.
It is clearly recognizable that the harmonics are shifted relative to the original spectrum in the range above the cutoff frequency. The reconstructed HF spectrum is harmonic, but the harmonic structure is spread by an additional frequency swing tag at the cutoff frequency. Additionally, the amplitude ratios of harmonic sub-tones are distorted by reconstructing the envelope. This effect will occur with all harmonic signals, as are exemplarily generated by musical instruments.
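The displacement of the harmonics can be reproduced with simple arithmetic. The sketch below treats the copying process as a plain shift of all baseband partials by the cutoff frequency, which is a simplification; the fundamental frequency of 440 Hz and the cutoff frequency of 5000 Hz are example values chosen here.

```python
def harmonics(f0, f_high):
    """Partials k*f0 up to and including f_high (integer Hz)."""
    return [k * f0 for k in range(1, f_high // f0 + 1)]

def grid_offset(f, f0):
    """Distance of frequency f from the nearest multiple of f0."""
    r = f % f0
    return min(r, f0 - r)

f0, fg = 440, 5000
baseband = harmonics(f0, fg)             # 440, 880, ..., 4840
copied = [f + fg for f in baseband]      # 5440, 5880, ..., 9840
offsets = [grid_offset(f, f0) for f in copied]
```

Every copied partial lies 160 Hz away from the original harmonic grid, because 5000 Hz is not an integer multiple of 440 Hz. The copied band is still equally spaced, but it no longer continues the harmonic series of the baseband.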
For signals exhibiting a distinct harmonic structure, such as, for example, a pitch pipe, SBR and equivalent bandwidth extension methods produce undesired artefacts, such as tonal roughness and unpleasant changes in timbre, since the harmonic structure of the signal is not maintained completely.
This is why two time-domain bandwidth extension methods which preserve these structures have been developed: phase vocoder-controlled harmonic bandwidth extension (HBE) and continuous modulation (CM) BWE, which uses special sideband modulation (Nagel and Disch, 2009; Nagel et al., 2010). Due to the continuous modulation with freely selectable frequencies, CM-BWE in particular achieves good harmonic restoration.
There are some alternative bandwidth extension methods which avoid the problem of disharmonic spectral continuation. Two of these methods will be introduced below. Basically, these methods replace the HF generator of the SBR decoder in FIG. 29 and thus represent an alternative to the simple copying process. Adapting the spectral envelope and tonality remains unchanged. Since the input signal has to be in the time domain, these methods are also referred to as time domain methods for bandwidth extension.
Harmonic bandwidth extension (HBE) is to be mentioned at first. HBE uses a phase vocoder for producing the high-pitch range. The spectrum is expanded by applying a phase vocoder. As is shown on the left in FIG. 32, the baseband is spread up to the maximum signal frequency fmax and the frequency range between the cutoff frequency and fmax is clipped out. The spectrum is then composed of said portion and the baseband (cf. FIG. 32, right). The envelope is adapted, as is also done in SBR (Nagel and Disch, 2009).
FIG. 32 is a schematic illustration of HBE-HF reconstruction. FIG. 32 shows expansion of the baseband by the factor 2 on the left. FIG. 32 shows a spectrum after having adapted the spectral envelope on the right.
Using integral expansion factors σ ∈ ℕ⁺ ensures that the harmonic structure is not changed at the cutoff frequency fg. The following applies:
$$
f_{\max} = \sigma \cdot f_g
\tag{3.1}
$$
Of disadvantage is the fact that the distance between the sub-tones in the HF region changes with the expansion factor by spreading the spectrum, as can be seen in FIG. 33. In addition, complicated calculations are necessitated for spreading the spectrum. Among these are high-resolution DFT, phase adaptation and sample rate conversion (Dolson, 1986). When the audio signal is subdivided into blocks, additionally an overlap-add structure is needed in order to be able to continue the phase of neighboring blocks continuously. For highly tonal signals, very good results can be achieved using the phase vocoder technique, however in percussive signals the transients blur and performing a separate transient treatment becomes necessary (Wilde, 2009).
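The change in the distance between the sub-tones can be illustrated numerically. The sketch below models the spreading as a multiplication of every baseband partial by the expansion factor and ignores the phase vocoder itself; the function name `hbe_partials` and the example values (440 Hz fundamental, 5000 Hz cutoff, factor 2) are assumptions made here.

```python
def hbe_partials(f0, fg, sigma):
    """Baseband partials, stretched HF partials and f_max = sigma * fg."""
    f_max = sigma * fg                                    # equation 3.1
    baseband = [k * f0 for k in range(1, fg // f0 + 1)]
    stretched = [sigma * f for f in baseband]
    # Only the range between the cutoff frequency and f_max is kept.
    hf = [f for f in stretched if fg < f <= f_max]
    return baseband, hf, f_max

baseband, hf, f_max = hbe_partials(440, 5000, 2)
spacing = [b - a for a, b in zip(hf, hf[1:])]
# hf: 5280, 6160, 7040, 7920, 8800, 9680 with a spacing of 880 Hz
```

All reconstructed partials remain multiples of the fundamental, but their spacing doubles from 440 Hz to 880 Hz, so every second harmonic is absent above the cutoff frequency.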
FIG. 33 shows a harmonic structure with HBE. FIG. 33 shows an original broad-band spectrum on the left. FIG. 33 illustrates a spectrum after HBE HF reconstruction on the right.
Continuous single sideband modulation will be presented below.
Continuously modulated bandwidth extension (CM-BWE) is another time-domain method for bandwidth extension. In this method, the baseband is modulated by the frequency fmod by means of single sideband modulation and thus shifted to another spectral location, as is illustrated in FIG. 34. A variable modulation frequency ensures that the harmonic structure of the bandwidth-extended signal is maintained. With modulation frequencies greater than the cutoff frequency fg, the gap forming in the spectrum has to be filled with noise (Nagel et al., 2010).
FIG. 34 shows a schematic illustration of CM-BWE-HF reconstruction. FIG. 34 shows modulation of the baseband with the frequency fmod on the left. FIG. 34 shows a spectrum after adapting the spectral envelope on the right.
Apart from the case illustrated in FIG. 34, it may also be necessary for the baseband to be modulated several times. In such a case, the modulation frequency has to be adapted for every modulation by selecting its respective next integral multiple (Nagel et al., 2010). Before modulation, the baseband has to be filtered by a low-pass in accordance with the modulation frequency, in order for the maximum allowed signal frequency fmax not to be exceeded after modulation. Similarly to the methods already presented, the spectral envelope is subsequently formed and the tonality adapted.
FIG. 35 shows the harmonic structure as it forms in a signal extended by means of CM-BWE. FIG. 35 shows an original broad-band spectrum on the left. FIG. 35 shows a spectrum after CM-BWE-HF reconstruction on the right. Like in the HBE method, CM-BWE lacks a harmonic sub-tone in the spectrum. However, this does not attract attention in a negative way, since the harmonic structure itself is maintained.
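The behavior described for FIG. 35 can be reproduced with a small numerical sketch. The rule used here for choosing the modulation frequency, namely the smallest multiple of the fundamental frequency at or above the cutoff frequency, is a hypothetical choice for illustration and is not taken from the text; the example values are likewise assumptions.

```python
def cm_bwe_partials(f0, fg, f_max):
    """Shift all baseband partials by a modulation frequency on the harmonic grid."""
    f_mod = -(-fg // f0) * f0                 # ceil(fg / f0) * f0, assumed rule
    baseband = [k * f0 for k in range(1, fg // f0 + 1)]
    # Low-pass before modulation: keep only partials that stay below f_max.
    shifted = [f + f_mod for f in baseband if f + f_mod <= f_max]
    gap = (fg, f_mod)                         # range to be filled with noise
    return f_mod, shifted, gap

f_mod, shifted, gap = cm_bwe_partials(440, 5000, 10000)
# f_mod = 5280, shifted = 5720, 6160, ..., 9680, gap = (5000, 5280)
```

All shifted partials remain on the 440 Hz grid with unchanged spacing. The partial at 5280 Hz itself is absent, because the baseband contains no component at 0 Hz that could be shifted there, which corresponds to the missing sub-tone mentioned above.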
Of disadvantage with this method is calculating the single sideband modulation. An analytical signal is necessitated for correct calculation, i.e. a signal containing only positive frequencies. A Hilbert transformer is needed for calculating such a signal. This basically is a non-causal filter of infinite impulse response. Such a filter cannot be realized and has to be simplified. In order to nevertheless achieve the highest possible stop band attenuation with a minimal filter order, a non-negligible delay is added to the signal by causalization of the filter (Wilde, 2009).
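Single sideband modulation by means of an analytic signal can be sketched for a single block as follows. Instead of a causal Hilbert filter, this illustration uses an idealized, block-based Hilbert transformer realized with a plain DFT, which assumes that the whole block is available and periodic; it is therefore not a model of the delay discussed above. All function names are hypothetical, and the shift is given in DFT bins.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def analytic_signal(x):
    """Suppress negative frequencies and double positive ones (even N assumed)."""
    N = len(x)
    X = dft(x)
    weights = [0.0] * N
    weights[0] = 1.0                 # DC is kept once
    weights[N // 2] = 1.0            # Nyquist bin is kept once
    for k in range(1, N // 2):
        weights[k] = 2.0             # positive frequencies
    return idft([X[k] * weights[k] for k in range(N)])

def ssb_shift(x, shift_bins):
    """Shift the spectrum of the real block x upwards by shift_bins DFT bins."""
    N = len(x)
    xa = analytic_signal(x)
    return [(xa[n] * cmath.exp(2j * math.pi * shift_bins * n / N)).real
            for n in range(N)]

N = 64
x = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]   # tone in bin 4
y = ssb_shift(x, 3)                                         # tone moves to bin 7
```

Without the analytic signal, multiplying by a cosine would produce both the sum and the difference frequency; suppressing the negative frequencies first leaves only the upper sideband.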
However, realizing the frequency shift in the time domain may be very complex. Realizing the shift in the subband domain of a subband audio encoder, in contrast, may result in a frequency resolution that is too coarse for the frequency shift needed.
What is desired is minimizing the memory space of the digital data necessitated or the bandwidth necessitated for transmitting said data by encoding audio signals. At the same time, the perceived quality of the reproduced audio signal is to be comparable to the CD standard (sampling frequency 44100 Hz at a quantization depth of 16 bits). Thus, the quality is to be maximized at a decreasing data rate.