In PCT WO 98/57436 the concept of transposition was established as a method to recreate a high frequency band from a lower frequency band of an audio signal. A substantial saving in bitrate can be obtained by using this concept in audio coding. In an HFR based audio coding system, a low bandwidth signal is processed by a core waveform coder and the higher frequencies are regenerated using transposition and additional side information of very low bitrate describing the target spectral shape at the decoder side. For low bitrates, where the bandwidth of the core coded signal is narrow, it becomes increasingly important to recreate a high band with perceptually pleasant characteristics. The harmonic transposition defined in PCT WO 98/57436 performs very well for complex musical material in a situation with low crossover frequency. The principle of a harmonic transposition is that a sinusoid with frequency ω is mapped to a sinusoid with frequency Tω where T>1 is an integer defining the order of transposition. In contrast to this, a single sideband modulation (SSB) based HFR method maps a sinusoid with frequency ω to a sinusoid with frequency ω+Δω where Δω is a fixed frequency shift. Given a core signal with low bandwidth, a dissonant ringing artifact can result from SSB transposition.
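The contrast between the two mappings can be illustrated with a small sketch (illustrative Python, not part of any cited method; the 200 Hz tone and the 500 Hz shift are arbitrary example values): harmonic transposition keeps the partials of a tone harmonically related, while a fixed SSB shift does not.

```python
def harmonic_transpose(freqs, T):
    """Harmonic transposition: a sinusoid at frequency w maps to T*w."""
    return [T * f for f in freqs]

def ssb_transpose(freqs, delta):
    """SSB based HFR: a sinusoid at frequency w maps to w + delta."""
    return [f + delta for f in freqs]

# Partials of a harmonic tone with a 200 Hz fundamental
partials = [200.0, 400.0, 600.0]

# T = 2 keeps all partials integer multiples of the fundamental
print(harmonic_transpose(partials, 2))   # [400.0, 800.0, 1200.0]

# A fixed 500 Hz shift breaks the harmonic series -> dissonant ringing
print(ssb_transpose(partials, 500.0))    # [700.0, 900.0, 1100.0]
```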
In order to reach the best possible audio quality, state of the art high quality harmonic HFR methods employ complex modulated filter banks, e.g. a Short Time Fourier Transform (STFT), with high frequency resolution and a high degree of oversampling. The fine resolution may be used to avoid unwanted intermodulation distortion arising from nonlinear processing of sums of sinusoids. With sufficiently high frequency resolution, i.e. narrow subbands, the high quality methods aim at having a maximum of one sinusoid in each subband. A high degree of oversampling in time may be used to avoid alias type distortion, and a certain degree of oversampling in frequency may be used to avoid pre-echoes for transient signals. The obvious drawback is that the computational complexity can become high.
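Why fine frequency resolution suppresses intermodulation can be seen from a small numerical sketch (illustrative Python; the frequencies and the squaring nonlinearity are chosen for demonstration only): a nonlinearity applied to a sum of sinusoids produces cross terms, while the same nonlinearity applied to each sinusoid separately, as narrow subbands permit, does not.

```python
import numpy as np

fs = 48000
n = np.arange(4096)
f1, f2 = 1000.0, 1300.0
s1 = np.cos(2 * np.pi * f1 * n / fs)
s2 = np.cos(2 * np.pi * f2 * n / fs)

# Nonlinearity on the mixture: the cross term 2*s1*s2 creates
# intermodulation products at f1 + f2 and f2 - f1
mixed = (s1 + s2) ** 2
# Nonlinearity applied per sinusoid (one sinusoid per narrow subband)
separate = s1 ** 2 + s2 ** 2

win = np.hanning(len(n))
freqs = np.fft.rfftfreq(len(n), 1 / fs)
spec_mixed = np.abs(np.fft.rfft(mixed * win))
spec_separate = np.abs(np.fft.rfft(separate * win))

# Energy around the intermodulation product at f1 + f2 = 2300 Hz
band = (freqs > 2250) & (freqs < 2350)
print(spec_mixed[band].sum() > 10 * spec_separate[band].sum())  # True
```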
Subband block based harmonic transposition is another HFR method for suppressing intermodulation products; here, a filter bank with coarser frequency resolution and a lower degree of oversampling is employed, e.g. a multichannel QMF bank. In this method, a time block of complex subband samples is processed by a common phase modifier, while the superposition of several modified samples forms an output subband sample. This has the net effect of suppressing intermodulation products which would otherwise occur when the input subband signal consists of several sinusoids. Transposition based on block based subband processing has much lower computational complexity than the high quality transposers and reaches almost the same quality for many signals. However, the complexity is still much higher than for the trivial SSB based HFR methods, since a plurality of analysis filter banks, each processing signals of a different transposition order T, may be used in a typical HFR application in order to synthesize the desired bandwidth. Additionally, a common approach is to adapt the sampling rate of the input signals to fit analysis filter banks of a constant size, even though the filter banks process signals of different transposition orders. Also common is to apply bandpass filters to the input signals in order to obtain output signals, processed from different transposition orders, with non-overlapping spectral densities.
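The block processing described above can be sketched as follows (a simplified illustration in Python with an assumed Hann window and block length, not the exact algorithm of any cited method): each block of complex subband samples receives a common phase modification derived from one sample of the block, and overlapping modified blocks are superposed.

```python
import numpy as np

def block_transpose_subband(x, T, block_len=8):
    """Simplified block-based transposition of one complex subband signal.

    A common phase modifier, derived from the block's centre sample, is
    applied to a whole windowed block; overlapping modified blocks are
    superposed to form the output subband samples."""
    win = np.hanning(block_len)
    y = np.zeros(len(x), dtype=complex)
    for start in range(len(x) - block_len + 1):
        block = x[start:start + block_len]
        centre_phase = np.angle(block[block_len // 2])
        # advance every sample's phase by (T - 1) times the centre phase
        y[start:start + block_len] += win * block * np.exp(1j * (T - 1) * centre_phase)
    return y

# A single complex exponential at 0.2 rad/sample in the subband
x = np.exp(1j * 0.2 * np.arange(64))
y = block_transpose_subband(x, T=2)
print(np.angle(y[30] / y[29]))  # ~0.4, i.e. the frequency has been doubled
```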
Storage or transmission of audio signals is often subject to strict bitrate constraints. In the past, coders were forced to drastically reduce the transmitted audio bandwidth when only a very low bitrate was available. Modern audio codecs are able to code wideband signals by using bandwidth extension (BWE) methods [1-12]. These algorithms rely on a parametric representation of the high-frequency content (HF), which is generated from the low-frequency part (LF) of the decoded signal by means of transposition into the HF spectral region (“patching”) and application of parameter driven post processing. The LF part is coded with any audio or speech coder. For example, the bandwidth extension methods described in [1-4] rely on single sideband modulation (SSB), often also termed the “copy-up” method, for generating the multiple HF patches.
Lately, a new algorithm, which employs a bank of phase vocoders [15-17] for the generation of the different patches, has been presented [13] (see FIG. 20). This method has been developed to avoid the auditory roughness which is often observed in signals subjected to SSB bandwidth extension. Although beneficial for many tonal signals, this method, called “harmonic bandwidth extension” (HBE), is prone to quality degradations of transients contained in the audio signal [14], since vertical coherence over sub-bands is not guaranteed to be preserved in the standard phase vocoder algorithm and, moreover, the re-calculation of the phases has to be performed on time blocks of a transform or, alternatively, of a filter bank. Therefore, a need arises for special treatment of signal parts containing transients.
However, since the BWE algorithm is performed on the decoder side of a codec chain, computational complexity is a serious issue. State-of-the-art methods, especially the phase vocoder based HBE, come at the price of a greatly increased computational complexity compared to SSB based methods.
As outlined above, existing bandwidth extension schemes apply only one patching method on a given signal block at a time, be it SSB based patching [1-4] or HBE vocoder based patching [15-17]. Additionally, modern audio coders [19-20] offer the possibility of switching the patching method globally on a time block basis between alternative patching schemes.
SSB copy-up patching introduces unwanted roughness into the audio signal, but is computationally simple and preserves the time envelope of transients. In audio codecs employing HBE patching, the transient reproduction quality is often suboptimal. Moreover, the computational complexity is significantly increased over the computationally very simple SSB copy-up method.
When it comes to complexity reduction, sampling rates are of particular importance. This is due to the fact that a high sampling rate generally means high complexity and a low sampling rate generally means low complexity, owing to the reduced number of operations to be performed. In bandwidth extension applications, on the other hand, the sampling rate of the core coder output signal will typically be too low for a full bandwidth signal. Stated differently, when the sampling rate of the decoder output signal is, for example, 2 or 2.5 times the maximum frequency of the core coder output signal, then a bandwidth extension by, for example, a factor of 2 requires an upsampling operation, so that the sampling rate of the bandwidth extended signal is high enough to “cover” the additionally generated high frequency components.
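The sampling rate arithmetic can be made concrete with a small helper (illustrative Python; the function and the example rates are assumptions for demonstration, not taken from any cited codec):

```python
def required_output_rate(core_rate_hz, core_bandwidth_hz, bwe_factor):
    """Minimum sampling rate at which the bandwidth extended signal can be
    represented: the extended content reaches bwe_factor * core_bandwidth_hz,
    so by the sampling theorem the rate must be at least twice that."""
    target_bandwidth_hz = bwe_factor * core_bandwidth_hz
    return max(core_rate_hz, 2 * target_bandwidth_hz)

# Core coder output: 16 kHz sampling rate, i.e. 2 times an 8 kHz audio bandwidth.
# Extending the bandwidth by a factor of 2 yields 16 kHz of audio content,
# which requires upsampling to at least a 32 kHz sampling rate.
print(required_output_rate(16000, 8000, 2))  # 32000
```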
Additionally, filterbanks such as analysis filterbanks and synthesis filterbanks are responsible for a considerable amount of processing operations. Hence, the size of the filterbank, i.e. whether it is a 32 channel filterbank, a 64 channel filterbank or a filterbank with an even higher number of channels, will significantly influence the complexity of the audio processing algorithm. Generally, a high number of filterbank channels involves more processing operations and, therefore, higher complexity than a small number of filterbank channels. In view of this, in bandwidth extension applications and also in other audio processing applications where different sampling rates are an issue, such as vocoder-like applications or other audio effect applications, there is a specific interdependency between complexity and sampling rate or audio bandwidth: operations for upsampling or subband filtering can drastically increase the complexity without a corresponding benefit to the audio quality when the wrong tools or algorithms are chosen for the specific operations.
In the context of bandwidth extension, parametric data sets are used for performing a spectral envelope adjustment and for performing other manipulations to a signal generated by a patching operation, i.e. an operation that takes data from the source range, the low band portion of the bandwidth extended signal available at the input of the bandwidth extension processor, and maps this data to a high frequency range. Spectral envelope adjustment can take place before the low band signal is actually mapped to the high frequency range, or after the source range has been mapped to the high frequency range.
Typically, the parametric data sets are provided with a certain frequency resolution, i.e. the parametric data refer to frequency bands of the high frequency part. On the other hand, the patching from the low band to the high band, i.e. which source ranges are used for obtaining which target or high frequency ranges, is an operation that is independent of the frequency resolution at which the parametric data sets are given. The fact that the transmitted parametric data are, in a sense, independent of the actual patching algorithm is an important feature, since it allows great flexibility on the decoder side, i.e. in the implementation of the bandwidth extension processor. Here, different patching algorithms can be used, while one and the same spectral envelope adjustment is performed. Stated differently, the high frequency reconstruction processor or spectral envelope adjustment processor in a bandwidth extension application does not need information on the applied patching algorithm in order to perform the spectral envelope adjustment.
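The decoupling of patching and envelope adjustment can be sketched as follows (illustrative Python over toy magnitude spectra; the band layout and target energies are arbitrary examples): the envelope adjuster only sees parameter bands and target energies, not which patching algorithm produced the high-frequency bins.

```python
import numpy as np

def copy_up_patch(spectrum, crossover_bin, num_bins):
    """SSB style copy-up patching sketch: fill the bins above the
    crossover by repeating the source range."""
    out = np.zeros(num_bins)
    out[:crossover_bin] = spectrum[:crossover_bin]
    pos = crossover_bin
    while pos < num_bins:
        n = min(crossover_bin, num_bins - pos)
        out[pos:pos + n] = spectrum[:n]
        pos += n
    return out

def adjust_envelope(patched, band_edges, target_energies):
    """Scale each parameter band to its transmitted target energy;
    works the same whatever patching produced the HF content."""
    out = patched.copy()
    for (lo, hi), target in zip(band_edges, target_energies):
        energy = np.sum(out[lo:hi] ** 2)
        if energy > 0:
            out[lo:hi] *= np.sqrt(target / energy)
    return out

low_band = np.ones(8)                       # toy low band magnitudes
patched = copy_up_patch(low_band, 8, 16)    # HF bins 8..15 filled by copy-up
shaped = adjust_envelope(patched, [(8, 12), (12, 16)], [2.0, 0.5])
```

The same `adjust_envelope` call would apply unchanged to a signal patched by an HBE-style method, which is the flexibility described above.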
A disadvantage of this procedure, however, is that a misalignment between the frequency bands, for which the parametric data sets are provided on the one hand and the spectral borders of a patch on the other hand, can occur. Particularly in situations where the spectral energy strongly changes in the vicinity of a patch border, artifacts may arise specifically in this region, which degrade the quality of the bandwidth extended signal.