HFR technologies, such as Spectral Band Replication (SBR), can significantly improve the coding efficiency of traditional perceptual audio codecs. In combination with MPEG-4 Advanced Audio Coding (AAC), SBR forms a very efficient audio codec, which is already in use within the XM Satellite Radio system and Digital Radio Mondiale. The combination of AAC and SBR is called aacPlus. It is part of the MPEG-4 standard, where it is referred to as the High Efficiency AAC Profile. In general, HFR technology can be combined with any perceptual audio codec in a backward and forward compatible way, thus offering the possibility to upgrade already established broadcasting systems such as the MPEG Layer-2 codec used in the Eureka DAB system. HFR transposition methods can also be combined with speech codecs to allow wideband speech at ultra-low bit rates.
The basic idea behind HFR is the observation that there is usually a strong correlation between the characteristics of the high frequency range of a signal and the characteristics of the low frequency range of the same signal. Thus, a good approximation of the original high frequency range of a signal can be achieved by transposing the signal from the low frequency range to the high frequency range.
This concept of transposition was established in WO 98/57436 as a method to recreate a high frequency band from a lower frequency band of an audio signal. A substantial saving in bit-rate can be obtained by using this concept in audio coding and/or speech coding. In the following, reference will be made to audio coding, but it should be noted that the described methods and systems are equally applicable to speech coding and to unified speech and audio coding (USAC).
In an HFR based audio coding system, a low bandwidth signal is presented to a core waveform coder, and the higher frequencies are regenerated at the decoder side using transposition of the low bandwidth signal and additional side information, which is typically encoded at very low bit-rates and which describes the target spectral shape. For low bit-rates, where the bandwidth of the core coded signal is narrow, it becomes increasingly important to recreate a high band, i.e. the high frequency range of the audio signal, with perceptually pleasant characteristics. Two variants of high frequency reconstruction methods are mentioned in the following: one is referred to as harmonic transposition and the other is referred to as single sideband modulation.
The principle of harmonic transposition defined in WO 98/57436 is that a sinusoid with frequency ω is mapped to a sinusoid with frequency Tω, where T>1 is an integer defining the order of the transposition. An attractive feature of harmonic transposition is that it stretches a source frequency range into a target frequency range by a factor equal to the order of transposition, i.e. by a factor equal to T. Harmonic transposition performs well for complex musical material. Furthermore, harmonic transposition permits low cross-over frequencies, i.e. a large high frequency range above the cross-over frequency can be generated from a relatively small low frequency range below the cross-over frequency.
In contrast to harmonic transposition, a single sideband modulation (SSB) based HFR maps a sinusoid with frequency ω to a sinusoid with frequency ω+Δω, where Δω is a fixed frequency shift. It has been observed that, given a core signal with low bandwidth, a dissonant ringing artifact may result from the SSB transposition. It should also be noted that for a low cross-over frequency, i.e. a small source frequency range, harmonic transposition will require a smaller number of patches than SSB based transposition in order to fill a desired target frequency range. By way of example, if the high frequency range (ω,4ω] is to be filled, then harmonic transposition with an order of transposition T=4 can fill this frequency range from a low frequency range of (¼ω,ω]. An SSB based transposition using the same low frequency range, on the other hand, must use a frequency shift of Δω=¾ω, and the process must be repeated four times in order to fill the high frequency range (ω,4ω].
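The patch count in the example above can be checked with a short Python sketch. It is only an idealized illustration: frequencies are expressed in units of the cross-over frequency ω, bands are treated as ideal intervals, and the successive SSB patches are assumed to use shifts that are integer multiples of the source bandwidth. The function names `harmonic_target` and `ssb_patches` are hypothetical and do not come from the cited documents.

```python
from fractions import Fraction


def harmonic_target(source, order):
    """Band reached when every frequency w in (lo, hi] is mapped to order * w."""
    lo, hi = source
    return (order * lo, order * hi)


def ssb_patches(source, target_hi):
    """Return (shift, band) pairs tiling (hi, target_hi] with shifted copies
    of the source band (lo, hi].

    Assumption: patch k uses a shift of k times the source bandwidth, so the
    shifted copies sit next to each other without overlap.
    """
    lo, hi = source
    width = hi - lo
    patches = []
    k = 1
    while lo + k * width < target_hi:
        shift = k * width
        patches.append((shift, (lo + shift, hi + shift)))
        k += 1
    return patches


# Source range (1/4, 1] in units of the cross-over frequency.
source = (Fraction(1, 4), Fraction(1))

lo, hi = harmonic_target(source, order=4)
print(f"harmonic, T=4: ({lo}, {hi}]")

for shift, (lo, hi) in ssb_patches(source, target_hi=Fraction(4)):
    print(f"SSB shift {shift}: ({lo}, {hi}]")
```

Under these assumptions, a single harmonic patch of order T=4 covers (ω,4ω], whereas the SSB variant needs four shifted copies of the source range, with shifts of ¾ω, 1½ω, 2¼ω and 3ω.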
However, as already pointed out in WO 02/052545 A1, harmonic transposition has drawbacks for signals with a prominent periodic structure. Such signals are superpositions of harmonically related sinusoids with frequencies Ω, 2Ω, 3Ω, . . . , where Ω is the fundamental frequency. Upon harmonic transposition of order T, the output sinusoids have frequencies TΩ, 2TΩ, 3TΩ, . . . , which, for T>1, form only a strict subset of the desired full harmonic series. In terms of resulting audio quality, a “ghost” pitch corresponding to the transposed fundamental frequency TΩ will typically be perceived. Often the harmonic transposition results in a “metallic” sound character of the encoded and decoded audio signal. The situation may be alleviated to a certain degree by adding several orders of transposition T=2, 3, . . . , Tmax to the HFR, but this method is computationally complex if most spectral gaps are to be avoided.
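The gaps in the transposed harmonic series can be made concrete with a small Python sketch. It assumes an idealized model that is not part of the cited documents: the core signal contains exactly the partials Ω, 2Ω, . . . , KΩ, each order T maps partial kΩ to TkΩ, and partials are identified by their integer index. The names `transposed_partials` and `missing_partials` are hypothetical.

```python
def transposed_partials(order, n_source):
    """Indices n of the partials n*Omega produced by transposing the source
    partials 1..n_source with the given order (k -> order * k)."""
    return {order * k for k in range(1, n_source + 1)}


def missing_partials(orders, n_source, n_target_hi):
    """Indices in (n_source, n_target_hi] that no transposition order reaches."""
    produced = set()
    for order in orders:
        produced |= transposed_partials(order, n_source)
    return [n for n in range(n_source + 1, n_target_hi + 1) if n not in produced]


# Four source partials; order 2 alone, target partials 5..8.
print(missing_partials((2,), n_source=4, n_target_hi=8))

# Orders 2, 3 and 4 combined, target partials 5..12.
print(missing_partials((2, 3, 4), n_source=4, n_target_hi=12))
```

In this model, order 2 alone leaves partials 5 and 7 missing, and combining orders 2, 3 and 4 still leaves partials 5, 7, 10 and 11 missing, which illustrates why several orders only alleviate the problem to a certain degree.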
An alternative solution for avoiding the appearance of “ghost” pitches when using harmonic transposition has been presented in WO 02/052545 A1. The solution consists of using two types of transposition, i.e. a typical harmonic transposition and a special “pulse transposition”. The described method teaches switching to the dedicated “pulse transposition” for parts of the audio signal that are detected to be periodic with a pulse-train-like character. The problem with this approach is that applying “pulse transposition” to complex music material often degrades the quality compared to harmonic transposition based on a high resolution filter bank. Hence, the detection mechanisms have to be tuned rather conservatively, such that pulse transposition is not used for complex material. Inevitably, single pitch instruments and voices will sometimes be classified as complex signals, thereby invoking harmonic transposition and consequently producing missing harmonics. Moreover, if switching occurs in the middle of a single pitched signal, or of a signal with a dominating pitch in a weaker complex background, the switching itself between the two transposition methods, which have very different spectrum filling properties, will generate audible artifacts.