Perceptually adapted coding of audio signals, which provides a substantial data rate reduction for efficient storage and transmission of these signals, has gained wide acceptance in many fields. Many coding algorithms are known, e.g., MPEG 1/2 Layer 3 (“MP3”) or MPEG 4 AAC (Advanced Audio Coding). However, such coding, in particular when operating at the lowest bit rates, can lead to a reduction of subjective audio quality, which is often mainly caused by an encoder-side limitation of the audio signal bandwidth to be transmitted.
It is known from WO 98 57436 to subject the audio signal to a band limiting in such a situation on the encoder side and to encode only a lower band of the audio signal by means of a high quality audio encoder (“core coder”). The upper band, however, is only very coarsely characterized, i.e. by a set of parameters which reproduces the spectral envelope of the upper band. On the decoder side, the upper band is then synthesized. For this purpose, a harmonic transposition is proposed wherein the lower band of the decoded audio signal is supplied to a filterbank. Filterbank channels of the lower band are connected to filterbank channels of the upper band, i.e. they are “patched”, and each patched bandpass signal is subjected to an envelope adjustment. The synthesis filterbank belonging to a special analysis filterbank receives bandpass signals of the audio signal in the lower band and envelope-adjusted bandpass signals of the lower band which are harmonically patched into the upper band. The output signal of the synthesis filterbank is an audio signal whose bandwidth is extended relative to the band-limited signal that the core coder, operating at a very low data rate, transmits from the encoder side to the decoder side. However, the filterbank calculations and the patching in the filterbank domain may require a high computational effort.
Complexity-reduced methods for a bandwidth extension of band-limited audio signals instead use a copying function of low-frequency signal portions (LF) into the high frequency range (HF) in order to approximate information missing due to the band limitation. Such methods are described in M. Dietz, L. Liljeryd, K. Kjörling and O. Kunz, “Spectral Band Replication, a novel approach in audio coding,” in 112th AES Convention, Munich, May 2002; S. Meltzer, R. Böhm and F. Henn, “SBR enhanced audio codecs for digital broadcasting such as “Digital Radio Mondiale” (DRM),” 112th AES Convention, Munich, May 2002; T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm,” in 112th AES Convention, Munich, May 2002; International Standard ISO/IEC 14496-3:2001/FPDAM 1, “Bandwidth Extension,” ISO/IEC, 2002, or “Speech bandwidth extension method and apparatus”, Vasu Iyengar et al. U.S. Pat. No. 5,455,888.
In these methods, no harmonic transposition is performed; instead, successive bandpass signals of the lower band are introduced into successive filterbank channels of the upper band. In this way, a coarse approximation of the upper band of the audio signal is achieved. In a further step, this coarse approximation is brought closer to the original by post-processing that uses control information derived from the original signal. Here, for example, scale factors serve to adapt the spectral envelope, inverse filtering and the addition of a noise floor serve to adapt the tonality, and sinusoidal signal portions are added to supplement missing harmonics, as is also described in the MPEG-4 High Efficiency Advanced Audio Coding (HE-AAC) standard.
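The copy-up operation followed by an envelope adjustment may be illustrated by the following minimal sketch. It is not taken from any of the cited references or standards; the function name `copy_up_patch`, the representation of the bands as lists of subband magnitudes, and the grouping of `band_size` channels per scale factor are assumptions made purely for illustration. Inverse filtering, noise-floor addition and sinusoid supplementation are omitted.

```python
import math


def copy_up_patch(low_band, target_energies, band_size):
    """Illustrative copy-up patching with envelope adjustment (hypothetical helper).

    low_band:        magnitudes of the decoded low-band filterbank channels
    target_energies: one transmitted target energy per high-band scale factor band
    band_size:       number of filterbank channels per scale factor band (assumed fixed)
    Returns the list of envelope-adjusted high-band channel magnitudes.
    """
    n = len(low_band)
    num_high = len(target_energies) * band_size

    # Step 1: introduce successive low-band channels into successive
    # high-band channels (wrapping around if the high band is wider).
    raw = [low_band[k % n] for k in range(num_high)]

    # Step 2: per scale factor band, scale the copied channels so that
    # their energy matches the transmitted target energy.
    high_band = []
    for b, target in enumerate(target_energies):
        group = raw[b * band_size:(b + 1) * band_size]
        energy = sum(x * x for x in group)
        gain = math.sqrt(target / energy) if energy > 0.0 else 0.0
        high_band.extend(x * gain for x in group)
    return high_band
```

For example, `copy_up_patch([3.0, 4.0, 1.0, 0.0], [100.0, 4.0], 2)` copies the four low-band channels upward and scales each pair of channels by a gain of 2, so that the pairs carry the energies 100 and 4, respectively.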
Apart from this, further methods use a phase vocoder for bandwidth extension. When the phase vocoder is applied for spectral spreading, the frequency lines move further apart from each other. If gaps exist in the spectrum, e.g. due to quantization, they are widened even further by the spreading. In an energy adaptation, the remaining lines in the spectrum receive too much energy compared to the respective lines in the original signal.
FIG. 13 shows a schematic illustration of a bandwidth extension 1300 using a phase vocoder. In this example, two patches 1312, 1314 are added to a low frequency band 1302 of a signal. The upper cut-off frequency 1320 of the signal, also called the crossover frequency (Xover frequency), is the lower cut-off frequency of the neighboring patch 1312, and twice the crossover frequency is the upper cut-off frequency of the neighboring patch 1312 and the lower cut-off frequency of the next patch 1314. The phase vocoder doubles the frequencies of the frequency lines of the low frequency band 1302 of the signal to obtain the neighboring patch 1312 and triples the frequencies of the frequency lines of the low frequency band 1302 of the signal to obtain the next patch 1314. Therefore, the spectral density of the neighboring patch 1312 is only half of the spectral density of the low frequency band 1302 of the signal, and the spectral density of the next patch 1314 is only one third of the spectral density of the low frequency band 1302 of the signal.
Because the energy in the bands (patches) is concentrated on only a few frequency lines, a substantial change in timbre relative to the original results. The energy formerly distributed over more frequency lines is summed onto the fewer remaining ones.
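The thinning of the spectrum and the resulting concentration of energy may be illustrated numerically by the following minimal sketch. It models only the positions of the frequency lines, not an actual phase vocoder; the function names `harmonic_patches` and `energy_per_line`, the evenly spaced line grid, and the assumption that each band carries the same total energy spread evenly over its lines are hypothetical simplifications.

```python
def harmonic_patches(low_lines_hz, xover_hz, num_patches=2):
    """Return the frequency lines of each patch (hypothetical helper).

    Patch number p (p = 1, 2, ...) multiplies the low-band line
    frequencies by (p + 1) and keeps only the lines that fall between
    p * xover_hz (exclusive) and (p + 1) * xover_hz (inclusive).
    """
    patches = []
    for order in range(2, num_patches + 2):
        lower, upper = (order - 1) * xover_hz, order * xover_hz
        patches.append(
            [order * f for f in low_lines_hz if lower < order * f <= upper]
        )
    return patches


def energy_per_line(total_energy, lines):
    """Energy each line receives if a band's energy is spread evenly over its lines."""
    return total_energy / len(lines)
```

With low-band lines every 100 Hz from 100 Hz to 4000 Hz (40 lines) and a crossover frequency of 4000 Hz, `harmonic_patches(list(range(100, 4001, 100)), 4000)` yields a first patch of 20 lines spaced 200 Hz apart and a second patch of 14 lines spaced 300 Hz apart. If each band is adjusted to the same total energy, `energy_per_line` shows that each line of the first patch carries twice the energy of a low-band line.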
Some examples of phase vocoders and their applications are presented in Frederik Nagel and Sascha Disch, “A Harmonic Bandwidth Extension Method for Audio Codecs,” ICASSP'09; M. Puckette, “Phase-locked Vocoder,” IEEE ASSP Conference on Applications of Signal Processing to Audio and Acoustics, Mohonk 1995; Röbel, A., “Transient detection and preservation in the phase vocoder,” citeseer.ist.psu.edu/679246.html; Laroche L., Dolson M., “Improved phase vocoder timescale modification of audio,” IEEE Trans. Speech and Audio Processing, Vol. 7, No. 3, pp. 323-332; and U.S. Pat. No. 6,549,884.
One approach for filling the gaps is shown in WO 00/45379, which describes a method and an apparatus for enhancement of source coding systems utilizing high frequency reconstruction. The application addresses the problem of insufficient noise content in a reconstructed highband by adaptive noise-floor addition. Adding noise may fill the gaps, but the audio quality or subjective quality may not be improved sufficiently.