The present invention relates to audio signal processing and, in particular, to audio signal processing in situations in which the available data rate is rather small.
The hearing-adapted encoding of audio signals for data reduction, which enables efficient storage and transmission of these signals, has gained acceptance in many fields. Encoding algorithms are known, in particular, as “MP3” or “MP4”. The coding used for this, in particular when achieving the lowest bit rates, leads to a reduction of the audio quality, which is often mainly caused by an encoder-side limitation of the bandwidth of the audio signal to be transmitted.
It is known from WO 98 57436 to subject the audio signal to band limiting in such a situation on the encoder side and to encode only a lower band of the audio signal by means of a high-quality audio encoder. The upper band, however, is characterized only very coarsely, i.e. by a set of parameters which reproduces the spectral envelope of the upper band. On the decoder side, the upper band is then synthesized. For this purpose, a harmonic transposition is proposed, wherein the lower band of the decoded audio signal is supplied to a filterbank. Filterbank channels of the lower band are connected to filterbank channels of the upper band, i.e. are “patched”, and each patched bandpass signal is subjected to an envelope adjustment. The synthesis filterbank belonging to a special analysis filterbank here receives bandpass signals of the audio signal in the lower band and envelope-adjusted bandpass signals of the lower band which were harmonically patched into the upper band. The output signal of the synthesis filterbank is an audio signal extended with regard to its bandwidth, which was transmitted from the encoder side to the decoder side with a very low data rate. In particular, the filterbank calculations and the patching in the filterbank domain may involve a high computational effort.
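The frequency mapping behind a harmonic transposition can be illustrated with a minimal sketch. It is not the filterbank implementation of the cited document; it only assumes that each partial frequency is multiplied by an integer factor (here 2) and uses a hypothetical fundamental of 400 Hz. Under these assumptions, every generated upper-band partial remains an integer multiple of the fundamental.

```python
def harmonic_transpose(partials_hz, factor=2):
    """Multiply every partial frequency by an integer factor (assumed: 2)."""
    return [factor * f for f in partials_hz]


F0 = 400  # hypothetical fundamental frequency in Hz
low_band = [k * F0 for k in range(1, 11)]  # partials at 400, 800, ..., 4000 Hz

transposed = harmonic_transpose(low_band)

# Keep only the transposed partials that extend the signal above the lower band.
upper_band = [f for f in transposed if f > max(low_band)]

# True if every generated partial still lies on the harmonic raster of F0.
on_raster = all(f % F0 == 0 for f in upper_band)
```

With a factor of 2 only the even-numbered harmonics are generated, so the raster is preserved but is sparser than in the original signal.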
Complexity-reduced methods for a bandwidth extension of band-limited audio signals instead use a copying function of low-frequency signal portions (LF) into the high frequency range (HF), in order to approximate information missing due to the band limitation. Such methods are described in M. Dietz, L. Liljeryd, K. Kjörling and O. Kunz, “Spectral Band Replication, a novel approach in audio coding,” in 112th AES Convention, Munich, May 2002; S. Meltzer, R. Böhm and F. Henn, “SBR enhanced audio codecs for digital broadcasting such as “Digital Radio Mondiale” (DRM),” 112th AES Convention, Munich, May 2002; T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm,” in 112th AES Convention, Munich, May 2002; International Standard ISO/IEC 14496-3:2001/FPDAM 1, “Bandwidth Extension,” ISO/IEC, 2002, or “Speech bandwidth extension method and apparatus”, Vasu Iyengar et al. U.S. Pat. No. 5,455,888.
In these methods no harmonic transposition is performed; instead, successive bandpass signals of the lower band are introduced into successive filterbank channels of the upper band. In this way, a coarse approximation of the upper band of the audio signal is achieved. In a further step, this coarse approximation of the signal is then brought closer to the original by post-processing that uses control information obtained from the original signal. Here, for example, scale factors serve to adapt the spectral envelope, inverse filtering and the addition of a noise floor serve to adapt the tonality, and sinusoidal signal portions are supplemented, as is also described in the MPEG-4 standard.
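The copy-up operation and the subsequent envelope adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the MPEG-4 standard: the function names `copy_up` and `adjust_envelope` are hypothetical, each channel is represented as a plain list of subband samples, the source channels are reused cyclically if the upper band is wider than the lower band, and the transmitted scale factors are modelled as target energies per channel.

```python
import math


def copy_up(low_channels, num_high):
    """Place successive lower-band channels into successive upper-band channels.

    Assumption: if more upper-band channels are needed than lower-band
    channels exist, the source channels are reused cyclically.
    """
    return [list(low_channels[i % len(low_channels)]) for i in range(num_high)]


def adjust_envelope(patched, target_energies):
    """Scale each patched channel so that its energy matches the target value."""
    adjusted = []
    for samples, target in zip(patched, target_energies):
        energy = sum(x * x for x in samples)
        # A silent source channel cannot be scaled to a non-zero energy.
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        adjusted.append([gain * x for x in samples])
    return adjusted


# Two hypothetical lower-band channels with two subband samples each.
low = [[1.0, -1.0], [2.0, 0.0]]
patched = copy_up(low, num_high=3)

# Hypothetical target energies standing in for transmitted scale factors.
high = adjust_envelope(patched, target_energies=[0.5, 1.0, 8.0])
```

The sketch covers only the envelope adaptation; inverse filtering, noise addition and the supplementation of sinusoids are not modelled.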
Apart from this, further methods exist, such as the so-called “blind bandwidth extension” described in E. Larsen, R. M. Aarts, and M. Danessis, “Efficient high-frequency bandwidth extension of music and speech”, AES 112th Convention, Munich, Germany, May 2002, wherein no information on the original HF range is used. Furthermore, there is the method of the so-called “artificial bandwidth extension”, which is described in K. Käyhkö, A Robust Wideband Enhancement for Narrowband Speech Signal; Research Report, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, 2001.
In J. Makinen et al.: AMR-WB+: a new audio coding standard for 3rd generation mobile audio services Broadcasts, IEEE, ICASSP '05, a method for bandwidth extension is described, wherein the copying operation of the bandwidth extension with an up-copying of successive bandpass signals according to SBR technology is replaced by mirroring, for example, by upsampling.
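Why upsampling can produce a mirrored spectrum may be illustrated with a small sketch. It does not reproduce the AMR-WB+ procedure; it only assumes upsampling by a factor of 2 through zero insertion without a subsequent low-pass filter, and uses a hypothetical test tone. A naive DFT from the standard library is sufficient for the short signal.

```python
import cmath
import math


def upsample_by_zero_insertion(x):
    """Double the sampling rate by inserting a zero after every sample."""
    y = []
    for sample in x:
        y.extend([sample, 0.0])
    return y


def dft_magnitudes(x, num_bins):
    """Magnitudes of the first num_bins DFT bins (naive O(N^2) evaluation)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(num_bins)
    ]


N = 32         # length of the original signal (hypothetical)
TONE_BIN = 5   # the test tone lies exactly on bin 5 of the N-point DFT

x = [math.cos(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
y = upsample_by_zero_insertion(x)

# Bins 0..N of the 2N-point DFT span 0 Hz up to the new Nyquist frequency.
magnitudes = dft_magnitudes(y, N + 1)
peaks = [k for k, m in enumerate(magnitudes) if m > 1.0]
```

In the upsampled signal the old Nyquist frequency corresponds to bin 16. The tone remains at bin 5 and an image appears at bin 27, i.e. mirrored about bin 16, so the lower band appears reversed in frequency in the upper band.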
Further technologies for bandwidth extension are described in the following documents. R. M. Aarts, E. Larsen, and O. Ouweltjes, “A unified approach to low- and high frequency bandwidth extension”, AES 115th Convention, New York, USA, October 2003; E. Larsen and R. M. Aarts, “Audio Bandwidth Extension—Application to psychoacoustics, Signal Processing and Loudspeaker Design”, John Wiley & Sons, Ltd., 2004; E. Larsen, R. M. Aarts, and M. Danessis, “Efficient high-frequency bandwidth extension of music and speech”, AES 112th Convention, Munich, May 2002; J. Makhoul, “Spectral Analysis of Speech by Linear Prediction”, IEEE Transactions on Audio and Electroacoustics, AU-21(3), June 1973; U.S. patent application Ser. No. 08/951,029; U.S. Pat. No. 6,895,375.
Known methods of harmonic bandwidth extension show a high complexity. On the other hand, methods of complexity-reduced bandwidth extension show quality losses. In particular with a low bitrate and in combination with a low bandwidth of the LF range, artifacts such as roughness and a timbre perceived to be unpleasant may occur. A reason for this is the fact that the approximated HF portion is based on a copying operation which disregards the harmonic relations of the tonal signal portions with regard to each other. This applies both to the harmonic relation between LF and HF and to the harmonic relation within the HF portion itself. With SBR, for example, rough sound impressions occasionally occur at the boundary between the LF range and the generated HF range, because tonal portions copied from the LF range into the HF range, as illustrated for example in FIG. 4a, may in the overall signal come to lie spectrally very close to tonal portions of the LF range. Thus, in FIG. 4a, an original signal with peaks at 401, 402, 403, and 404 is illustrated, while a test signal is illustrated with peaks at 405, 406, 407, and 408. As a result of copying tonal portions from the LF range into the HF range, wherein in FIG. 4a the boundary was at 4250 Hz, the distance between the two left peaks in the test signal is less than the base frequency underlying the harmonic raster, which leads to a perception of roughness.
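The effect at the boundary can be reproduced with simple arithmetic. The following sketch is not derived from the data of FIG. 4a; only the boundary of 4250 Hz is taken from the description above, while the fundamental of 400 Hz and the constant copy offset of 2700 Hz are hypothetical values chosen for illustration.

```python
def harmonic_partials(f0, f_max):
    """Integer multiples of f0 (in Hz) that do not exceed f_max."""
    return [k * f0 for k in range(1, f_max // f0 + 1)]


def copy_up(partials_hz, boundary, shift):
    """Shift all partials by a constant offset; keep those above the boundary."""
    return [f + shift for f in partials_hz if f + shift > boundary]


F0 = 400         # hypothetical fundamental frequency in Hz
BOUNDARY = 4250  # boundary between LF and HF range, as stated for FIG. 4a
SHIFT = 2700     # hypothetical constant copy offset in Hz

lf = harmonic_partials(F0, BOUNDARY)  # 400, 800, ..., 4000 Hz
hf = copy_up(lf, BOUNDARY, SHIFT)     # 4300, 4700, ..., 6700 Hz

# Distance between the highest LF partial and the lowest copied partial.
gap = hf[0] - lf[-1]

# None of the copied partials is an integer multiple of the fundamental.
off_raster = all(f % F0 != 0 for f in hf)
```

With these values the highest LF partial lies at 4000 Hz and the lowest copied partial at 4300 Hz, so the two are only 300 Hz apart although the harmonic raster has a spacing of 400 Hz. Within the HF portion the spacing of 400 Hz is retained, but the whole copied raster is displaced relative to the multiples of the fundamental.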
As the width of the frequency groups (critical bands) increases with increasing center frequency, as described in Zwicker, E. and H. Fastl (1999), Psychoacoustics: Facts and Models, Berlin: Springer-Verlag, sinusoidal portions which lie in different frequency groups in the LF range may, after being copied into the HF range, come to lie in the same frequency group, which also leads to a rough hearing impression, as may be seen in FIG. 4b. In particular, it is shown there that copying the LF range into the HF range leads to a denser tonal structure in the test signal as compared to the original. The original signal is distributed relatively uniformly across the spectrum in the higher frequency range, as shown in particular at 410. In contrast, in particular in this higher range, the test signal 411 is distributed relatively non-uniformly across the spectrum and is thus clearly more tonal than the original signal 410.
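The growth of the frequency group width with the center frequency can be made concrete with a short sketch. It assumes the widely used analytic approximation of the critical bandwidth attributed to Zwicker and Terhardt, which is not quoted in this description, and it uses a simplified criterion: two sinusoids are treated as lying in the same frequency group if their distance is smaller than the bandwidth at their mean frequency. The frequencies are hypothetical.

```python
def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth in Hz at center frequency f_hz.

    Assumed formula: 25 + 75 * (1 + 1.4 * (f / kHz)^2)^0.69
    """
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def share_frequency_group(f1_hz, f2_hz):
    """Simplified test: distance smaller than the bandwidth at the mean frequency."""
    center = 0.5 * (f1_hz + f2_hz)
    return abs(f1_hz - f2_hz) < critical_bandwidth_hz(center)


# Two sinusoids 400 Hz apart in the LF range ...
in_lf = share_frequency_group(800.0, 1200.0)

# ... and the same pair after a hypothetical copy by 4000 Hz into the HF range.
in_hf = share_frequency_group(4800.0, 5200.0)
```

Under this approximation the bandwidth is roughly 160 Hz around 1 kHz and roughly 900 Hz around 5 kHz, so the pair is resolved into different frequency groups in the LF range but falls into a single frequency group after copying.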