When audio signals are to be stored and/or transmitted, a standard approach today is to code the audio signals into a digital representation according to different schemes. In order to save storage and/or transmission capacity, it is a general wish to reduce the size of the digital representation needed to allow reconstruction of the audio signals with sufficient quality. The trade-off between size of the coded signal and signal quality depends on the actual application.
Transform-based audio coders compress audio signals by quantizing the transform coefficients. To enable low bitrates, quantizers may concentrate the available bits on the most energetic and perceptually relevant coefficients and transmit only those, leaving “spectral holes” of unquantized coefficients in the frequency spectrum.
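The way such spectral holes arise can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer and selects coefficients by magnitude only; the function names, the step size and the example spectrum are hypothetical, and no perceptual model is applied.

```python
def quantize_top_k(coeffs, k, step=0.5):
    """Quantize only the k largest-magnitude coefficients; zero the rest.

    Assumption: magnitude stands in for "energetic and perceptually
    relevant"; a real coder would use a psychoacoustic model.
    """
    # Indices ordered by descending magnitude.
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    kept = set(order[:k])
    decoded = []
    for i, c in enumerate(coeffs):
        if i in kept:
            decoded.append(round(c / step) * step)  # uniform scalar quantizer
        else:
            decoded.append(0.0)  # not transmitted: becomes part of a hole
    return decoded


def spectral_holes(decoded):
    """Return (first, last) index pairs for each run of zero coefficients."""
    holes, start = [], None
    for i, v in enumerate(decoded):
        if v == 0.0 and start is None:
            start = i
        elif v != 0.0 and start is not None:
            holes.append((start, i - 1))
            start = None
    if start is not None:
        holes.append((start, len(decoded) - 1))
    return holes


# Hypothetical magnitude spectrum, lowest frequency first.
spectrum = [4.1, 0.3, 3.2, 0.2, 0.1, 2.6, 0.4, 0.2, 0.1, 0.05]
decoded = quantize_top_k(spectrum, k=3)
holes = spectral_holes(decoded)
```

In this example the small holes lie between the retained low-frequency coefficients, while the largest hole covers the upper end of the spectrum.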
The so-called SBR (Spectral Band Replication) technology, see e.g. 3GPP TS 26.404 V6.0.0 (2004-09), “Enhanced aacPlus general audio codec-encoder SBR part (Release 6)”, 2004 [1], closes the gap between the band-limited signal of a conventional perceptual coder and the audible bandwidth of approximately 15 kHz. The general idea behind SBR is to recreate the missing high-frequency content of a decoded signal in a perceptually accurate manner. The frequencies above 15 kHz are less important from a psychoacoustic point of view, but may also be reconstructed. However, SBR cannot be used as a standalone codec. It always operates in conjunction with a conventional waveform codec, a so-called core codec. The core codec is responsible for transmitting the lower part of the original spectrum, while the SBR decoder, which is mainly a post-process to the conventional waveform decoder, reconstructs the non-transmitted frequency range. The spectral values of the high band are not transmitted directly as in conventional codecs. The combined system offers a coding gain superior to the gain of the core codec alone.
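The division of labour between core codec and SBR decoder can be sketched as follows. This is a simplified illustration, not the procedure specified in [1]: it assumes that the high band is rebuilt by copying decoded low-band coefficients and rescaling each band to an energy value received as side information. The function names, band size and envelope values are hypothetical.

```python
import math


def band_energy(coeffs):
    """Sum of squared coefficients in a band."""
    return sum(c * c for c in coeffs)


def replicate_high_band(low_band, envelope, band_size):
    """Rebuild high-band coefficients from the decoded low band.

    Assumption: `envelope` holds one target energy per high band,
    sent as side information instead of the spectral values themselves.
    """
    high = []
    for b, target in enumerate(envelope):
        # Source patch: take band_size coefficients, cycling through the low band.
        patch = [low_band[(b * band_size + j) % len(low_band)]
                 for j in range(band_size)]
        energy = band_energy(patch)
        # Gain that makes the copied patch match the target energy.
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high.extend(gain * c for c in patch)
    return high


low = [2.0, -1.0, 0.5, 1.5]   # decoded by the core codec
envelope = [0.5, 0.125]       # hypothetical per-band energies (side information)
high = replicate_high_band(low, envelope, band_size=2)
full = low + high             # full-bandwidth spectrum after the post-process
```

Only the coarse envelope of the high band is needed here, which is where the additional coding gain in this sketch comes from.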
The SBR methodology relies on the definition of a fixed transition frequency between a low band, containing the perceptually relevant low frequencies, which are encoded, and a high band, containing the less relevant high frequencies, which are not encoded. In practice, however, the appropriate transition frequency depends on the audio content of the original signal. In other words, from one signal to another, the appropriate transition frequency can vary a lot. This is for instance the case when comparing clean speech and full-band music signals.
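How strongly a suitable transition point can depend on the content may be illustrated with a simple energy roll-off measure. This measure is an assumption made for illustration only and is not part of SBR; the function name, the 90 % fraction and the two example spectra are hypothetical.

```python
def rolloff_bin(energies, fraction=0.9):
    """Index of the first bin at which the cumulative energy reaches
    `fraction` of the total energy (illustrative measure only)."""
    total = sum(energies)
    if total == 0:
        return 0
    acc = 0.0
    for i, e in enumerate(energies):
        acc += e
        if acc >= fraction * total:
            return i
    return len(energies) - 1


# Hypothetical per-bin energies, lowest frequency first.
speech_like = [10.0, 6.0, 3.0, 0.5, 0.2, 0.1, 0.1, 0.1]
music_like = [4.0, 3.0, 3.0, 3.0, 2.0, 2.0, 2.0, 1.0]

speech_bin = rolloff_bin(speech_like)
music_bin = rolloff_bin(music_like)
```

For the speech-like spectrum the energy is concentrated in the lowest bins, whereas the music-like spectrum spreads it over almost the whole range, so a single fixed transition frequency suits only one of them.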
The “spectral holes” of the decoded spectrum can be divided into two kinds. The first kind consists of small holes at lower frequencies due to the effect of instantaneous masking, see e.g. J. D. Johnston, “Estimation of Perceptual Entropy Using Noise Masking Criteria”, Proc. ICASSP, pp. 2524-2527, May 1988 [2]. The second kind consists of larger holes at high frequencies resulting from the saturation by the absolute threshold of hearing and the addition of masking [2]. SBR mainly concerns the second kind.
Moreover, a typical audio codec based on such a method, which aims at filling the “spectral holes” (i.e. the non-encoded coefficients) at high frequencies, i.e. the second kind of “spectral holes”, should preferably be able to fill the spectral holes over the whole spectrum. Indeed, even if an SBR codec is able to deliver a full-bandwidth audio signal, the reconstructed high frequencies will not mask the annoying artifacts introduced by the coding, i.e. quantization, of the low band, i.e. the perceptually relevant low frequencies.