When audio signals are to be stored and/or transmitted, a standard approach today is to code them into a digital representation according to one of various schemes. To save storage and/or transmission capacity, it is generally desirable to reduce the size of the digital representation needed to reconstruct the audio signals with sufficient perceptual quality. The trade-off between the size of the coded signal and the signal quality depends on the actual application.
A time-domain signal typically has to be divided into smaller parts in order to encode the evolution of the signal's amplitude precisely, i.e. to describe it with a low amount of information. State-of-the-art coding methods usually transform the time-domain signal into the frequency domain, where a better coding gain can be reached by using perceptual coding, i.e. coding that is lossy but ideally imperceptible to the human auditory system. See e.g. J. D. Johnston, “Transform coding of audio signals using perceptual noise criteria”, IEEE J. Select. Areas Commun., Vol. 6, pp. 314-323, 1988 [1]. However, when the bit rate constraint is too severe, perceptual audio coding cannot avoid introducing distortions, i.e. coding noise above the masking threshold. The general issue of reducing distortions in perceptual audio coding has been addressed by the Temporal Noise Shaping (TNS) technology described in e.g. J. Herre, “Temporal Noise Shaping, Quantization and Coding Methods in Perceptual Audio Coding: A tutorial introduction”, AES 17th Int. conf. on High Quality Audio Coding, 1997 [2]. Basically, the TNS approach rests on two main considerations: the time/frequency duality, and the shaping of quantization noise spectra by means of open-loop predictive coding.
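The open-loop prediction idea behind TNS can be illustrated with a minimal sketch. It is not the procedure of [2] or of any standard: the function names (`dct_ii`, `lpc_across_frequency`, `tns_analysis`, `tns_synthesis`), the plain DCT-II, the frame length of 64 and the predictor order of 2 are all assumptions chosen for illustration. The sketch relies on the time/frequency duality: a transient in time produces a strongly correlated sequence of spectral coefficients, so a linear predictor run across frequency removes much of its energy. The prediction is open loop because it is computed from the unquantized spectrum.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II, used here as a stand-in for the codec's transform."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi / n * (m + 0.5) * k) @ x

def lpc_across_frequency(spectrum, order):
    """Predictor coefficients from the autocorrelation of the spectrum."""
    n = len(spectrum)
    r = np.array([np.dot(spectrum[:n - lag], spectrum[lag:])
                  for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # small regularization (assumption)
    return np.linalg.solve(R, r[1:order + 1])

def tns_analysis(spectrum, a):
    """Open-loop prediction: residual[k] = X[k] - sum_i a[i] * X[k-1-i]."""
    residual = np.array(spectrum, dtype=float)
    for k in range(len(spectrum)):
        for i, coeff in enumerate(a):
            if k - 1 - i >= 0:
                residual[k] -= coeff * spectrum[k - 1 - i]
    return residual

def tns_synthesis(residual, a):
    """Inverse filter: rebuilds the spectrum from the residual."""
    spectrum = np.zeros(len(residual))
    for k in range(len(residual)):
        prediction = sum(coeff * spectrum[k - 1 - i]
                         for i, coeff in enumerate(a) if k - 1 - i >= 0)
        spectrum[k] = residual[k] + prediction
    return spectrum

# A single click in time gives a cosine-shaped, highly predictable spectrum.
frame = np.zeros(64)
frame[20] = 1.0
spectrum = dct_ii(frame)
a = lpc_across_frequency(spectrum, order=2)
residual = tns_analysis(spectrum, a)
restored = tns_synthesis(residual, a)
```

In a codec, the residual rather than the spectrum would be quantized. The synthesis filter at the decoder then shapes the quantization noise so that it follows the temporal envelope of the signal. Quantization is omitted from this sketch.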
In addition, audio coding standards are continuously being designed to deliver high or intermediate audio quality, from narrowband speech to fullband audio, at low data rates and with a complexity that is reasonable for the intended application. The Spectral Band Replication (SBR) technology, described in 3GPP TS 26.404 V6.0.0 (2004-09), “Enhanced aacPlus general audio codec—encoder SBR part (Release 6)”, 2004 [3], was introduced to allow wideband or fullband audio coding at low data rates by associating specific parameters with the bitstream resulting from perceptual audio coding of the narrowband signal. Such parameters are typically used at the decoder side to regenerate, from the decoded low-frequency spectrum, the missing high frequencies that are not decoded by the core codec.
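The principle of regenerating the high band from the decoded low band can be sketched as follows. This is a deliberately simplified illustration and not the procedure specified in [3]: the function names (`sbr_encode_envelope`, `sbr_decode`), the use of per-band RMS values as the transmitted parameters, the plain copy-up of the low band and the band layout are all assumptions.

```python
import numpy as np

def sbr_encode_envelope(full_spectrum, crossover, n_bands):
    """Encoder side: per-band RMS of the high band, sent as side information."""
    high = full_spectrum[crossover:]
    return np.array([np.sqrt(np.mean(band ** 2))
                     for band in np.array_split(high, n_bands)])

def sbr_decode(low_spectrum, envelope, n_high):
    """Decoder side: copy the low band upward, then match the envelope."""
    repeats = -(-n_high // len(low_spectrum))  # ceiling division
    patch = np.tile(low_spectrum, repeats)[:n_high]
    scaled = []
    for band, target in zip(np.array_split(patch, len(envelope)), envelope):
        rms = np.sqrt(np.mean(band ** 2))
        gain = target / rms if rms > 0 else 0.0
        scaled.append(band * gain)
    return np.concatenate([low_spectrum, np.concatenate(scaled)])

# Hypothetical 64-bin spectrum with decaying magnitude; the core codec is
# assumed to carry only the lower 32 bins.
rng = np.random.default_rng(0)
full = rng.standard_normal(64) * np.linspace(1.0, 0.1, 64)
crossover = 32
envelope = sbr_encode_envelope(full, crossover, n_bands=4)
rebuilt = sbr_decode(full[:crossover], envelope, n_high=len(full) - crossover)
```

Only four envelope values are transmitted for the upper 32 bins in this example. The regenerated high band matches the original in per-band energy but not in fine structure.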
The combination of TNS and SBR technologies in a transform-based audio codec, described in [3], has been successfully implemented for intermediate data rate applications, i.e. a typical bit rate of 32 kbps for intermediate audio quality. Nevertheless, these highly sophisticated coding methods are very complex, since they involve predictive coding and adaptive-resolution filter banks that introduce a certain delay. They are therefore not well suited to low-delay, low-complexity applications.