Most existing telecommunication systems operate on a limited audio bandwidth. Stemming from the limitations of land-line telephony systems, most voice services transmit only the lower end of the spectrum. Although the limited audio bandwidth is enough for most conversations, there is a desire to increase the audio bandwidth to improve intelligibility and sense of presence. At the same time, even though the capacity of telecommunication networks is continuously increasing, it is still of great interest to limit the required bandwidth per communication channel. In mobile networks, a smaller transmission bandwidth for each call yields lower power consumption in both the mobile device and the base station. This translates to energy and cost savings for the mobile operator, while the end user experiences prolonged battery life and increased talk-time. Further, with less consumed bandwidth per user, the mobile network can serve a larger number of users in parallel.
A property of the human auditory system is that perception is frequency dependent. In particular, our hearing is less accurate at higher frequencies. This has inspired so-called bandwidth extension (BWE) techniques, where a high frequency band is reconstructed from a low frequency band using a small number of transmitted parameters.
Conventional BWE uses a parametric representation of the high band signal, such as a spectral envelope and a temporal envelope, and reproduces the spectral fine structure of the signal by using generated noise or a modified version of the low band signal. If the high band envelope is represented by a filter, the fine structure signal is often called the excitation signal. An accurate representation of the high band envelope is perceptually more important than the fine structure. Consequently, the available bits are commonly spent on the envelope representation, while the fine structure is reconstructed from the coded low band signal without additional side information.
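The principle can be illustrated with a minimal sketch. It assumes the signal is already available as a vector of real transform coefficients and that the high band envelope is transmitted as one RMS value per band. The function names, the band layout, and the choice of copying the low band upward as excitation are illustrative assumptions, not the method of any particular codec.

```python
import math

def band_rms(coeffs, start, stop):
    """Root-mean-square level of coeffs[start:stop]."""
    n = stop - start
    return math.sqrt(sum(c * c for c in coeffs[start:stop]) / n)

def encode_envelope(spectrum, split, band_width):
    """Encoder side: describe the high band by one RMS value per band."""
    return [band_rms(spectrum, s, min(s + band_width, len(spectrum)))
            for s in range(split, len(spectrum), band_width)]

def decode_high_band(low_band, envelope, band_width, total_len):
    """Decoder side: rebuild the high band without fine-structure side info."""
    split = len(low_band)
    # Excitation (fine structure): the decoded low band, repeated upward.
    excitation = [low_band[i % split] for i in range(total_len - split)]
    high = []
    for b, target in enumerate(envelope):
        start = b * band_width
        stop = min(start + band_width, len(excitation))
        current = band_rms(excitation, start, stop)
        # Scale the excitation so the band matches the transmitted envelope.
        gain = target / current if current > 0.0 else 0.0
        high.extend(gain * c for c in excitation[start:stop])
    return low_band + high

# Hypothetical 8-coefficient frame: the lower 4 coefficients are coded,
# the upper 4 are represented only by two envelope values.
spectrum = [8.0, -6.0, 4.0, -3.0, 0.5, -0.4, 0.2, -0.1]
envelope = encode_envelope(spectrum, split=4, band_width=2)
decoded = decode_high_band(spectrum[:4], envelope, band_width=2, total_len=8)
```

In this sketch the decoded high band reproduces the per-band energy of the original but not its exact coefficients, which reflects the trade-off described above: bits go to the envelope, and the fine structure is borrowed from the low band.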
BWE technology has been applied in a variety of audio coding systems. For example, 3GPP AMR-WB+ uses a time domain BWE based on a low band coder which switches between Code Excited Linear Prediction (CELP) speech coding and Transform Coded Excitation (TCX) coding. Another example is the 3GPP eAAC transform based audio codec, which performs a transform domain variant of BWE called Spectral Band Replication (SBR).
Although the split into a low band and a high band is often perceptually motivated, it may be less suitable for certain types of signals. As an example, if the high band of a particular signal is perceptually more important than the low band, the majority of the bits spent on the low band will be wasted while the high band will be represented with poor accuracy. In general, if a portion of the spectrum is fixed to be encoded while other parts are not encoded, there may always be signals which do not fit the a priori assumption. The worst case would be that the entire energy of the signal is contained in the non-coded part, which would yield very poor performance.
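The worst case can be made concrete with a small sketch. It assumes a hypothetical coder that encodes only the transform coefficients below a fixed split index; the function name and the two example frames are invented for illustration.

```python
def coded_energy_fraction(spectrum, split):
    """Fraction of the frame energy that lies in the coded part spectrum[:split]."""
    total = sum(c * c for c in spectrum)
    if total == 0.0:
        return 1.0  # silent frame: nothing is lost by the fixed split
    return sum(c * c for c in spectrum[:split]) / total

# Hypothetical frames of 8 coefficients with a fixed split at index 4.
speech_like = [8.0, -6.0, 4.0, -3.0, 0.5, -0.4, 0.2, -0.1]
high_tone = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0]

fits_assumption = coded_energy_fraction(speech_like, 4)  # close to 1
worst_case = coded_energy_fraction(high_tone, 4)         # exactly 0.0
```

For the first frame almost all energy falls in the coded band, so the fixed split works well. For the second frame none of the energy is coded, and the bits spent on the low band describe only silence.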