Most existing telecommunication systems operate on a limited audio bandwidth. Stemming from the limitations of land-line telephony systems, most voice services transmit only the lower end of the spectrum. Although this limited audio bandwidth is sufficient for most conversations, there is a desire to increase the audio bandwidth to improve intelligibility and sense of presence. At the same time, even though the capacity of telecommunication networks is continuously increasing, it is still of great interest to limit the required transmission bandwidth per communication channel. In mobile networks, a smaller transmission bandwidth for each call yields lower power consumption in both the mobile device and the base station. This translates to energy and cost savings for the mobile operator, while the end user will experience prolonged battery life and increased talk-time. Further, with less consumed bandwidth per user, the mobile network can serve a larger number of users in parallel.
A property of the human auditory system is that perception is frequency dependent. In particular, our hearing is less accurate for higher frequencies. This has inspired so-called bandwidth extension (BWE) techniques, where a high frequency band is reconstructed from a low frequency band using limited resources.
Conventional BWE uses a representation of the spectral envelope of the extended high band signal, and reproduces the spectral fine structure of the signal by using a modified version of the low band signal. If the high band envelope is represented by a filter, the fine structure signal is often called the excitation signal. An accurate representation of the high band envelope is perceptually more important than the fine structure. Consequently, the available bits are commonly spent on the envelope representation, while the fine structure is reconstructed from the coded low band signal without additional side information. The basic concept of BWE is illustrated in FIG. 1.
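As a minimal sketch of this concept, assuming a transform domain representation in which both bands are lists of spectral coefficients, the envelope can be modeled as one RMS gain per sub-band and the excitation as a copy of the low band coefficients. The function names, the sub-band size and the RMS gain model are illustrative assumptions and are not taken from any particular codec.

```python
import math

def band_rms(coeffs):
    """Root-mean-square level of a group of spectral coefficients."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))

def encode_envelope(high_band, band_size):
    """Encoder side: describe the true high band by one RMS gain per sub-band.

    Assumes len(high_band) is a multiple of band_size (illustrative choice).
    """
    return [band_rms(high_band[i:i + band_size])
            for i in range(0, len(high_band), band_size)]

def reconstruct_high_band(low_band, envelope, band_size):
    """Decoder side: use low band coefficients as excitation (fine structure)
    and scale each sub-band so that its RMS matches the transmitted gain."""
    high_band = []
    for b, gain in enumerate(envelope):
        # Copy a block of low band coefficients, wrapping around if needed.
        excitation = [low_band[(b * band_size + k) % len(low_band)]
                      for k in range(band_size)]
        rms = band_rms(excitation)
        scale = gain / rms if rms > 0.0 else 0.0
        high_band.extend(scale * x for x in excitation)
    return high_band
```

In this sketch only the envelope gains would need to be transmitted as side information; the fine structure of the reconstructed high band comes entirely from the decoded low band.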
The technology of BWE has been applied in a variety of audio coding systems. For example, the 3GPP AMR-WB+ [1] uses a time domain BWE based on a low band coder which switches between Code Excited Linear Prediction (CELP) speech coding and Transform Coded Excitation (TCX) coding. Another example is the 3GPP eAAC transform based audio codec, which performs a transform domain variant of BWE called Spectral Band Replication (SBR) [2]. Here, the excitation is created using a mixture of tonal components generated from the low band excitation and a noise source in order to match the tonal-to-noise ratio of the input signal. In general, the noisiness of the signal can be described as a measure of how flat the spectrum is, e.g. using a spectral flatness measure. The noisiness can also be described as non-tonality, randomness or non-structure of the excitation. Increasing the noisiness of a signal means making it more noise-like, for instance by mixing the signal with a noise signal from a random number generator or any other noise source. It can also be done by modifying the spectrum of the signal to make it flatter.
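The two notions above can be sketched as follows, under stated assumptions. The spectral flatness measure is taken here as the ratio of the geometric mean to the arithmetic mean of the power spectrum, which is one common definition; the equal-RMS mixing rule and the names spectral_flatness, mix_with_noise and noise_fraction are hypothetical choices made for illustration only.

```python
import math
import random

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Close to 1 for a flat (noise-like) spectrum, close to 0 for a peaky
    (tonal) spectrum. eps guards against log(0) and division by zero.
    """
    n = len(power_spectrum)
    mean_log = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(mean_log) / (arithmetic_mean + eps)

def mix_with_noise(excitation, noise_fraction, rng):
    """Blend spectral coefficients with Gaussian noise of the same RMS.

    noise_fraction is the share of output energy taken from the noise
    source: 0.0 returns the excitation unchanged, 1.0 returns pure noise.
    """
    n = len(excitation)
    rms = math.sqrt(sum(x * x for x in excitation) / n)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_rms = math.sqrt(sum(x * x for x in noise) / n)
    gain = rms / noise_rms if noise_rms > 0.0 else 0.0
    a = math.sqrt(1.0 - noise_fraction)
    b = math.sqrt(noise_fraction)
    return [a * x + b * gain * w for x, w in zip(excitation, noise)]

# Usage: a strongly tonal set of coefficients becomes more noise-like.
tonal = [10.0] + [0.1] * 7
mixed = mix_with_noise(tonal, 0.5, random.Random(0))
flatness_before = spectral_flatness([x * x for x in tonal])
flatness_after = spectral_flatness([x * x for x in mixed])
```

Comparing flatness_before with flatness_after gives a simple indication of how much the mixing has raised the noisiness of the excitation.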
The spectral fine structure from the low band may be very different from the fine structure found in the high band. In particular, the combination of an excitation generated from the low band signal together with the high band envelope may produce undesired artifacts, since residual harmonicity or shape of the excitation may be emphasized by the envelope shaping in an uncontrolled way. As a safety measure, it is common to flatten the high band envelope in order to limit undesired interaction between the excitation and the envelope. Although this solution may give a reasonable trade-off, the flatter envelope may be perceived as noisier and the high band envelope will be less accurate.
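The text does not specify how such flattening is performed. One possible way, shown purely as an assumption, is to pull the sub-band gains of the envelope toward their geometric mean in the logarithmic domain. The function flatten_envelope and the parameter alpha are hypothetical names introduced for this sketch.

```python
import math

def flatten_envelope(gains, alpha):
    """Compress positive envelope gains toward their geometric mean.

    alpha = 1.0 keeps the envelope unchanged, alpha = 0.0 yields a
    completely flat envelope, and values in between trade envelope
    accuracy for reduced interaction with the excitation.
    """
    logs = [math.log(g) for g in gains]
    mean_log = sum(logs) / len(logs)
    return [math.exp(mean_log + alpha * (v - mean_log)) for v in logs]
```

With alpha below 1.0 the ratio between the largest and smallest gain shrinks, which illustrates the trade-off described above: peaks in the excitation are emphasized less, but the envelope no longer matches the original high band as closely.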