Modern telecommunication services are expected to handle many different types of audio signals. While the main audio content is speech, there is a desire to handle more general signals such as music and mixtures of music and speech. Although the capacity of telecommunication networks is continuously increasing, it is still of great interest to limit the required bandwidth per communication channel. In mobile networks, a smaller transmission bandwidth for each call yields lower power consumption in both the mobile device and the base station. This translates into energy and cost savings for the mobile operator, while the end user experiences prolonged battery life and increased talk time. Further, with less bandwidth consumed per user, the mobile network can serve a larger number of users in parallel.
Today, the dominant compression technology for mobile voice services is CELP (Code Excited Linear Prediction), which achieves good audio quality for speech at low bandwidths. It is widely used in deployed codecs such as AMR (Adaptive MultiRate), AMR-WB (Adaptive MultiRate WideBand) and GSM-EFR (Global System for Mobile communications-Enhanced FullRate). However, CELP performs poorly for general audio signals such as music. These signals can often be better represented using frequency transform based coding, for example the ITU-T codecs G.722.1 [1] and G.719 [2]. The drawback is that transform domain codecs generally operate at higher bitrates than speech codecs. There is thus a gap between the speech and general audio domains in terms of coding, and it is desirable to improve the performance of transform domain codecs at lower bitrates.
Transform domain codecs require a compact representation of the frequency domain transform coefficients. These representations often rely on vector quantization (VQ), where the coefficients are encoded in groups. One of the various methods for vector quantization is gain-shape VQ. This approach normalizes each vector before encoding the individual coefficients. The normalization factor and the normalized coefficients are referred to as the gain and the shape of the vector, respectively, and may be encoded separately. The gain-shape structure has many benefits. By separating the gain from the shape, the codec can be adapted to varying source input levels simply through the design of the gain quantizer. It is also beneficial from a perceptual perspective, since the gain and the shape may carry different importance in different frequency regions. Finally, the gain-shape division simplifies the quantizer design and makes it less demanding in terms of memory and computational resources than an unconstrained vector quantizer. A functional overview of a gain-shape quantizer can be seen in FIG. 1.
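The gain-shape split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production quantizer: the log2-domain gain quantizer, its step size `GAIN_STEP`, and the per-coefficient grid rounding used in place of a trained shape codebook are all hypothetical choices made for this example.

```python
import math

GAIN_STEP = 0.5  # hypothetical log2-domain step for the gain quantizer (~3 dB)


def quantize_gain(gain, step=GAIN_STEP):
    """Scalar gain quantizer in the log2 domain (illustrative choice)."""
    return round(math.log2(gain) / step)


def dequantize_gain(index, step=GAIN_STEP):
    return 2.0 ** (index * step)


def quantize_shape(shape, levels):
    """Stand-in for a real shape VQ: round each unit-norm coefficient onto a
    grid with `levels` steps per unit. Larger `levels` means more bits."""
    return [round(c * levels) for c in shape]


def dequantize_shape(indices):
    """Renormalize the integer grid point back to a unit-norm shape."""
    norm = math.sqrt(sum(i * i for i in indices))
    if norm == 0.0:
        return [0.0] * len(indices)
    return [i / norm for i in indices]


def gain_shape_encode(x, levels=4):
    """Split x into gain (L2 norm) and shape (x / gain); quantize each separately."""
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        return None, [0] * len(x)  # silent vector: no gain index needed
    shape = [v / gain for v in x]
    return quantize_gain(gain), quantize_shape(shape, levels)


def gain_shape_decode(gain_index, shape_indices):
    """Reconstruct the vector as dequantized gain times dequantized shape."""
    if gain_index is None:
        return [0.0] * len(shape_indices)
    g = dequantize_gain(gain_index)
    return [g * s for s in dequantize_shape(shape_indices)]
```

For example, `gain_shape_encode([3.0, -4.0, 0.0, 0.0])` returns `(5, [2, -3, 0, 0])`: the gain 5 and the shape `[0.6, -0.8, 0, 0]` are quantized independently, and the decoded vector has norm 2^2.5, about 5.66. Because the gain lives in its own quantizer, handling a louder or quieter input only touches `quantize_gain`.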
When applied to a frequency domain spectrum, the gain-shape structure can be used to form a representation of the spectral envelope and fine structure. The sequence of gain values forms the envelope of the spectrum, while the shape vectors give the spectral detail. From a perceptual perspective it is beneficial to partition the spectrum using a non-uniform band structure that follows the frequency resolution of the human auditory system. This generally means that narrow bands are used at low frequencies and wider bands at high frequencies. The perceptual importance of the spectral fine structure varies with frequency but also depends on the characteristics of the signal itself. Transform coders often employ an auditory model to determine the important parts of the fine structure and assign the available resources to the most important parts. The spectral envelope is often used as input to the auditory model. The shape encoder quantizes the shape vectors using the assigned bits. See FIG. 2 for an example of a transform based coding system with an auditory model.
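A non-uniform band partition, the resulting envelope, and an envelope-driven bit allocation can be sketched as follows. The band edges `BAND_EDGES`, the bit budget, and the greedy priority rule (one log2 unit of level, roughly 6 dB, per bit per coefficient) are hypothetical simplifications standing in for a real auditory model, which would be considerably more elaborate.

```python
import math

# Hypothetical band edges (coefficient indices) for a 32-coefficient spectrum:
# narrow bands at low frequencies, wider bands at high frequencies.
BAND_EDGES = [0, 4, 8, 16, 32]


def band_envelope(spectrum, edges=BAND_EDGES):
    """Per-band gain (L2 norm); the sequence of gains forms the spectral envelope."""
    return [math.sqrt(sum(c * c for c in spectrum[lo:hi]))
            for lo, hi in zip(edges[:-1], edges[1:])]


def band_priority(gain, width, bits):
    """Crude perceptual priority: log2 of the band's RMS level, reduced by one
    unit per bit per coefficient (each bit roughly halves the noise amplitude)."""
    if gain == 0.0:
        return -math.inf
    return math.log2(gain / math.sqrt(width)) - bits / width


def allocate_bits(envelope, edges=BAND_EDGES, total_bits=40):
    """Greedy stand-in for an auditory model: hand out one bit at a time to
    the band with the highest remaining priority."""
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    bits = [0] * len(envelope)
    for _ in range(total_bits):
        prios = [band_priority(g, w, b) for g, w, b in zip(envelope, widths, bits)]
        best = max(range(len(prios)), key=prios.__getitem__)
        if prios[best] == -math.inf:
            break  # every band is silent; nothing worth coding
        bits[best] += 1
    return bits
```

For a spectrum whose energy falls off with frequency, such as `[8.0]*4 + [2.0]*4 + [0.5]*8 + [0.0]*16`, the allocation favors the low bands and gives nothing to the silent top band. The envelope is the only input the allocator sees, which mirrors the text: the quantized gains drive how many bits each shape receives.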
Depending on the accuracy of the shape quantizer, the gain value used to reconstruct the vector may be more or less appropriate. Especially when few bits are allocated to the shape, the gain value drifts away from the optimal value for the quantized shape. One way to address this is to encode a correction factor that accounts for the gain mismatch after shape quantization. Another solution is to encode the shape first and then compute the optimal gain factor given the quantized shape.
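The gain mismatch and the shape-first remedy can be illustrated numerically. For a fixed quantized shape ŝ, the gain that minimizes the squared error ||x - g·ŝ||² is the least-squares solution g_opt = ⟨x, ŝ⟩ / ⟨ŝ, ŝ⟩. The sketch below assumes a hypothetical low-rate shape quantizer (grid rounding followed by renormalization) and compares g_opt with the gain computed before shape quantization.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def coarse_shape(x, levels):
    """Hypothetical low-rate shape quantizer: round x/||x|| onto a grid with
    `levels` steps per unit, then renormalize to unit norm."""
    norm = math.sqrt(dot(x, x))
    idx = [round(levels * v / norm) for v in x]
    n = math.sqrt(dot(idx, idx))
    if n == 0.0:  # all rounded to zero: keep only the largest coefficient
        k = max(range(len(x)), key=lambda j: abs(x[j]))
        idx = [0] * len(x)
        idx[k] = 1 if x[k] > 0 else -1
        n = 1.0
    return [i / n for i in idx]


def optimal_gain(x, shape_hat):
    """Least-squares gain for a fixed quantized shape: argmin_g ||x - g*shape_hat||^2."""
    return dot(x, shape_hat) / dot(shape_hat, shape_hat)


def squared_error(x, gain, shape_hat):
    return sum((a - gain * b) ** 2 for a, b in zip(x, shape_hat))


x = [3.0, -4.0, 1.0, 0.5]
s_hat = coarse_shape(x, levels=1)  # very few bits: [1, -1, 0, 0] / sqrt(2)
g_norm = math.sqrt(dot(x, x))      # gain computed before shape quantization
g_opt = optimal_gain(x, s_hat)     # gain computed after shape quantization
correction = g_opt / g_norm        # factor a gain-correction scheme would encode
```

Here the norm-based gain is about 5.12 while g_opt = 7/√2 ≈ 4.95, and the squared error drops from about 1.78 to 1.75. Since ŝ has unit norm, g_opt = ⟨x, ŝ⟩ never exceeds ||x|| (Cauchy-Schwarz), so the correction factor is at most 1 in this setting.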
Encoding a gain correction factor after shape quantization may consume a considerable part of the bitrate. If the rate is already low, these bits must be taken from elsewhere, possibly reducing the bitrate available for the fine structure.
Encoding the shape before the gain is a better solution, but if the bitrate of the shape quantizer is derived from the quantized gain value, the gain and shape quantization become mutually dependent. An iterative solution could likely resolve this co-dependency, but it could easily become too complex to run in real time on a mobile device.
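To make the co-dependency concrete, the loop below alternates between deriving the shape resolution from the quantized gain and recomputing the gain from the quantized shape until the gain index stops changing. Everything here is hypothetical: the resolution rule `shape_resolution`, the grid-rounding shape quantizer, and the iteration cap `max_iter` are illustrative assumptions, not a method described in this document. The point is that every pass reruns the shape quantizer, which is usually the most expensive step.

```python
import math

GAIN_STEP = 0.5  # hypothetical log2-domain gain step (~3 dB)


def quantize_gain(gain):
    return round(math.log2(gain) / GAIN_STEP)


def shape_resolution(gain_index):
    """Hypothetical rule: the shape's grid resolution (a proxy for its bit
    budget) is derived from the quantized gain, clamped to [1, 8]."""
    return max(1, min(8, gain_index))


def quantize_shape(x, levels):
    """Stand-in shape quantizer: grid rounding, then renormalization."""
    norm = math.sqrt(sum(v * v for v in x))
    idx = [round(levels * v / norm) for v in x]
    n = math.sqrt(sum(i * i for i in idx))
    if n == 0.0:  # everything rounded to zero: keep only the largest coefficient
        k = max(range(len(x)), key=lambda j: abs(x[j]))
        idx = [0] * len(x)
        idx[k] = 1 if x[k] > 0 else -1
        n = 1.0
    return [i / n for i in idx]


def iterative_gain_shape(x, max_iter=10):
    """Alternate gain -> bit budget -> shape -> gain until the gain index is stable."""
    if not any(x):
        raise ValueError("x must contain at least one non-zero coefficient")
    gain_index = quantize_gain(math.sqrt(sum(v * v for v in x)))
    for iteration in range(1, max_iter + 1):
        # The shape quantizer (normally the costly part) is rerun on every pass.
        shape = quantize_shape(x, shape_resolution(gain_index))
        # Optimal gain for the unit-norm quantized shape; positive because each
        # shape index has the same sign as its coefficient (or is zero).
        g_opt = sum(a * b for a, b in zip(x, shape))
        new_index = quantize_gain(g_opt)
        if new_index == gain_index:
            return gain_index, shape, iteration
        gain_index = new_index
    return gain_index, shape, max_iter
```

For `x = [3.0, -4.0, 1.0, 0.5]` the loop settles after a single pass with gain index 5. Other inputs may need several passes, or run into `max_iter` if the indices keep changing, and this variable amount of work is what makes such a scheme hard to budget for in real time.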