Modern telecommunication services are expected to handle many different types of audio signals. While the main audio content is speech, there is a desire to handle more general signals such as music and mixtures of music and speech. Although the capacity of telecommunication networks is continuously increasing, it is still of great interest to limit the required bandwidth per communication channel. In mobile networks, a smaller transmission bandwidth for each call yields lower power consumption in both the mobile device and the base station. This translates to energy and cost savings for the mobile operator, while the end user experiences prolonged battery life and increased talk-time. Further, with less bandwidth consumed per user, the mobile network can serve a larger number of users in parallel.
Today, the dominant compression technology for mobile voice services is Code Excited Linear Prediction (CELP), which achieves good audio quality for speech at low bandwidths. It is widely used in deployed codecs such as GSM Enhanced Full Rate (GSM-EFR), Adaptive Multi Rate (AMR) and AMR-Wideband (AMR-WB). However, for general audio signals such as music, the CELP technology performs poorly. These signals can often be better represented by frequency transform based coding, for example the ITU-T codecs G.722.1 and G.719. However, transform domain codecs generally operate at a higher bitrate than the speech codecs. There is thus a gap between the speech and general audio domains in terms of coding, and it is desirable to increase the performance of transform domain codecs at lower bitrates.
Transform domain codecs require a compact representation of the frequency domain transform coefficients. These representations often rely on vector quantization (VQ), where the coefficients are encoded in groups. An example of vector quantization is gain-shape VQ. This approach applies normalization to the vectors before encoding the individual coefficients. The normalization factor and the normalized coefficients are referred to as the gain and the shape of the vector, respectively, and may be encoded separately. The gain-shape structure has many benefits. By separating the gain and the shape, the codec can easily be adapted to varying source input levels through the design of the gain quantizer. It is also beneficial from a perceptual perspective, since the gain and shape may carry different importance in different frequency regions. Finally, the gain-shape division simplifies the quantizer design and makes it less complex in terms of memory and computational resources compared to an unconstrained vector quantizer. A functional overview of a gain-shape quantizer for one vector according to prior art can be seen in FIG. 1, which illustrates an encoder 40 side and a decoder 50 side. In FIG. 1, an arbitrary input data vector x 100 of length L is fed to a gain-shape quantization scheme. Here, the gain factor is defined as the Euclidean norm (2-norm) of the vector, so the terms gain and norm are used interchangeably throughout this document. First, a norm g, which represents the overall size of the vector, is calculated by a norm calculator 110. Commonly, the Euclidean norm is used:
$$g = \sqrt{\sum_{i=1}^{L} x_i^2} \qquad (1)$$
The norm is then quantized by a norm quantizer 120 to form ĝ and a quantization index IN representing the quantized norm. The input vector is scaled using 1/ĝ to form a normalized shape vector n, which in turn is fed to the shape quantizer 130. The quantization index IS from the shape quantizer 130 and the index IN from the norm quantizer 120 are multiplexed by a bitstream multiplexer 140 to be stored or transmitted to a decoder 50. The decoder 50 retrieves the indices IN and IS from the demultiplexed bitstream and forms a reconstructed vector x̂ 190 by retrieving the quantized shape vector n̂ from the shape decoder 150 and the quantized norm ĝ from the norm decoder 160, and scaling the quantized shape with ĝ 180.
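The single-vector flow of FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the quantizer of any particular codec: the norm quantizer is a hypothetical uniform scalar quantizer in the dB domain with an assumed step of 1.5 dB, and the shape quantizer is omitted, so the decoder is handed the unquantized shape. The function names are illustrative only.

```python
import math

STEP_DB = 1.5  # assumed step size of the hypothetical norm quantizer


def euclidean_norm(x):
    """Equation (1): square root of the sum of squared coefficients."""
    return math.sqrt(sum(v * v for v in x))


def quantize_norm(g):
    """Hypothetical norm quantizer: uniform steps in the dB domain (g > 0)."""
    index = round(20.0 * math.log10(g) / STEP_DB)
    return index, dequantize_norm(index)


def dequantize_norm(index):
    """Norm decoder matching quantize_norm."""
    return 10.0 ** (index * STEP_DB / 20.0)


def encode(x):
    """Return the norm index and the normalized shape vector n = x / g_hat."""
    i_n, g_hat = quantize_norm(euclidean_norm(x))
    n = [v / g_hat for v in x]
    return i_n, n  # a real codec would also quantize n into an index


def decode(i_n, n_hat):
    """Scale the (quantized) shape with the quantized norm."""
    g_hat = dequantize_norm(i_n)
    return [g_hat * v for v in n_hat]


x = [3.0, -4.0, 12.0]      # example input vector with norm 13
i_n, n = encode(x)
x_hat = decode(i_n, n)     # equals x here because the shape is not quantized
shape_norm = euclidean_norm(n)
```

Because the shape is normalized with the quantized norm ĝ rather than the exact norm g, `shape_norm` is close to, but not exactly, 1. The deviation is bounded by half a quantizer step.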
A gain-shape quantizer generally operates on vectors of limited length, but it can be used to handle longer sequences by first partitioning the signal into shorter vectors and applying the gain-shape quantizer to each vector. This structure is often used in transform based audio codecs. FIG. 2 exemplifies a transform based coding system for gain and shape quantization for a sequence of vectors according to prior art. It should be noted that FIG. 1 illustrates a gain-shape quantizer for one vector, while the gain-shape quantization in FIG. 2 is applied in parallel on a sequence of vectors, wherein the vectors together constitute a frequency spectrum. The sequence of gain (norm) values constitutes the spectral envelope. The input audio 200 is first partitioned into time segments or frames as a preparation for the frequency transform 210. Each frame is transformed to the frequency domain to form a frequency domain spectrum X. This may be done using any suitable transform, such as MDCT, DCT or DFT. The choice of transform may depend on the characteristics of the input signal, such that important properties are well modeled with that transform. It may also take other processing steps into account if the transform is reused for them, such as stereo processing. The frequency spectrum is partitioned into shorter row vectors denoted X(b). Each vector now represents the coefficients of a frequency band b. From a perceptual perspective it is beneficial to partition the spectrum using a non-uniform band structure which follows the frequency resolution of the human auditory system. This generally means that narrow bandwidths are used for low frequencies while larger bandwidths are used for high frequencies.
Next, the norm of each band is calculated 230 as in equation (1) to form a sequence of gain values E(b) which form the spectral envelope. These values are then quantized using the envelope quantizer 240 to form the quantized envelope Ê(b). The envelope quantization 240 may be done using any quantizing technique, e.g. differential scalar quantization or any vector quantization scheme. The quantized envelope coefficients Ê(b) are used to normalize 250 the band vectors X(b) to form the corresponding normalized shape vectors N(b).
$$N(b) = \frac{1}{\hat{E}(b)} X(b) \qquad (2)$$
Note that if the envelope quantization is accurate, i.e. Ê(b)≈E(b), the norm of the normalized shape vectors will be close to 1. This relates to a pre-normalization that may be done in the decoder. In the ideal case,
$$\hat{E}(b) = E(b) \;\Rightarrow\; \sqrt{N(b) \cdot N(b)^T} = 1$$
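The band partitioning, envelope calculation and normalization of equations (1) and (2) can be sketched as below. The band widths and spectrum values are made-up illustration data, and the helper names are hypothetical. The first call assumes a perfect envelope quantizer, Ê(b) = E(b), while the second assumes every envelope value is overestimated by 10 % to show how the shape norm then departs from 1.

```python
import math


def band_norm(band):
    """Equation (1) applied to one band."""
    return math.sqrt(sum(v * v for v in band))


def split_bands(spectrum, widths):
    """Partition a spectrum into consecutive bands of the given widths."""
    bands, start = [], 0
    for width in widths:
        bands.append(spectrum[start:start + width])
        start += width
    return bands


def normalize(bands, envelope_hat):
    """Equation (2): N(b) = X(b) / E_hat(b) for every band b."""
    return [[v / e for v in band] for band, e in zip(bands, envelope_hat)]


# Made-up spectrum and a non-uniform band layout (narrow bands first).
spectrum = [0.5, -0.5, 0.5, 0.5, 3.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0]
widths = [2, 2, 4, 4]

bands = split_bands(spectrum, widths)
envelope = [band_norm(band) for band in bands]           # E(b)

# Ideal case: E_hat(b) = E(b), so every shape vector has unit norm.
ideal_norms = [band_norm(n) for n in normalize(bands, envelope)]

# Assumed envelope error of +10 %: the shape norms shrink to 1 / 1.1.
coarse_envelope = [1.1 * e for e in envelope]
coarse_norms = [band_norm(n) for n in normalize(bands, coarse_envelope)]
```

In a real codec the envelope would pass through the envelope quantizer 240 before normalization. The fixed 10 % error merely stands in for its quantization error.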
The sequence of normalized shape vectors constitutes the fine structure of the spectrum. The perceptual importance of the spectral fine structure varies with frequency but may also depend on other signal properties such as the spectral envelope. Transform coders often employ an auditory model to determine the important parts of the fine structure and assign the available resources to the most important parts. The spectral envelope is often used as input to this auditory model, and the output is typically a bit assignment for each of the bands corresponding to the envelope coefficients. Here, a bit allocation algorithm 270 uses the quantized envelope Ê(b) in combination with an internal auditory model to assign a number of bits R(b), which in turn are used by the fine structure quantizer 260. The indices IE from the envelope quantization and the indices IF from the fine structure quantization are multiplexed by a bitstream multiplexer 280 to be stored or transmitted to a decoder.
The decoder demultiplexes, in a bitstream demultiplexer 285, the indices from the communication channel or the stored media and forwards the indices IF to the fine structure dequantizer 265 and the indices IE to the envelope dequantizer 245. The quantized envelope Ê(b) is obtained from the envelope dequantizer 245 and fed to a bit allocation entity 275 in the decoder, which generates the bit allocation R(b). The fine structure dequantizer 265 uses the fine structure indices and the bit allocation to produce the quantized fine structure vectors N̂(b). A synthesized frequency spectrum X̂(b) is obtained by scaling, in an envelope shaping entity 235, the quantized fine structure with the quantized envelope:
$$\hat{X}(b) = \hat{E}(b) \cdot \hat{N}(b) \qquad (3)$$
The inverse transform 215 is applied to the synthesized frequency spectrum X̂(b) to obtain the synthesized output signal 290.
The performance of the gain-shape VQ at different bitrates depends on how the gain and shape quantizers interact. In particular, some shape quantizers are capable of compensating for small energy deviations which may result from the gain quantization. Other shape quantizers can be said to be pure shape quantizers, which cannot represent any gain information and cannot compensate for the gain quantizer error at all. For a pure shape quantizer, the gain-shape system becomes sensitive to the bit sharing between gain and shape. One possible solution is to assign an additional gain adjustment factor after the shape quantization to adjust the gain based on the synthesized shape, as shown in FIG. 3. FIG. 3 shows a transform based coding system as illustrated in FIG. 2 with the addition of a gain adjustment analyzer 301, which assigns a respective additional gain adjustment factor G(b) to each band. This factor is found by comparing the quantized fine structure N̂(b) with the fine structure N(b):
$$G(b) = \frac{\hat{N}(b)^T N(b)}{N(b)^T N(b)}$$
The gain adjustment factor G(b) is quantized to produce an index IG which is multiplexed together with the fine structure indices IF and envelope indices IE to be stored or transmitted to a decoder.
Recall that a perfect envelope quantization would give $\sqrt{N(b) \cdot N(b)^T} = 1$. By pre-adjusting the gain of the quantized fine structure, the gain adjustment factor may also handle quantization errors from the envelope quantization. This can be done using equation (1) to obtain a pre-adjustment gain factor $g_n$:
$$g_n = \frac{1}{\sqrt{\hat{N}(b) \cdot \hat{N}(b)^T}}$$
which gives that
$$\sqrt{g_n \hat{N}(b) \cdot g_n \hat{N}(b)^T} = 1$$
Now if N̂(b) is substituted with $\hat{N}'(b) = g_n \hat{N}(b)$ in the gain adjustment calculation such that
$$G(b) = \frac{\hat{N}'(b)^T N(b)}{N(b)^T N(b)}$$
then the gain adjustment factor G(b) may also compensate for errors in the envelope quantization. This method is considered prior art, and hereafter it is assumed that a pre-adjustment to have $\sqrt{\hat{N}(b) \cdot \hat{N}(b)^T} = 1$ is an integral part of the shape dequantizer.
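The pre-adjustment step can be sketched as a small helper that rescales a decoded shape vector to unit Euclidean norm. The function name `pre_adjust`, the example vector, and the handling of an all-zero vector are assumptions made for illustration; the text above does not specify how an all-zero shape is treated.

```python
import math


def pre_adjust(n_hat):
    """Return g_n * n_hat, where g_n = 1 / sqrt(n_hat . n_hat^T)."""
    energy = sum(v * v for v in n_hat)
    if energy == 0.0:
        # Assumption: an all-zero shape cannot be normalized, so it is
        # returned unchanged.
        return list(n_hat)
    g_n = 1.0 / math.sqrt(energy)
    return [g_n * v for v in n_hat]


n_hat = [0.6, 0.0, -0.6, 0.3]       # made-up decoded shape with norm 0.9
n_hat_prime = pre_adjust(n_hat)     # pre-adjusted shape
norm_before = math.sqrt(sum(v * v for v in n_hat))
norm_after = math.sqrt(sum(v * v for v in n_hat_prime))
```

After the call, `norm_after` equals 1 up to floating-point rounding, whatever the norm of the decoded shape was.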
The decoder of FIG. 3 is similar to the decoder of FIG. 2, but with the addition of a gain adjustment unit 302 which uses the gain adjustment index IG to reconstruct a quantized gain adjustment factor Ĝ(b). This is in turn used to create a gain adjusted fine structure Ñ(b):
$$\tilde{N}(b) = \hat{G}(b) \cdot \hat{N}(b)$$
As in FIG. 2, a synthesized frequency spectrum X̃(b) is obtained by scaling the gain adjusted fine structure with the quantized envelope:
$$\tilde{X}(b) = \hat{E}(b) \cdot \tilde{N}(b)$$
The inverse transform is applied to the synthesized frequency spectrum X̃(b) to obtain the synthesized output signal.
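The decoder-side scaling of FIG. 3 for one band can be sketched as two successive multiplications. The function name and the numeric values are illustrative assumptions. Setting the gain adjustment to 1 shows that the result then reduces to the plain envelope scaling of equation (3).

```python
import math


def synthesize_band(e_hat, g_hat, n_hat):
    """Apply the gain adjustment, then the envelope, to one band."""
    n_tilde = [g_hat * v for v in n_hat]     # gain adjusted fine structure
    return [e_hat * v for v in n_tilde]      # envelope shaping


n_hat = [0.6, 0.8]                            # made-up unit-norm decoded shape
x_plain = synthesize_band(5.0, 1.0, n_hat)    # no adjustment, as in eq. (3)
x_adjusted = synthesize_band(5.0, 0.9, n_hat) # assumed adjustment of 0.9

norm_plain = math.sqrt(sum(v * v for v in x_plain))
norm_adjusted = math.sqrt(sum(v * v for v in x_adjusted))
```

With a unit-norm shape, the norm of the synthesized band is the product of the quantized envelope value and the gain adjustment factor.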
However, at low bitrates the gain adjustment may consume too many bits which reduces the performance of the shape quantizer and gives poor overall performance.