Transform encoding is the main technology used to compress and transmit audio signals. The concept of transform encoding is to first convert a signal to the frequency domain, and then to quantize and transmit the transform coefficients. The decoder uses the received transform coefficients to reconstruct the signal waveform by applying the inverse frequency transform, see FIG. 1. In FIG. 1 an audio signal X(n) is forwarded to a frequency transformer 10. The resulting frequency transform Y(k) is forwarded to a transform encoder 12, and the encoded transform is transmitted to the decoder, where it is decoded by a transform decoder 14. The decoded transform Ŷ(k) is forwarded to an inverse frequency transformer 16 that transforms it into a decoded audio signal X̂(n). The motivation behind this scheme is that frequency domain coefficients can be quantized more efficiently for the following reasons:
1) Transform coefficients (Y(k) in FIG. 1) are less correlated than input signal samples (X(n) in FIG. 1).
2) The frequency transform provides energy compaction (more coefficients Y(k) are close to zero and can be neglected).
3) The subjective motivation behind the transform is that the human auditory system operates in a transformed domain, and it is easier to select perceptually important signal components in that domain.
In a typical transform codec, the signal waveform is transformed block by block (with 50% overlap) using the Modified Discrete Cosine Transform (MDCT). In an MDCT-type transform codec, a block of the signal waveform X(n) is transformed into an MDCT vector Y(k). The length of the waveform blocks corresponds to 20-40 ms audio segments. If the block length is denoted by 2L, the MDCT can be defined as:
$$Y(k) = \frac{2}{L} \sum_{n=0}^{2L-1} \sin\left[\left(n+\frac{1}{2}\right)\frac{\pi}{L}\right] \cos\left[\left(n+\frac{1}{2}+\frac{1}{L}\right)\left(k+\frac{1}{2}\right)\frac{\pi}{L}\right] X(n) \tag{1}$$
for k=0, . . . , L−1. Then the MDCT vector Y(k) is split into multiple bands (sub-vectors), and the energy (or gain) G(j) in each band is calculated as:
$$G(j) = \frac{1}{N_j} \sum_{k=m_j}^{m_j+N_j-1} Y^2(k) \tag{2}$$
where $m_j$ is the first coefficient in band j and $N_j$ is the number of MDCT coefficients in that band (a band typically contains 8 to 32 coefficients). As an example of a uniform band structure, let $N_j$=8 for all j; then G(0) would be the energy of the first 8 coefficients, G(1) would be the energy of the next 8 coefficients, etc.
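The transform and band-energy steps of equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the codec's actual implementation: the function names `mdct` and `band_energies` are hypothetical, and the sketch assumes the conventional sine-windowed MDCT, with window argument (n + 1/2)·π/(2L) and phase offset n + 1/2 + L/2. These constants are the textbook ones and may differ from equation (1) as printed. The 2/L prefactor follows equation (1).

```python
import math


def mdct(x):
    """Sine-windowed MDCT of one block of 2L samples, returning L coefficients.

    Assumption: textbook window sin[(n + 1/2) * pi / (2L)] and phase offset
    (n + 1/2 + L/2); the 2/L prefactor is taken from equation (1).
    """
    if len(x) % 2 != 0:
        raise ValueError("block length must be even (2L samples)")
    L = len(x) // 2
    scale = 2.0 / L
    y = []
    for k in range(L):
        acc = 0.0
        for n in range(2 * L):
            window = math.sin((n + 0.5) * math.pi / (2 * L))
            basis = math.cos((n + 0.5 + L / 2) * (k + 0.5) * math.pi / L)
            acc += window * basis * x[n]
        y.append(scale * acc)
    return y


def band_energies(y, band_sizes):
    """Per-band energy G(j) = (1/N_j) * sum of Y(k)^2 over band j, as in equation (2)."""
    gains = []
    m = 0  # index of the first coefficient of the current band (m_j)
    for n_j in band_sizes:
        band = y[m:m + n_j]
        gains.append(sum(c * c for c in band) / n_j)
        m += n_j
    return gains


if __name__ == "__main__":
    L = 16  # hypothetical, very short block, chosen only to keep the demo fast
    # Test tone aligned with MDCT basis function k = 3 (illustrative input).
    block = [math.cos((n + 0.5 + L / 2) * 3.5 * math.pi / L) for n in range(2 * L)]
    coeffs = mdct(block)
    print(band_energies(coeffs, [8, 8]))  # uniform bands, N_j = 8
```

With this input, nearly all of the energy falls into the first band, which is the energy compaction the transform is meant to provide. The direct double loop costs O(L²) per block; a practical codec would use an FFT-based fast algorithm instead.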
These energy values or gains give an approximation of the spectrum envelope, which is quantized, and the quantization indices are transmitted to the decoder. Residual sub-vectors or shapes are obtained by scaling the MDCT sub-vectors with the corresponding envelope gains, e.g. so that the residual in each band has unit Root Mean Square (RMS) energy. The residual sub-vectors or shapes are then quantized with different numbers of bits based on the corresponding envelope gains. Finally, at the decoder, the MDCT vector is reconstructed by scaling up the residual sub-vectors or shapes with the corresponding envelope gains, and an inverse MDCT is used to reconstruct the time-domain audio frame.
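The gain-shape split described above can be illustrated with the following Python sketch. It is a simplified example under stated assumptions: the names `encode_gain_shape` and `decode_gain_shape` are hypothetical, the gain of each band is taken as its RMS value (the square root of G(j) in equation (2)), and quantization of both gains and shapes is omitted entirely, so the round trip is lossless here.

```python
import math


def encode_gain_shape(y, band_sizes):
    """Split MDCT coefficients into per-band RMS gains and unit-RMS shapes.

    Assumption: the gain is the band RMS, i.e. sqrt(G(j)) from equation (2).
    Quantization of gains and shapes is deliberately left out.
    """
    gains, shapes = [], []
    m = 0  # first coefficient index of the current band
    for n_j in band_sizes:
        band = y[m:m + n_j]
        rms = math.sqrt(sum(c * c for c in band) / n_j)
        gains.append(rms)
        # A silent band has no meaningful shape; keep it as zeros.
        shapes.append([c / rms for c in band] if rms > 0.0 else [0.0] * n_j)
        m += n_j
    return gains, shapes


def decode_gain_shape(gains, shapes):
    """Rebuild the MDCT vector by scaling each shape with its envelope gain."""
    y_hat = []
    for gain, shape in zip(gains, shapes):
        y_hat.extend(gain * c for c in shape)
    return y_hat


if __name__ == "__main__":
    coeffs = [3.0, 4.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # illustrative values
    gains, shapes = encode_gain_shape(coeffs, [4, 4])
    print(gains)
    print(decode_gain_shape(gains, shapes))
```

In a real codec the gains and shapes would each pass through a quantizer between these two functions, and the number of bits spent on each shape would depend on the corresponding gain.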
The conventional transform encoding concept does not work well with strongly harmonic audio signals, e.g. single instruments. An example of such a harmonic spectrum is illustrated in FIG. 2 (for comparison, a typical audio spectrum without excessive harmonics is shown in FIG. 3). The reason is that normalization with the spectrum envelope does not result in a sufficiently “flat” residual vector, so the residual encoding scheme cannot produce an audio signal of acceptable quality. This mismatch between the signal and the encoding model can be resolved only at very high bitrates, which in most cases is not a suitable solution.
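The lack of flatness can be made concrete with a small Python sketch. The crest factor (peak magnitude divided by RMS) used here is an illustrative measure chosen for this example, not one taken from the text, and the two bands are made-up values: one noise-like, one dominated by a single harmonic peak. Because the crest factor is unchanged by scaling, dividing a band by its envelope gain cannot reduce it.

```python
import math


def crest_factor(band):
    """Peak magnitude divided by RMS; 1.0 means all magnitudes are equal."""
    rms = math.sqrt(sum(c * c for c in band) / len(band))
    return max(abs(c) for c in band) / rms if rms > 0.0 else 0.0


def normalize(band):
    """Scale a band to unit RMS, as done with the envelope gain."""
    rms = math.sqrt(sum(c * c for c in band) / len(band))
    return [c / rms for c in band]


if __name__ == "__main__":
    # Hypothetical 8-coefficient bands, for illustration only.
    noise_like = [1.0, -0.8, 1.1, -0.9, 0.9, -1.2, 1.0, -1.1]
    harmonic = [0.05, -0.04, 0.06, 4.0, -0.05, 0.04, -0.06, 0.05]
    for name, band in (("noise-like", noise_like), ("harmonic", harmonic)):
        print(name, crest_factor(band), crest_factor(normalize(band)))
```

For a band of N coefficients the crest factor is at most √N, reached when a single coefficient carries all the energy. The harmonic band above sits close to that limit both before and after normalization, which is the "non-flat" residual that a shape quantizer with few bits struggles to represent.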