Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. Natural audio coding is commonly used for music or arbitrary signals at medium bitrates, and generally offers wide audio bandwidth. Speech coders are essentially limited to speech reproduction but can, on the other hand, be used at very low bitrates, albeit with low audio bandwidth. In both classes, the signal is generally separated into two major signal components, the “spectral envelope” and the corresponding “residual” signal. Throughout the following description, the term “spectral envelope” refers to the coarse spectral distribution of the signal in a general sense, e.g. filter coefficients in a linear prediction (LPC) based coder or a set of time-frequency averages of subband samples in a subband coder. The term “residual” refers to the fine spectral distribution in a general sense, e.g. the LPC error signal or subband samples normalized using the above time-frequency averages. “Envelope data” refers to the quantized and coded spectral envelope, and “residual data” to the quantized and coded residual. At medium and high bitrates, the residual data constitutes the main part of the bitstream. At very low bitrates, the envelope data constitutes a larger part of the bitstream. Hence, it is important to represent the spectral envelope compactly when using lower bitrates.
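The envelope/residual split for the subband-coder case can be illustrated with a minimal sketch. It assumes the time-frequency average is an RMS value taken over all time slots of one segment and over a group of adjacent subbands; the function name `envelope_and_residual`, the `bands` grouping, and the `eps` guard are hypothetical and chosen only for illustration.

```python
import math

def envelope_and_residual(samples, bands, eps=1e-12):
    """Split one segment of subband samples into envelope and residual.

    samples: list of time slots, each a list of subband sample values.
    bands:   list of (start, stop) subband index ranges, stop exclusive
             (an assumed grouping, not taken from the text).
    Returns (envelope, residual): one RMS value per band, and the
    samples divided by the RMS value of the band they belong to.
    """
    envelope = []
    for start, stop in bands:
        # Time-frequency average: energy over all slots and all subbands in the band.
        energy = sum(slot[k] ** 2 for slot in samples for k in range(start, stop))
        count = len(samples) * (stop - start)
        envelope.append(math.sqrt(energy / count))

    # Residual: the fine structure left after normalizing by the envelope.
    residual = [list(slot) for slot in samples]
    for (start, stop), scale in zip(bands, envelope):
        for slot in residual:
            for k in range(start, stop):
                slot[k] = slot[k] / max(scale, eps)  # eps avoids division by zero
    return envelope, residual

# Two time slots, four subbands, grouped into two envelope bands.
samples = [[4.0, -4.0, 0.5, 0.5],
           [-4.0, 4.0, -0.5, 0.5]]
bands = [(0, 2), (2, 4)]
envelope, residual = envelope_and_residual(samples, bands)
```

In this toy segment the envelope holds two values while the residual still holds eight, which mirrors the observation that the residual dominates the bitstream unless it is coded very coarsely.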
Prior art audio coders and most speech coders use constant-length, relatively short time segments in the generation of envelope data to achieve good temporal resolution. However, this prevents optimal utilisation of the frequency domain masking known from psycho-acoustics. To improve coding gain through the use of narrow filter bands with steep slopes, and still achieve good temporal resolution during transient passages, modern audio coders employ adaptive window switching, i.e. they switch time segment lengths depending on the signal's statistics. Clearly, minimal use of the short segments is a prerequisite for maximum coding gain. Unfortunately, long transition windows are needed to alter the segment lengths, which limits the switching flexibility.
The spectral envelope is a function of two variables: time and frequency. The encoding can be done by exploiting redundancy in either direction of the time/frequency plane. Generally, coding of the spectral envelope is performed in the frequency direction, using delta coding (DPCM) or vector quantization (VQ).
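Delta coding in the frequency direction can be sketched as follows. This is a minimal illustration, not the coder of any particular standard: it assumes envelope values expressed in dB, a uniform quantizer with a hypothetical step of 1.5 dB, and differences taken between quantizer indices of adjacent bands, with the lowest band coded absolutely.

```python
def delta_encode(envelope_db, step=1.5):
    """Quantize envelope values and code them as differences along frequency.

    envelope_db: envelope values for one time segment, lowest band first.
    step:        uniform quantizer step in dB (an assumed value).
    Returns the first quantizer index followed by index differences.
    """
    if not envelope_db:
        return []
    indices = [round(value / step) for value in envelope_db]
    # First band is sent as-is; every other band as the difference
    # to its lower-frequency neighbour.
    return [indices[0]] + [hi - lo for lo, hi in zip(indices, indices[1:])]

def delta_decode(deltas, step=1.5):
    """Invert delta_encode: accumulate differences, then rescale to dB."""
    values = []
    index = 0
    for delta in deltas:
        index += delta
        values.append(index * step)
    return values

envelope_db = [30.0, 28.5, 27.0, 27.0, 24.0]
deltas = delta_encode(envelope_db)
decoded = delta_decode(deltas)
```

For a smooth envelope the differences cluster around zero, so a subsequent entropy coder can represent them with fewer bits than the absolute indices; exploiting redundancy in the time direction would instead difference each band against the same band in the previous segment.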