The present invention relates to audio signal compression, and more particularly to techniques for compressing an audio signal in a manner that will deliver a stable and high quality audio signal at lower bit rates than would otherwise be possible.
The invention is particularly effective in conjunction with the audio compression technique of Adaptive Predictive Coding with Transform Domain Quantization (APC-TQ), e.g., as described in U.S. Pat. No. 5,206,884 incorporated by reference herein, although it is not limited to use with such a compression technique.
Most audio coders process the audio signal in blocks of a fixed size. The second order statistics of the signal (i.e., the autocorrelation function and power spectrum) are assumed not to change over the duration of a block. This property is referred to as second order quasistationarity, or simply stationarity in the following discussion. In reality, audio signals exhibit highly diverse durations of stationarity. The signal can be stationary over long intervals, on the order of several hundred milliseconds, but may show rapid changes in characteristics over short intervals, on the order of tens of milliseconds. During stationary intervals, it is advantageous to maximize the block size (the number of samples per block). A larger block (i) permits a frequency domain analysis with higher spectral resolution and/or (ii) improves the efficiency of transmission of spectral modeling parameters, since the longer stationary period is modeled by a single parameter set. On the other hand, when the signal is non-stationary, it is advantageous to minimize the block size, so that the changes in signal characteristics are tracked adequately. Thus, a single fixed block size cannot adequately fulfill these conflicting requirements.
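The trade-off between block size, time span, and spectral resolution can be illustrated with a short calculation. The following is a minimal sketch only; the function name and the 10 kHz sampling rate used in the example are illustrative assumptions and are not taken from the codec described herein.

```python
def block_properties(block_size, sample_rate_hz):
    """Return (duration in ms, transform bin spacing in Hz) for one block.

    Assumes a transform that yields block_size uniformly spaced bins over
    the full sampling rate (as a DFT does); other transforms differ by a
    constant factor but show the same trend.
    """
    duration_ms = 1000.0 * block_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / block_size
    return duration_ms, bin_spacing_hz


# Hypothetical sampling rate of 10 kHz, chosen only for illustration.
for size in (64, 256, 1024):
    duration, spacing = block_properties(size, 10000.0)
    print(f"{size:5d} samples: {duration:7.1f} ms per block, {spacing:8.3f} Hz per bin")
```

Under these assumptions, quadrupling the block size quadruples the interval over which the signal must remain stationary while narrowing the bin spacing by the same factor, which is the conflict noted above.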
For audio signals, which often display large spectral dynamic range corresponding to highly resonant sounds, the magnitudes of linear predictive coding (LPC) coefficients can be large. This property is further accentuated by high order spectral models. It is desirable to reduce the magnitudes of the LPC parameters without substantially reducing the spectral modeling accuracy. This is important since large-valued LPC parameters result in correspondingly large amplification of the reconstruction noise of the previous block stored in the delay lines of the synthesis filters. The existing method of reducing these values may not be acceptable for audio signals, since the spectral modeling accuracy of low-level, high-frequency components is sacrificed to achieve lower power gain.
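The link between resonant spectra, large LPC coefficients, and noise amplification can be made concrete by computing the power gain of an all-pole synthesis filter 1/A(z), i.e., the sum of the squares of its impulse response. The following is a minimal sketch under stated assumptions: the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the helper names, and the two-pole resonator used as test input are all illustrative and are not taken from the codec described herein.

```python
import math


def synthesis_power_gain(lpc, n_samples=20000):
    """Power gain of 1/A(z), with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    Computed as the sum of squared impulse response samples, truncated
    after n_samples. Assumes a stable filter and at least one coefficient.
    """
    order = len(lpc)
    history = [0.0] * order          # past outputs, most recent first
    gain = 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0   # unit impulse input
        y = x - sum(a * h for a, h in zip(lpc, history))
        gain += y * y
        history = ([y] + history)[:order]
    return gain


def resonator_lpc(radius, angle):
    """Coefficients of A(z) with a conjugate pole pair at radius*exp(+/-j*angle)."""
    return [-2.0 * radius * math.cos(angle), radius * radius]


for r in (0.90, 0.99):
    coeffs = resonator_lpc(r, math.pi / 4)
    print(r, coeffs, synthesis_power_gain(coeffs))
```

In this sketch, moving the pole pair closer to the unit circle increases both the coefficient magnitudes and the power gain, so any noise left in the filter delay lines is amplified more strongly.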
Audio compression techniques based on transform domain representations use a non-uniform allocation of the bits available for transform coefficient quantization for each block. In early transform coders, this bit-allocation was performed based on an objective criterion, so as to minimize a weighted mean squared reconstruction noise power (e.g., as described by N. S. Jayant et al., Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, N.J., 1984). More recent audio coders, such as the perceptual transform coders, allocate the available bits among the transform coefficients based on perceptual criteria, in which the objective is to maintain the reconstruction noise power spectrum below the auditory noise masking threshold, computed using models of the human auditory system (e.g., as described by J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Criteria," IEEE Journal on Selected Areas in Communications, Vol. 6, pp. 314-323, February 1988).
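An objective bit-allocation of the kind mentioned above can be sketched with the familiar textbook rule in which each coefficient receives the average number of bits plus one half of the base-2 logarithm of the ratio of its variance to the geometric mean of all variances. This is a minimal sketch of the unweighted case only, and is not asserted to be the exact procedure of any cited reference or of the APC-TQ codec; the function name and the example variances are illustrative assumptions.

```python
import math


def objective_bit_allocation(variances, avg_bits):
    """Real-valued bit allocation minimizing unweighted mean squared error.

    Uses b_k = avg_bits + 0.5 * log2(var_k / geometric_mean(var)).
    The result may contain negative or fractional values; a practical
    coder would additionally clamp, round, and redistribute bits.
    """
    log_geo_mean = sum(math.log2(v) for v in variances) / len(variances)
    return [avg_bits + 0.5 * (math.log2(v) - log_geo_mean) for v in variances]


# Hypothetical coefficient variances and an average of 1.5 bits per coefficient.
bits = objective_bit_allocation([16.0, 4.0, 1.0, 0.25], 1.5)
print(bits, sum(bits))
```

The allocation sums to the total bit budget by construction, and coefficients with larger variance receive more bits. A perceptual allocation would instead shape the permitted noise according to a masking threshold rather than the variances alone.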
However, at low coding rates (as in the case of the APC-TQ codec operating at 17 kbit/s for 5 kHz bandwidth), significantly fewer bits (i.e., fewer than 1.5 bits per transform coefficient) are available for the quantization of transform coefficients than in other current transform domain audio coders (about 3 bits per transform coefficient). The coarser quantization, combined with the prediction and synthesis filtering used in the APC-TQ, caused bit-allocation based entirely on perceptual criteria to result occasionally in unstable codec performance. The probable cause is that the level of quantization noise allowed at a frequency corresponding to a synthesis filter pole very close to the unit circle was occasionally large enough to drive the synthesis filter unstable if sustained over a few consecutive blocks.
Bit-allocation based purely on objective criteria did not have this problem, since the mean squared reconstruction noise is explicitly minimized. However, aside from this advantage, the performance of the objective bit-allocation was clearly inferior to that of the perceptual bit-allocation during stable blocks.
An earlier version of the APC-TQ codec assumed that the reconstruction noise of the previous block is zero, so that the ringing of the reconstruction noise of the previous block into the current block can be ignored. However, this simplification becomes unacceptable at lower bit rates, and with perceptual techniques, due to higher levels of reconstruction noise.