The present invention relates generally to the perceptual coding of digital audio signals that uses analysis filters for encoding and synthesis filters for decoding. The present invention relates more particularly to the quantization of subband signals in perceptual coders that takes into account the spreading of quantization noise by the synthesis filters.
There is continuing interest in encoding digital audio signals in a form that imposes low information capacity requirements on transmission channels and storage media yet conveys the encoded audio signals with a high level of subjective quality. Perceptual coding systems attempt to achieve these conflicting goals by encoding and quantizing the audio signals in a manner that uses larger spectral components within the audio signal to mask, or render inaudible, the resultant quantizing noise. Generally, it is advantageous to control the shape and amplitude of the quantizing noise spectrum so that it lies just below the psychoacoustic masking threshold of the signal to be encoded.
A perceptual encoding process may be performed by a so-called split-band encoder that applies a bank of analysis filters to the audio signal to obtain subband signals having bandwidths that are commensurate with the critical bands of the human auditory system, estimates the masking threshold of the audio signal by applying a perceptual model to the subband signals or to some other measure of audio signal spectral content, establishes a quantization resolution for quantizing each subband signal that is just small enough so that the resultant quantizing noise lies just below the estimated masking threshold of the audio signal, and generates an encoded signal by assembling the quantized subband signals into a form suitable for transmission or storage. A complementary perceptual decoding process may be performed by a split-band decoder that extracts the quantized subband signals from the encoded signal, obtains dequantized representations of the quantized subband signals, and applies a bank of synthesis filters to the dequantized representations to generate an audio signal that is, ideally, perceptually indistinguishable from the original audio signal.
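By way of illustration only, the relationship between a quantization resolution and the resulting quantizing noise can be sketched for a uniform quantizer. The sketch assumes the common high-resolution approximation in which a step size of `step` yields a noise power of about `step**2 / 12`. That approximation, the function names, and the numeric values are assumptions of this example and are not taken from the coders described above.

```python
import math


def step_size_for_noise_power(target_noise_power):
    """Largest uniform-quantizer step whose modelled noise power
    (step**2 / 12, an assumed approximation) equals the target."""
    return math.sqrt(12.0 * target_noise_power)


def quantize(value, step):
    """Map a subband sample to the index of the nearest quantizer level."""
    return round(value / step)


def dequantize(index, step):
    """Recover the quantizer level that corresponds to an index."""
    return index * step


# Hypothetical target: noise power just below an estimated masking threshold.
step = step_size_for_noise_power(1e-4)
sample = 0.4321
error = sample - dequantize(quantize(sample, step), step)
```

In this sketch a smaller step corresponds to a finer quantization, and the error for any single sample is bounded in magnitude by half of the step size.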
The perceptual models that are often used to determine the quantization resolution generally assume that the quantization noise introduced into the quantized subband signals is substantially the same as the noise that results in the output signal obtained by applying a bank of synthesis filters to the quantized subband signals. In general, this assumption is not true because the synthesis filters modify or spread the quantization noise spectrum. As a consequence, quantization performed strictly according to the quantization resolutions obtained by applying these perceptual models usually results in audible noise in the output signal obtained from the synthesis filters.
This noise-spreading phenomenon occurs in a wide variety of implementations of the analysis and synthesis filters. These implementations include polyphase filters, lattice filters, the quadrature mirror filter, and various time-domain-to-frequency-domain block transforms including a wide variety of Fourier-series type transforms, cosine-modulated filterbank transforms and wavelet transforms. For convenience, signal analysis and signal synthesis techniques that are suitable for use with the present invention are all referred to herein as the application of analysis filters and synthesis filters, respectively. In transform implementations, the subband signals each comprise a group of one or more frequency-domain transform coefficients.
The synthesis filter noise-spreading property mentioned above is related to the fact that the complementary analysis and synthesis filters used in these coding systems do not implement ideal filters having a flat unity gain in the passband, zero gain in the stopbands, and infinitely steep transitions between the stopbands and the passband. As a consequence, the analysis filters provide only a distorted measure of the spectral content of an input audio signal. Furthermore, some filters such as the quadrature mirror filter (QMF) and the time-domain aliasing cancellation (TDAC) transforms generate significant aliasing artifacts that further distort the spectral measure of the input signal. In principle, these artifacts and deviations from perfect filters can be ignored because complementary pairs of analysis and synthesis filters can be used in which the synthesis filters are able to reverse the distortions of the analysis filters and perfectly reconstruct the original input signal.
Although perfect reconstruction is possible in principle, it is not achieved in practical coding systems because perfect reconstruction requires the synthesis filters to receive a precise representation of the subband signals generated by the analysis filters. Instead, the synthesis filters receive a representation with significant errors that are introduced by the quantization processes described above. As a result, subband signal quantization introduces errors that manifest themselves as noise in the signal that is reconstructed by the synthesis filters. As disclosed in U.S. Pat. No. 5,623,577, which is incorporated herein by reference in its entirety, the quantizing errors in a subband signal are spread by the synthesis filters into a range of frequencies that can be wider than the frequency subband of the quantized subband signal itself.
Unfortunately, perceptual encoding processes like those described above do not quantize the subband signals in an optimum manner because the quantization processes do not properly consider the noise spreading that occurs in the synthesis filters. Coding techniques disclosed in U.S. Pat. No. 5,301,255 do include some allowance for the aliasing that is generated by decimating the output of an analysis filter, but these techniques do not provide any allowance for noise spreading in the synthesis filter. As a result, these processes overestimate the quantization resolutions that render the quantizing noise inaudible. This deficiency can be compensated to some degree either by forcing the level of the estimated masking threshold to be lower than an accurate perceptual model would indicate, or by uniformly decreasing the quantization resolution below that which an accurate perceptual model would indicate is sufficient to render the quantizing noise inaudible. Neither form of compensation is optimum because neither properly accounts for the cause of the deficiency.
U.S. Pat. No. 5,623,577 discloses several techniques that compensate for the noise-spreading effect of synthesis filters. The theoretical basis of the disclosed techniques assumes the degree of noise spreading can be determined by convolving the quantization noise spectrum with the synthesis filter frequency response. Disclosed embodiments of the techniques determine whether compensation for synthesis filter noise spreading is required by comparing frequency-domain slopes of an estimated masking threshold with threshold values that are determined empirically. Unfortunately, these techniques are not optimum because their accuracy in determining whether compensation is needed is suboptimal, the steps required to obtain the needed empirical threshold values are expensive and time-consuming, and the disclosed techniques do not take into consideration the effects of overlap-add processes that are included in some synthesis filters such as QMF and the TDAC transforms. In addition, the disclosed techniques do not provide an ability for a particular embodiment to gracefully trade off the accuracy of compensation against the computational resources required to carry out the embodiment.
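The convolution assumption mentioned above can be illustrated with a minimal sketch. The function name, the three-tap kernel, and the per-subband noise powers are hypothetical values chosen for this example. The kernel merely stands in for the spreading behavior of a synthesis filter and is not taken from U.S. Pat. No. 5,623,577 or from any particular filter design.

```python
def spread_noise(noise_power, kernel):
    """Convolve per-subband noise power with a spreading kernel.

    The kernel is assumed to have odd length and to be centred on its
    middle tap; contributions that fall outside the band range are dropped.
    """
    half = len(kernel) // 2
    count = len(noise_power)
    spread = [0.0] * count
    for src, power in enumerate(noise_power):
        for tap, weight in enumerate(kernel):
            dst = src + tap - half
            if 0 <= dst < count:
                spread[dst] += power * weight
    return spread


# Hypothetical case: quantizing noise confined to the middle subband.
subband_noise = [0.0, 0.0, 1.0, 0.0, 0.0]
kernel = [0.1, 0.8, 0.1]
output_noise = spread_noise(subband_noise, kernel)
```

With these assumed values, noise that was injected into a single subband appears in the two adjacent subbands of the output as well, which is the spreading effect that a perceptual model ignores when it treats subband noise and output noise as the same.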
It is an object of the present invention to improve the performance of perceptual coding systems and methods that use analysis and synthesis filters by providing a quantization process that accurately compensates for noise spreading in synthesis filters.
Advantageous embodiments of the present invention are able to determine the need for noise-spreading compensation in a manner that is more accurate than other known methods and to provide a graceful tradeoff between the accuracy of compensation and the level of computational resources required to provide the compensation.
According to one aspect of the present invention, a method or apparatus determines quantization resolutions for subband signals obtained from analysis filters applied to an input signal by generating a desired-noise spectrum in response to the input signal and applying a synthesis-filter noise-spreading model to obtain estimated noise levels in subbands of an output signal obtained from synthesis filters. The synthesis-filter noise-spreading model represents noise-spreading characteristics of the synthesis filters, and the quantization resolutions are determined such that a comparison of the desired-noise spectrum with the estimated noise levels satisfies one or more comparison criteria. The method may be embodied as a program of instructions on a medium that is readable by a device for execution by the device.
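One way such a determination might proceed is sketched below, for illustration only. The sketch makes several assumptions that do not come from this disclosure: the noise-spreading model is a simple convolution kernel, the comparison criterion is that the estimated noise level does not exceed the desired-noise spectrum in any subband, and the adjustment is a conservative single pass. All function names and numeric values are hypothetical.

```python
def estimate_output_noise(noise_power, kernel):
    """Assumed spreading model: convolve subband noise with a centred kernel."""
    half = len(kernel) // 2
    count = len(noise_power)
    estimate = [0.0] * count
    for src, power in enumerate(noise_power):
        for tap, weight in enumerate(kernel):
            dst = src + tap - half
            if 0 <= dst < count:
                estimate[dst] += power * weight
    return estimate


def fit_noise_to_target(desired, kernel):
    """Return subband noise powers whose estimated output noise stays at or
    below the desired-noise spectrum (the assumed comparison criterion)."""
    half = len(kernel) // 2
    count = len(desired)
    noise = list(desired)  # start from the uncompensated allocation
    estimate = estimate_output_noise(noise, kernel)
    scale = [1.0] * count
    for dst in range(count):
        if estimate[dst] > desired[dst]:
            ratio = desired[dst] / estimate[dst]
            # Scale down every subband that leaks noise into this one.
            for tap, weight in enumerate(kernel):
                src = dst - tap + half
                if weight > 0.0 and 0 <= src < count:
                    scale[src] = min(scale[src], ratio)
    return [power * factor for power, factor in zip(noise, scale)]


# Hypothetical desired-noise spectrum with a deep notch in the middle subband.
desired = [1.0, 1.0, 0.01, 1.0, 1.0]
kernel = [0.1, 0.8, 0.1]
allowed_noise = fit_noise_to_target(desired, kernel)
compensated = estimate_output_noise(allowed_noise, kernel)
```

In this assumed example the uncompensated allocation would place an estimated noise level well above the desired level in the notched subband because of leakage from its neighbors; after the adjustment the estimate meets the criterion in every subband. The single pass is deliberately conservative, and a practical embodiment could trade additional computation for a less pessimistic allocation.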
According to another aspect of the present invention, a medium conveys encoded information that comprises signal information that represents quantized components of subband signals generated by applying analysis filters to an input signal and control information that represents quantizing resolutions of the quantized subband signal components. The quantizing resolutions are determined as summarized above.
According to yet another aspect of the present invention, an apparatus receives and decodes a signal conveying the encoded information summarized above. The apparatus comprises an input coupled to the signal conveying the encoded information; one or more processing circuits coupled to the input that extract the signal information and the control information from the encoded information and obtain therefrom the quantized subband signal components and the quantizing resolutions of the quantized subband signal components, dequantize the quantized subband signal components according to the quantizing resolutions to obtain dequantized subband signals, and apply synthesis filters to the dequantized subband signals to generate an output signal, wherein the quantizing noise in the subband signals is spread by the synthesis filters to produce noise levels in subbands of the output signal that substantially satisfy the one or more comparison criteria with the desired-noise spectrum; and an output coupled to the one or more processing circuits that conveys the output signal.
The various features of the present invention and its preferred embodiments may be better understood by referring to the following discussion and the accompanying drawings in which like reference numerals refer to like elements in the several figures. The contents of the following discussion and the drawings are set forth as examples only and should not be understood to represent limitations upon the scope of the present invention.