Many coding applications attempt to reduce the amount of information required to adequately represent a source signal. By reducing information capacity requirements, a signal representation can be transmitted over channels having lower bandwidth or stored on media using less space.
Coding can reduce the information capacity requirements of a source signal by eliminating either redundant components or irrelevant components in the signal. So-called perceptual coding methods and systems often use filter banks to reduce redundancy by decorrelating a source signal using a basis set of spectral components, and reduce irrelevancy by adaptive quantization of the spectral components according to psycho-perceptual criteria. A coding process that adapts the quantizing resolution more coarsely can reduce information requirements to a greater extent, but it also introduces higher levels of quantization error or “quantization noise” into the signal. Perceptual coding systems attempt to control the level of quantization noise so that the noise is “masked” or rendered imperceptible by other spectral content of the signal. These systems typically use perceptual models to predict the levels of quantization noise that can be masked by a given signal.
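The trade-off between quantizing resolution and quantization noise can be illustrated with a minimal sketch. It assumes a plain uniform quantizer and made-up spectral component values; the function names, step sizes and test data are hypothetical and are not taken from any particular coding system.

```python
import math

def quantize(value, step):
    """Uniform quantizer: round to the nearest multiple of step."""
    return step * round(value / step)

def rms_quantization_error(values, step):
    """Root-mean-square difference between values and their quantized forms."""
    errors = [v - quantize(v, step) for v in values]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def levels_used(values, step):
    """Number of distinct quantizer indices needed to represent the values."""
    return len({round(v / step) for v in values})

# Hypothetical spectral component values, chosen only for illustration.
components = [0.1 * math.sin(0.37 * k) + 0.05 * math.cos(1.3 * k)
              for k in range(256)]

for step in (0.001, 0.05):  # fine and coarse quantizing resolutions
    print(step, levels_used(components, step),
          rms_quantization_error(components, step))
```

The coarser step needs far fewer distinct levels, and therefore fewer bits, but leaves a larger error. A perceptual coder would choose the step for each spectral component from the output of its perceptual model rather than from fixed values like these.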
In perceptual audio coding systems, for example, quantization noise is often controlled by adapting quantizing resolutions according to predictions of audibility obtained from perceptual models based on psychoacoustic studies such as that described in E. Zwicker, Psychoacoustics, 1981. An example of a perceptual model that predicts the audibility of spectral components in a signal is discussed in M. Schroeder et al.; “Optimizing Digital Speech Coders by Exploiting Masking Properties of the Human Ear,” J. Acoust. Soc. Am., December 1979, pp. 1647-1652.
Spectral components that are deemed to be irrelevant because they are predicted to be imperceptible need not be included in the encoded signal. Other spectral components that are deemed to be relevant can be quantized using a quantizing resolution that is adapted to be fine enough to ensure the quantization noise is rendered just imperceptible by other spectral components in the source signal. Accurate predictions of perceptibility by a perceptual model allow a perceptual coding system to adapt the quantizing resolution more optimally, resulting in fewer audible artifacts.
A coding system using models known to provide inaccurate predictions of perceptibility cannot reliably ensure quantization noise is rendered imperceptible unless a finer quantizing resolution is used than would otherwise be required if a more accurate prediction were available. Many perceptual models such as that discussed by Schroeder et al. are based on spectral component magnitude; therefore, accurate predictions by these models depend on accurate measures of spectral component magnitude.
Accurate measures of spectral component magnitude also influence the performance of other types of coding processes in addition to quantization. In two types of coding processes known as spectral regeneration and coupling, an encoder reduces information requirements of source signals by excluding selected spectral components from an encoded representation of the source signals and a decoder synthesizes substitutes for the missing spectral components. In spectral regeneration, the encoder generates a representation of a baseband portion of a source signal that excludes other portions of the spectrum. The decoder synthesizes the missing portions of the spectrum using the baseband portion and side information that conveys some measure of spectral level for the missing portions, and combines the two portions to obtain an imperfect replica of the original source signal. One example of an audio coding system that uses spectral regeneration is described in international patent application no. PCT/US03/08895 filed Mar. 21, 2003, publication no. WO 03/083834 published Oct. 9, 2003. In coupling, the encoder generates a composite representation of spectral components for multiple channels of source signals and the decoder synthesizes spectral components for multiple channels using the composite representation and side information that conveys some measure of spectral level for each source signal channel. One example of an audio coding system that uses coupling is described in the Advanced Television Systems Committee (ATSC) A/52A document entitled “Revision A to Digital Audio Compression (AC-3) Standard” published Aug. 20, 2001.
The performance of these coding systems can be improved if the decoder is able to synthesize spectral components that preserve the magnitudes of the corresponding spectral components in the original source signals. The performance of coupling also can be improved if accurate measures of phase are available so that distortions caused by coupling out-of-phase signals can be avoided or compensated.
Unfortunately, some coding systems use particular types of filter banks to derive an expression of spectral components that makes it difficult to obtain accurate measures of spectral component magnitude or phase. Two common types of coding systems are referred to as subband coding and transform coding. Filter banks in both subband and transform coding systems may be implemented by a variety of signal processing techniques including various time-domain to frequency-domain transforms. See J. Tribolet et al., “Frequency Domain Coding of Speech,” IEEE Trans. Acoust., Speech, and Signal Proc., ASSP-27, October, 1979, pp. 512-530.
Some transforms such as the Discrete Fourier Transform (DFT) or its efficient implementation, the Fast Fourier Transform (FFT), provide a set of spectral components or transform coefficients from which spectral component magnitude and phase can be easily calculated. Spectral components of the DFT, for example, are multidimensional representations of a source signal. Specifically, the DFT, which may be used in audio coding and video coding applications, provides a set of complex-valued coefficients whose real and imaginary parts may be expressed as coordinates in a two-dimensional space. The magnitude of each spectral component provided by such a transform can be obtained easily from each component's coordinates in the multidimensional space using well-known calculations.
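The well-known calculations referred to above are the Euclidean norm and the two-argument arctangent of the real and imaginary coordinates. The following minimal sketch uses a direct, unoptimized DFT and a hypothetical test tone whose amplitude, phase and bin index are chosen only for illustration.

```python
import cmath
import math

def dft(x):
    """Direct (unoptimized) DFT returning N complex-valued coefficients."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Hypothetical test tone centered on DFT bin 4 of a 32-sample segment.
N, bin_index = 32, 4
amplitude, phase = 0.8, 0.6
x = [amplitude * math.cos(2 * math.pi * bin_index * n / N + phase)
     for n in range(N)]

coefficient = dft(x)[bin_index]
# Magnitude and phase from the two coordinates (real part, imaginary part).
magnitude = math.hypot(coefficient.real, coefficient.imag)
angle = math.atan2(coefficient.imag, coefficient.real)

print(magnitude * 2 / N, angle)  # recovers the tone's amplitude and phase
```

For a tone centered on a bin, the coefficient magnitude is the amplitude scaled by N/2, and the coefficient angle equals the phase of the tone.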
Some transforms such as the Discrete Cosine Transform (DCT), however, provide spectral components that make it difficult to obtain an accurate measure of spectral component magnitude or phase. The spectral components of the DCT, for example, represent the spectral component of a source signal in only a subspace of the multidimensional space required to accurately convey spectral magnitude and phase. In typical audio coding and video coding applications, for example, a DCT provides a set of real-valued spectral components or transform coefficients that are expressed in a one-dimensional subspace of the two-dimensional real/imaginary space mentioned above. The magnitude of each spectral component provided by transforms like the DCT cannot be obtained easily from each component's coordinates in the relevant subspace.
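A small numerical sketch may make the difficulty concrete. It assumes a DCT of the type commonly called DCT-II and a hypothetical tone aligned with one DCT basis function. The tone amplitude is held fixed while only its phase changes; the real-valued DCT coefficient at that index swings from its maximum to approximately zero, whereas the magnitude of the corresponding complex DFT coefficient stays constant.

```python
import cmath
import math

def dct2(x):
    """Direct, unnormalized DCT-II returning N real-valued coefficients."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

def dft_bin(x, k):
    """Single complex DFT coefficient at bin k."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

N, k0 = 32, 8  # hypothetical segment length and DCT index

def tone(phase):
    """Unit-amplitude tone at the frequency of DCT basis function k0."""
    return [math.cos(math.pi / N * (n + 0.5) * k0 + phase) for n in range(N)]

def compare(phase):
    """Return (DCT coefficient at k0, DFT magnitude at the same frequency)."""
    x = tone(phase)
    return dct2(x)[k0], abs(dft_bin(x, k0 // 2))

for phase in (0.0, math.pi / 4, math.pi / 2):
    print(phase, compare(phase))
```

The DCT coefficient follows the cosine of the phase offset, so a single real value cannot distinguish a weak in-phase component from a strong component that is nearly in quadrature with the basis function.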
This characteristic of the DCT is shared by a particular Modified Discrete Cosine Transform (MDCT), which is described in J. Princen et al., “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation,” ICASSP 1987 Conf. Proc., May 1987, pp. 2161-64. The MDCT and its complementary Inverse Modified Discrete Cosine Transform (IMDCT) have gained widespread usage in many coding systems because they permit implementation of a critically sampled analysis/synthesis filter bank system that provides for perfect reconstruction of overlapping segments of a source signal. Perfect reconstruction refers to the property of an analysis/synthesis filter bank pair to reconstruct perfectly a source signal in the absence of errors caused by finite precision arithmetic. Critical sampling refers to the property of an analysis filter bank to generate a number of spectral components that is no greater than the number of samples used to convey the source signal. These properties are very attractive in many coding applications because critical sampling reduces the number of spectral components that must be encoded and conveyed in an encoded signal.
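The two properties can be sketched numerically. The following is a minimal, unoptimized illustration that assumes a sine window, a common MDCT phase term of N/2 + 1/2, and zero padding at the signal edges. These choices and all function names are assumptions made for the example and are not drawn from the cited reference.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def _basis(n, k, N):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(block, window):
    """2N windowed samples -> N real-valued coefficients."""
    N = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, N) for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, window):
    """N coefficients -> 2N windowed samples that still contain aliasing."""
    N = len(coeffs)
    return [window[n] * (2.0 / N) *
            sum(coeffs[k] * _basis(n, k, N) for k in range(N))
            for n in range(2 * N)]

def analyze(signal, N):
    """Segments of 2N samples, advanced N samples at a time (50% overlap)."""
    padded = [0.0] * N + list(signal) + [0.0] * N
    w = sine_window(N)
    return [mdct(padded[s:s + 2 * N], w) for s in range(0, len(padded) - N, N)]

def synthesize(frames, N):
    """Overlap-add of IMDCT outputs; aliasing cancels between neighbors."""
    w = sine_window(N)
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, y in enumerate(imdct(coeffs, w)):
            out[i * N + n] += y
    return out[N:-N]

N = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]
frames = analyze(signal, N)
rebuilt = synthesize(frames, N)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(len(frames), "frames of", N, "coefficients; max error", max_error)
```

Each segment spans 2N samples but yields only N coefficients, and successive segments advance by N samples, so apart from the one extra frame caused by edge padding the filter bank produces one coefficient per input sample. The reconstruction error is limited to floating-point rounding.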
The concept of critical sampling deserves some comment. Although the DFT or the DCT, for example, generate one spectral component for each sample in a source signal segment, DFT and DCT analysis/synthesis systems in many coding applications do not provide critical sampling because the analysis transform is applied to a sequence of overlapping signal segments. The overlap allows use of non-rectangular shaped window functions that improve analysis filter bank frequency response characteristics and eliminate blocking artifacts; however, the overlap also prevents perfect reconstruction with critical sampling because the analysis filter bank must generate more coefficient values than the number of source signal samples. This loss of critical sampling increases the information requirements of the encoded signal.
As mentioned above, filter banks implemented by the MDCT and IMDCT are attractive in many coding systems because they provide perfect reconstruction of overlapping segments of a source signal with critical sampling. Unfortunately, these filter banks are similar to the DCT in that the spectral components of the MDCT represent the spectral component of a source signal in only a subspace of the multidimensional space required to accurately convey spectral magnitude and phase. Accurate measures of spectral magnitude or phase cannot be obtained easily from the spectral components or transform coefficients generated by the MDCT; therefore, the coding performance of many systems that use the MDCT filter bank is suboptimal because the prediction accuracy of perceptual models is degraded and the preservation of spectral component magnitudes by synthesizing processes is impaired.
Prior attempts to avoid this deficiency of various filter banks like the MDCT and DCT filter banks have not been satisfactory for a variety of reasons. One technique is disclosed in “ISO/IEC 11172-3: 1993 (E) Coding of Moving Pictures and Associated Audio for Digital Storage Media at Up to About 1.5 Mbit/s,” ISO/IEC JTC1/SC29/WG11, Part III Audio. According to this technique, a set of filter banks including several MDCT-based filter banks is used to generate spectral components for encoding and an additional FFT-based filter bank is used to derive accurate measures of spectral component magnitude. This technique is not attractive for at least two reasons: (1) considerable computational resources are required in the encoder to implement the additional FFT filter bank needed to derive the measures of magnitude, and (2) the processing to obtain accurate measures of magnitude is performed in the encoder; therefore, additional bandwidth is required by the encoded signal to convey these measures of spectral component magnitude to the decoder.
Another technique avoids incurring any additional bandwidth required to convey measures of spectral component magnitude by calculating these measures in the decoder. This is done by applying a synthesis filter bank to the decoded spectral components to recover a replica of the source signal, applying an analysis filter bank to the recovered signal to obtain a second set of spectral components in quadrature with the decoded spectral components, and calculating spectral component magnitude from the two sets of spectral components. This technique also is not attractive because considerable computational resources are required in the decoder to implement the analysis filter bank needed to obtain the second set of spectral components.
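The final step of such a technique, combining two sets of spectral components that are in quadrature, can be sketched as follows. The sketch assumes the second set is obtained with a sine-based counterpart of the MDCT (often called an MDST) and, for brevity, computes both sets directly from a time-domain segment rather than from a recovered replica. The window, the test tone and all names are hypothetical.

```python
import math

def sine_window(N):
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct_mdst(block, N):
    """Return cosine-based (MDCT) and sine-based (MDST) coefficient sets."""
    w = sine_window(N)
    n0 = 0.5 + N / 2
    c = [sum(w[n] * block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
             for n in range(2 * N)) for k in range(N)]
    s = [sum(w[n] * block[n] * math.sin(math.pi / N * (n + n0) * (k + 0.5))
             for n in range(2 * N)) for k in range(N)]
    return c, s

def magnitudes(block, N):
    """Magnitude per spectral component from the two quadrature sets."""
    c, s = mdct_mdst(block, N)
    return [math.hypot(a, b) for a, b in zip(c, s)]

N, k0 = 16, 5  # hypothetical segment half-length and coefficient index
omega = math.pi * (k0 + 0.5) / N

def tone(phase):
    """Unit-amplitude tone at the center frequency of coefficient k0."""
    return [math.cos(omega * (n + 0.5 + N / 2) + phase) for n in range(2 * N)]

for phase in (0.0, math.pi / 4, math.pi / 2):
    c, s = mdct_mdst(tone(phase), N)
    print(phase, c[k0], math.hypot(c[k0], s[k0]))
```

The cosine-based coefficient alone varies strongly with the phase of the tone, while the combined magnitude stays nearly constant. The cost noted above is visible in the sketch: every magnitude requires a second full set of transform computations.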
Yet another technique, described in S. Merdjani et al., “Direct Estimation of Frequency From MDCT-Encoded Files,” Proc. of the 6th Int. Conf. on Digital Audio Effects (DAFx-03), London, September 2003, estimates the frequency, magnitude and phase of a sinusoidal source signal from a “regularized spectrum” derived from MDCT coefficients. This technique overcomes the disadvantages mentioned above but it also is not satisfactory for typical coding applications because it is applicable only for a very simple source signal that has only one sinusoid.
Another technique, which is disclosed in U.S. patent application Ser. No. 09/948,053, publication number U.S. 2003/0093282 A1 published May 15, 2003, is able to derive DFT coefficients from MDCT coefficients; however, the disclosed technique does not obtain measures of magnitude or phase for spectral components represented by the MDCT coefficients themselves. Furthermore, the disclosed technique does not use measures of magnitude or phase to adapt processes for encoding or decoding information that represents the MDCT coefficients.
What is needed is a technique that provides accurate estimates of magnitude or phase from spectral components generated by analysis filter banks such as the MDCT that also avoids or overcomes deficiencies of known techniques.