1. Technical Field
An “Overcomplete Audio Coder” provides various techniques for encoding audio signals using modulated complex lapped transforms (MCLT), and in particular, various techniques for implementing a predictive MCLT-based coder that significantly reduces the rate overhead caused by the overcomplete sampling nature of the MCLT, without the need for iterative algorithms for sparsity reduction.
2. Related Art
Most modern audio compression systems use a frequency-domain approach. The main reason is that when short audio blocks (say, 20 ms) are mapped to the frequency domain, a large fraction of the signal energy in most blocks is concentrated in relatively few frequency components, which is a necessary first step toward good compression. The mapping from the time domain to the frequency domain is usually performed by the modulated lapped transform (MLT), also known as the modified discrete cosine transform (MDCT). In general, the MLT is an overlapping orthogonal transform that allows for smooth signal reconstruction even after heavy quantization of the transform coefficients, without discontinuities across block boundaries (blocking artifacts).
One disadvantage of the MLT is that it does not provide a shift-invariant representation of the input signal. In particular, if the input signal is shifted by a small amount (e.g., ⅛ of a block), the resulting MLT coefficients will change significantly. In fact, just as with wavelet decompositions, there are no overlapping transforms or filter banks that can be both shift invariant and orthogonal.
For example, when an audio signal is composed of a single sinusoid of constant frequency and amplitude, the MLT coefficients will still vary from block to block. Therefore, if they are quantized, the reconstructed audio will be a modulated sinusoid. Unfortunately, when all harmonic components of a more complex audio signal (such as speech or music, for example) suffer from these modulations, “warbling” artifacts can be heard in the reconstructed signal.
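The block-to-block variation described above can be illustrated with a minimal sketch. It assumes the commonly published MLT definition with a sine window (the formula is not given in this section), and the block size `M`, the bin index `k0`, and the function name `mlt` are illustrative choices only.

```python
import math

def mlt(block, M):
    """Return the M real MLT (MDCT) coefficients of a 2M-sample block.

    Assumed definition: sine window and cosine modulation with
    phase offset (M + 1) / 2, as commonly published for the MLT.
    """
    scale = math.sqrt(2.0 / M)
    out = []
    for k in range(M):
        acc = 0.0
        for n in range(2 * M):
            window = math.sin((n + 0.5) * math.pi / (2 * M))  # sine window
            angle = (n + (M + 1) / 2.0) * (k + 0.5) * math.pi / M
            acc += block[n] * window * scale * math.cos(angle)
        out.append(acc)
    return out

M = 16                             # new samples per block (illustrative)
k0 = 5                             # bin whose centre frequency the tone sits on
omega = (k0 + 0.5) * math.pi / M   # constant frequency, radians per sample
signal = [math.cos(omega * n) for n in range(5 * M)]

# Blocks overlap by M samples: block m covers samples m*M .. m*M + 2M - 1.
peak_coeffs = [mlt(signal[m * M : m * M + 2 * M], M)[k0] for m in range(4)]
for m, c in enumerate(peak_coeffs):
    print(f"block {m}: MLT coefficient at bin {k0} = {c:+.3f}")
```

Although the input is a sinusoid of constant amplitude, the coefficient at the peak bin changes in both size and sign from one block to the next, because its value depends on the phase of the sinusoid relative to each block.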
These types of modulation artifacts can be significantly reduced if the MLT is replaced by a transform that supports a magnitude-phase representation, such as the modulated complex lapped transform (MCLT). However, the MCLT is an overcomplete (or oversampled) transform by a factor of two. In particular, the MCLT maps a block with M new real-valued signal samples into M complex-valued transform coefficients; since each coefficient has a real and an imaginary component, M new samples produce 2M real values, which is oversampling by a factor of two. Unfortunately, while conventional MCLT-based coders can significantly reduce modulation artifacts, this inherent oversampling significantly reduces their compression performance.
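The magnitude-phase property and the factor-of-two oversampling can be sketched as follows. This is a minimal illustration that assumes the commonly published MCLT definition, in which the real part is the sine-windowed MLT and the imaginary part is the negated sine-modulated counterpart; the names `mclt`, `M`, and `k0` are hypothetical.

```python
import cmath
import math

def mclt(block, M):
    """Return the M complex MCLT coefficients of a 2M-sample block.

    Assumed definition: X(k) = Xc(k) - j*Xs(k), where Xc is the
    sine-windowed MLT and Xs uses sine instead of cosine modulation.
    """
    scale = math.sqrt(2.0 / M)
    out = []
    for k in range(M):
        acc = 0j
        for n in range(2 * M):
            window = math.sin((n + 0.5) * math.pi / (2 * M))
            angle = (n + (M + 1) / 2.0) * (k + 0.5) * math.pi / M
            # exp(-j*angle) = cos(angle) - j*sin(angle)
            acc += block[n] * window * scale * cmath.exp(-1j * angle)
        out.append(acc)
    return out

M = 16                             # new samples per block (illustrative)
k0 = 5                             # bin whose centre frequency the tone sits on
omega = (k0 + 0.5) * math.pi / M   # constant frequency, radians per sample
signal = [math.cos(omega * n) for n in range(5 * M)]

blocks = [mclt(signal[m * M : m * M + 2 * M], M) for m in range(4)]
real_parts = [b[k0].real for b in blocks]   # these equal the MLT coefficients
magnitudes = [abs(b[k0]) for b in blocks]

for m in range(4):
    print(f"block {m}: real part = {real_parts[m]:+.3f}, "
          f"magnitude = {magnitudes[m]:.3f}")

# Each block consumes M new samples but emits M complex coefficients.
real_values_per_block = 2 * len(blocks[0])
print(f"{M} new samples -> {real_values_per_block} real values per block")
```

In this sketch the real part alone fluctuates from block to block, while the magnitude of the complex coefficient stays nearly constant for the steady sinusoid. The cost is visible in the last line: twice as many real values as new input samples must be represented.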