This invention relates to compression and decompression of continuous signals, and more particularly to a method and system for reduction of quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals, especially audio signals.
A variety of audio compression techniques have been developed to transmit audio signals in constrained bandwidth channels and store such signals on media with limited storage capacity. For general purpose audio compression, no assumptions can be made about the source or characteristics of the sound. Thus, compression/decompression algorithms must be general enough to deal with the arbitrary nature of audio signals, which in turn poses a substantial constraint on viable approaches. In this document, the term “audio” refers to a signal that can be any sound in general, such as music of any type, speech, or a mixture of music and speech. General audio compression thus differs from speech coding in one significant aspect: in speech coding, where the source is known a priori, model-based algorithms are practical.
Most approaches to audio compression can be broadly divided into two major categories: time-domain quantization and transform-domain quantization. The characteristics of the transform domain are defined by the reversible transformations employed. When a transform such as the fast Fourier transform (FFT), discrete cosine transform (DCT), or modified discrete cosine transform (MDCT) is used, the transform domain is equivalent to the frequency domain. When a transform such as the wavelet transform (WT) or packet transform (PT) is used, the transform domain represents a mixture of time and frequency information.
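As a small illustration of a reversible transform whose domain is a frequency domain, the following sketch implements an orthonormal DCT-II and its inverse directly from the definition. It is an O(N²) reference version for clarity, not a fast transform, and the function names are hypothetical. A block that matches one cosine basis function is mapped to a single nonzero coefficient, which is the kind of energy concentration that makes transform-domain quantization attractive.

```python
import math

def dct_ii(x):
    """Orthonormal DCT-II of the sequence x (direct O(N^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out

def idct_ii(c):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = c[0] * math.sqrt(1.0 / n)
        total += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        out.append(total)
    return out
```

Because the transform is orthonormal, `idct_ii(dct_ii(x))` returns `x` up to rounding error, and the sum of squared coefficients equals the sum of squared samples.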
Quantization is one of the most common and direct techniques to achieve data compression. There are two basic quantization types: scalar and vector. Scalar quantization encodes data points individually, while vector quantization groups input data into vectors, each of which is encoded as a whole. Vector quantization typically searches a codebook (a collection of vectors) for the closest match to an input vector, yielding an output index. A dequantizer simply performs a table lookup in an identical codebook to reconstruct the original vector. Other approaches that do not involve codebooks are known, such as closed-form solutions.
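The codebook search and table-lookup dequantization described above can be sketched as follows. This is a minimal full-search vector quantizer using squared Euclidean distance; the four-entry two-dimensional codebook is a hypothetical toy chosen so that each 2-sample vector costs 2 bits, i.e. 1 bit per sample.

```python
def vq_encode(vector, codebook):
    """Full-search VQ: index of the codebook entry nearest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def vq_decode(index, codebook):
    """Dequantization is just a table lookup in an identical codebook."""
    return list(codebook[index])

def vq_encode_signal(samples, codebook):
    """Group samples into vectors of the codebook's dimension and encode each vector as a whole."""
    dim = len(codebook[0])
    if len(samples) % dim:
        raise ValueError("signal length must be a multiple of the vector dimension")
    return [vq_encode(samples[i:i + dim], codebook) for i in range(0, len(samples), dim)]

# Hypothetical toy codebook: 4 entries of dimension 2, so 2 bits per vector (1 bit per sample).
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (-1.0, 1.0)]
```

With so few entries, an input vector such as (-0.9, -0.9) is mapped to (0, 0), zeroing both samples; this is the kind of low-rate behavior that matters when vector quantization is applied directly to time-domain audio samples.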
A coder/decoder (“codec”) that complies with the MPEG-Audio standard (ISO/IEC 11172-3; 1993(E)) (here, simply “MPEG”) is an example of an approach employing time-domain scalar quantization. In particular, MPEG employs scalar quantization of the time-domain signal in individual subbands, while bit allocation in the scalar quantizer is based on a psychoacoustic model, which is implemented separately in the frequency domain (dual-path approach).
It is well known that scalar quantization is not optimal with respect to rate/distortion tradeoffs. Because scalar quantization cannot exploit correlations among adjacent data points, it generally yields higher distortion levels for a given bit rate. To reduce distortion, more bits must be used. Thus, time-domain scalar quantization limits the degree of compression, resulting in higher bit rates.
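The bits-versus-distortion tradeoff of independent (scalar) coding can be seen with a uniform midrise quantizer. In this sketch, which assumes inputs spread evenly over [-1, 1), the mean squared error is close to step²/12, so each additional bit per sample halves the step and cuts the error by about a factor of four (roughly 6 dB). The function names are hypothetical, and the example does not model any particular codec's quantizer.

```python
import math

def uniform_scalar_quantize(x, bits, lo=-1.0, hi=1.0):
    """Midrise uniform scalar quantizer on [lo, hi): returns (index, reconstructed value)."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = min(levels - 1, max(0, math.floor((x - lo) / step)))
    return index, lo + (index + 0.5) * step

def mse_for_bits(samples, bits):
    """Mean squared error when every sample is coded independently with `bits` bits."""
    total = 0.0
    for x in samples:
        _, y = uniform_scalar_quantize(x, bits)
        total += (x - y) ** 2
    return total / len(samples)

# Evenly spread test inputs over [-1, 1).
samples = [-1.0 + (i + 0.5) * 2.0 / 10000 for i in range(10000)]
for bits in range(1, 7):
    # Print bits per sample and the MSE in dB relative to a unit-amplitude reference.
    print(bits, round(10 * math.log10(1.0 / mse_for_bits(samples, bits)), 2))
```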
Vector quantization schemes usually can achieve far better compression ratios than scalar quantization at a given distortion level. However, the human auditory system is sensitive to the distortion associated with zeroing even a single time-domain sample. This phenomenon makes direct application of traditional vector quantization techniques on a time-domain audio signal an unattractive proposition, since vector quantization at the rate of 1 bit per sample or lower often leads to zeroing of some vector components (that is, time-domain samples).
These limitations of time-domain approaches may lead one to conclude that a frequency-domain (or, more generally, a transform-domain) approach may be a better alternative for vector-quantization-based audio compression. However, there is a significant difficulty that must be resolved in audio compression that performs quantization outside the time domain. The input signal is continuous, with no practical limit on its total duration, so the audio signal must be encoded in a piecewise manner. Each piece is called an audio encode or decode block, or frame. Performing quantization in the frequency domain on a per-frame basis generally leads to discontinuities at the frame boundaries, and such discontinuities yield objectionable audible artifacts (“clicks” and “pops”). One remedy for this discontinuity problem is to use overlapped frames, which results in proportionately lower compression ratios and higher computational complexity. A more popular approach is to use critically sampled subband filter banks, which employ a history buffer that maintains continuity at frame boundaries, but at the cost of latency in the codec-reconstructed audio signal. The long history buffer may also degrade the reconstructed transient response, resulting in audible artifacts. Another class of approaches enforces boundary conditions as constraints in the audio encode and decode processes. Formal, rigorous mathematical treatments of these constraint-based approaches generally involve intensive computation, which tends to be impractical for real-time applications.
The inventors have determined that it would be desirable to provide an audio compression technique that is suitable for real-time applications and has reduced computational complexity. The technique should provide low-bit-rate, full-bandwidth compression (about 1 bit per sample) of music and speech, while also being applicable to higher-bit-rate audio compression. The present invention provides such a technique.
The invention includes a method and system for minimization of quantization-induced block-discontinuities arising from lossy compression and decompression of continuous signals, especially audio signals. In one embodiment, the invention includes a general purpose, ultra-low latency audio codec algorithm.
In one aspect, the invention includes: a method and apparatus for compression and decompression of audio signals using a novel boundary analysis and synthesis framework to substantially reduce quantization-induced frame or block-discontinuity; a novel adaptive cosine packet transform (ACPT) as the transform of choice to effectively capture the input audio characteristics; a signal-residue classifier to separate the strong signal clusters from the noise and weak signal components (collectively called residue); an adaptive sparse vector quantization (ASVQ) algorithm for signal components; a stochastic noise model for the residue; and an associated rate control algorithm. This invention also involves a general purpose framework that substantially reduces the quantization-induced block-discontinuity in lossy data compression involving any continuous data.
The ACPT algorithm dynamically adapts to the instantaneous changes in the audio signal from frame to frame, resulting in efficient signal modeling that leads to a high degree of data compression. Subsequently, a signal/residue classifier is employed to separate the strong signal clusters from the residue. The signal clusters are encoded using a special type of adaptive sparse vector quantization (ASVQ). The residue is modeled and encoded as bands of stochastic noise.
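The signal/residue split and the band-wise noise model for the residue can be illustrated with a deliberately simple sketch. Everything specific here is an assumption for illustration, not the classifier or noise model of the invention: strong coefficients are those whose magnitude exceeds `threshold_ratio` times the block RMS, the residue is summarized by one RMS level per fixed-size band, and the decoder substitutes seeded Gaussian noise at those levels before restoring the signal coefficients exactly. The ASVQ coding of the signal clusters is not modeled.

```python
import math
import random

def classify_signal_residue(coeffs, threshold_ratio=1.5):
    """Split coefficients into strong 'signal' entries {index: value} and weak 'residue'.

    Assumed rule (illustrative only): a coefficient is signal when its magnitude
    exceeds threshold_ratio times the RMS value of the whole block.
    """
    rms = math.sqrt(sum(c * c for c in coeffs) / len(coeffs))
    threshold = threshold_ratio * rms
    signal = {i: c for i, c in enumerate(coeffs) if abs(c) > threshold}
    residue = [0.0 if i in signal else c for i, c in enumerate(coeffs)]
    return signal, residue

def residue_band_levels(residue, band_size):
    """Summarize the residue by one RMS level per band; only these levels describe the noise."""
    if len(residue) % band_size:
        raise ValueError("block length must be a multiple of band_size")
    levels = []
    for start in range(0, len(residue), band_size):
        band = residue[start:start + band_size]
        levels.append(math.sqrt(sum(c * c for c in band) / band_size))
    return levels

def reconstruct_coefficients(signal, band_levels, band_size, seed=0):
    """Decoder side: Gaussian noise at each band's level, then the exact signal coefficients on top."""
    rng = random.Random(seed)
    out = [rng.gauss(0.0, level) for level in band_levels for _ in range(band_size)]
    for index, value in signal.items():
        out[index] = value
    return out
```

For the block `[10.0, 0.5, -0.4, 0.3, -8.0, 0.2, 0.1, -0.3]`, the two large coefficients at indices 0 and 4 are kept as signal, and the remaining six values are described only by two band levels.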
More particularly, in one aspect, the invention includes a zero-latency method for reducing quantization-induced block-discontinuities of continuous data formatted into a plurality of time-domain blocks having boundaries, including performing a first quantization of each block and generating first quantization indices indicative of such first quantization; determining a quantization error for each block; performing a second quantization of any quantization error arising near the boundaries of each block from such first quantization and generating second quantization indices indicative of such second quantization; and encoding the first and second quantization indices and formatting such encoded indices as an output bit-stream.
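The zero-latency scheme above can be sketched with uniform scalar quantizers standing in for both stages. This is an assumption for illustration: the first stage here quantizes time-domain samples directly, whereas the invention's first quantization may operate on transformed data, and entropy coding of the indices into a bitstream is omitted. The point shown is the structure: quantize the block, compute its quantization error, re-quantize only the error within `edge` samples of each block boundary with a finer step, and add that refinement back at the decoder. All names and parameters are hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantization: one integer index per value."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    return [q * step for q in indices]

def boundary_positions(n, edge):
    """Sample positions within `edge` samples of either end of an n-sample block."""
    if 2 * edge > n:
        raise ValueError("edge regions must not overlap")
    return list(range(edge)) + list(range(n - edge, n))

def encode_block(block, coarse_step, fine_step, edge):
    first = quantize(block, coarse_step)                                    # first quantization
    error = [x - y for x, y in zip(block, dequantize(first, coarse_step))]  # quantization error
    near_edges = [error[p] for p in boundary_positions(len(block), edge)]
    second = quantize(near_edges, fine_step)                                # second quantization, boundaries only
    return first, second

def decode_block(first, second, coarse_step, fine_step, edge):
    out = dequantize(first, coarse_step)
    for p, q in zip(boundary_positions(len(out), edge), second):
        out[p] += q * fine_step  # refine the samples nearest the block boundaries
    return out
```

Each block is coded from its own samples only, so no look-ahead is needed; the boundary samples are reconstructed to within `fine_step / 2`, which limits the jump between adjacent decoded blocks, while interior samples keep the coarser `coarse_step / 2` accuracy.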
In another aspect, the invention includes a low-latency method for reducing quantization-induced block-discontinuities of continuous data formatted into a plurality of time-domain blocks having boundaries, including forming an overlapping time-domain block by prepending a small fraction of a previous time-domain block to a current time-domain block; performing a reversible transform on each overlapping time-domain block, so as to yield energy concentration in the transform domain; quantizing each reversibly transformed block and generating quantization indices indicative of such quantization; encoding the quantization indices for each quantized block as an encoded block, and outputting each encoded block as a bit-stream; decoding each encoded block into quantization indices; generating a quantized transform-domain block from the quantization indices; inversely transforming each quantized transform-domain block into an overlapping time-domain block; excluding data from regions near the boundary of each overlapping time-domain block and reconstructing an initial output data block from the remaining data of such overlapping time-domain block; interpolating boundary data between adjacent overlapping time-domain blocks; and prepending the interpolated boundary data to the initial output data block to generate a final output data block.
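The low-latency steps above can be traced in a compact encode/decode loop. This sketch rests on several stated assumptions: an orthonormal DCT stands in for the ACPT, uniform scalar quantization stands in for the actual quantizers, the bitstream stage is skipped (indices are passed straight to the decoder), the "previous block" before the first block is taken as zeros, and the boundary interpolation is a linear crossfade between the tail of the previous reconstructed block and the head of the current one. With `overlap` samples prepended, each final output block is delayed by `overlap` samples; a flush of the last `overlap` samples is omitted. All names are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II (direct evaluation)."""
    n = len(x)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of the orthonormal DCT-II."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
                * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]

def codec_overlapped(signal, block_size, overlap, step):
    """Code `signal` block by block; the returned output lags the input by `overlap` samples."""
    if not 0 < overlap < block_size or len(signal) % block_size:
        raise ValueError("need 0 < overlap < block_size and a whole number of blocks")
    prev_tail_in = [0.0] * overlap  # encoder: last `overlap` input samples of the previous block
    prev_tail_out = None            # decoder: previous block's reconstruction of those samples
    output = []
    for start in range(0, len(signal), block_size):
        block = list(signal[start:start + block_size])
        extended = prev_tail_in + block                     # prepend a small fraction of the previous block
        indices = [round(c / step) for c in dct(extended)]  # transform and quantize
        y = idct([q * step for q in indices])               # dequantize and inverse transform
        initial = y[overlap:block_size]                     # exclude head and tail boundary regions
        head = y[:overlap]
        if prev_tail_out is None:
            boundary = head
        else:
            # Linear crossfade: weight moves from the previous block's tail to this block's head.
            weights = [(i + 1) / (overlap + 1) for i in range(overlap)]
            boundary = [(1 - w) * a + w * b for a, b, w in zip(prev_tail_out, head, weights)]
        output.extend(boundary + initial)                   # prepend interpolated boundary data
        prev_tail_in = block[-overlap:]
        prev_tail_out = y[block_size:]                      # tail region, reused for the next boundary
    return output
```

With a very fine `step`, `codec_overlapped(signal, 16, 4, 1e-6)` reproduces the input delayed by 4 samples; with a coarse `step`, the crossfade smooths the transition between the two independently quantized reconstructions of each boundary region.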
The invention also includes corresponding methods for decompressing a bitstream representing an input signal compressed in this manner, particularly audio data. The invention further includes corresponding computer program implementations of these and other algorithms.
Advantages of the invention include:
A novel block-discontinuity minimization framework that allows for flexible and dynamic signal or data modeling;
A general purpose and highly scalable audio compression technique;
High data compression ratio/lower bit-rate, characteristics well suited for applications like real-time or non-real-time audio transmission over the Internet with limited connection bandwidth;
Ultra-low to zero coding latency, ideal for interactive real-time applications;
Ultra-low bit-rate compression of certain types of audio;
Low computational complexity.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.