The present invention pertains generally to audio and image coding systems and methods, and pertains more particularly to lossless compression techniques that can be used in audio and image coding systems to provide high levels of compression at low computational cost without requiring high-accuracy pre-defined probability distribution functions of the information to be compressed.
There is considerable interest among those in the fields of audio and image signal processing to reduce the amount of information required to represent audio and image signals without perceptible loss in signal quality. By reducing the amount of information required to represent such signals, the representations impose lower information capacity requirements upon communication paths and storage media. Of course, there are limits to the amount of reduction that can be realized without degrading the perceived signal quality.
Information capacity requirements can be reduced by applying either or both of two types of data compression techniques. One type, sometimes referred to as “lossy” compression, reduces information capacity requirements in a manner which does not assure, and generally prevents, perfect recovery of the original signal. Another type, sometimes referred to as “lossless” compression, reduces information capacity requirements in a manner that permits perfect recovery of the original signal.
Quantization is one well-known digital lossy compression technique. Quantization can reduce information capacity requirements by reducing the number of bits used to represent each sample of a digital signal, thereby reducing the accuracy of the digital signal representation. The reduced accuracy, or quantizing error, is manifested as noise; therefore, quantization may be thought of as a process that injects noise into a signal. If the quantization errors are of sufficient magnitude, the quantizing noise will be perceptible and degrade the subjective quality of the coded signal.
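The relationship between bit count and quantizing error can be illustrated with a minimal sketch. It assumes a uniform quantizer operating on samples normalized to the range [-1.0, 1.0); the function names and sample values are hypothetical and are not taken from any particular coding system.

```python
def quantize(sample, bits):
    """Uniformly quantize a sample in [-1.0, 1.0) to a signed integer of `bits` bits."""
    levels = 1 << (bits - 1)               # number of steps on each side of zero
    q = int(round(sample * levels))
    return max(-levels, min(levels - 1, q))  # clip to the representable range


def dequantize(q, bits):
    """Map a quantized integer back to the normalized sample range."""
    return q / (1 << (bits - 1))


def max_quantizing_error(samples, bits):
    """Largest absolute difference between each sample and its quantized replica."""
    return max(abs(s - dequantize(quantize(s, bits), bits)) for s in samples)


# Hypothetical sample values chosen only for illustration.
samples = [0.30, -0.72, 0.05, 0.91]
error_8_bits = max_quantizing_error(samples, 8)
error_4_bits = max_quantizing_error(samples, 4)
```

In this sketch, halving the word length from 8 bits to 4 bits enlarges the quantizing step by a factor of 16, and the worst-case error for unclipped samples is bounded by half a step.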
Perceptual coding systems attempt to apply lossy compression techniques to an input signal without suffering any perceptible degradation by removing components of information that are imperceptible or irrelevant to perceived signal quality. A complementary decoding system can recover a replica of the input signal that is perceptually indistinguishable from the input signal provided the removed components are truly irrelevant.
So-called split-band coding techniques are often used in perceptual coding systems because they can facilitate the analysis of an input signal to identify its irrelevant parts. A split-band encoder splits an input signal into several narrow-band signals, analyzes the narrow-band signals to identify those parts deemed to be irrelevant, and adaptively quantizes each narrow-band signal in a manner that removes these parts.
Split-band audio encoding often comprises the use of a forward or analysis filterbank to divide an audio signal into several subband signals each having a bandwidth commensurate with the so-called critical bandwidths of the human auditory system. Each subband signal is quantized using just enough bits to ensure that the quantizing noise in each subband is masked by the spectral component in that subband and adjacent subbands. Split-band audio decoding comprises reconstructing a replica of the original signal using an inverse or synthesis filterbank. If the bandwidths of the filters in the filterbanks and the quantization accuracy of the subband signals are chosen properly, the reconstructed replica can be perceptually indistinguishable from the original signal.
Two such coding techniques are subband coding and transform coding. Subband coding may use various analog and/or digital filtering techniques to implement the filterbanks. Transform coding uses various time-domain to frequency-domain transforms to implement the filterbanks. Adjacent frequency-domain transform coefficients may be grouped to define “subbands” having effective bandwidths that are sums of individual transform coefficient bandwidths.
Throughout the following discussion, the term “split-band coding” and the like refers to subband encoding and decoding, transform encoding and decoding, and other encoding and decoding techniques that operate upon portions of the useful signal bandwidth. The term “subband” refers to these portions of the useful signal bandwidth, whether implemented by a true subband coder, a transform coder, or other technique. The term “subband signal” refers to a split-band filtered signal within a respective subband.
Another lossy compression technique is called scaling. Many coding techniques including split-band coding convey signals using a scaled representation to extend the dynamic range of encoded information represented by a limited number of bits. A scaled representation comprises one or more “scaling factors” associated with “scaled values” corresponding to elements of the encoded signals. Many forms of scaled representation are known. By sacrificing some accuracy in the scaled values, even fewer bits may be used to convey information using a “block-scaled representation.” A block-scaled representation comprises a group or block of scaled values associated with a common scaling factor.
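One possible form of block-scaled representation can be sketched as follows. The sketch assumes integer signal values, a power-of-two scaling factor expressed as a shift count, and signed scaled values of a fixed word length; these choices and all names are illustrative assumptions, since many forms of scaled representation are known.

```python
def block_scale(block, mantissa_bits):
    """Return (shift, scaled_values) sharing one power-of-two scaling factor."""
    limit = 1 << (mantissa_bits - 1)   # signed scaled values lie in [-limit, limit)
    shift = 0
    # Increase the common scaling factor until every value in the block fits.
    while any(not (-limit <= (v >> shift) < limit) for v in block):
        shift += 1
    return shift, [v >> shift for v in block]


def block_unscale(shift, scaled_values):
    """Reconstruct approximate values from the scaled values and common shift."""
    return [m << shift for m in scaled_values]


# Hypothetical block of four values conveyed with 6-bit scaled values.
shift, scaled = block_scale([1000, -37, 512, 3], 6)
replica = block_unscale(shift, scaled)
```

Because the largest value in the block determines the common scaling factor, the smaller values in the same block lose accuracy, which is the sacrifice referred to above.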
Lossless compression techniques reduce information capacity requirements of a signal without degradation by reducing or eliminating components of the signal that are redundant. A complementary decompression technique can recover the original signal perfectly by providing the redundant component removed during compression. Examples of lossless compression techniques include run-length encoding, adaptive and nonadaptive forms of differential coding, linear predictive coding, transform coding, and forms of so-called entropy coding such as Huffman coding. Variations, combinations and adaptive forms of these compression techniques are also known.
Generally, the best levels of compression are achieved by hybrid techniques that combine lossless and lossy compression techniques. Two types of hybrid techniques are discussed below.
An example of the first hybrid type combines lossless transform coding with lossy vector quantization to quantize transform coefficients. Vector quantization uses a codebook of quantized values in an N-dimensional vector space and quantizes each source vector to the value that is associated with the closest codebook vector. Computational complexity for the process needed to find the closest vector increases geometrically as the dimension of the codebook vector space increases. In principle, vector quantization provides optimum encoding according to a rate-distortion theory, as discussed in Gersho and Gray, “Vector Quantization and Signal Compression,” Prentice-Hall, 1992; however, optimum performance is achieved only asymptotically as the dimension of the vector space approaches infinity. As a result, near-optimum coding performance can be achieved only in exchange for incurring much higher computational costs. Alternative quantization methods such as transform weighted interleaved vector quantization and pyramid vector quantization, described in Iwakami et al., “High Quality Audio Coding at Less than 64 kb/s by using Transform-Domain Weighted Interleaved Vector Quantization (TWIN-VQ),” IEEE Proc. of ICASSP, 1995, pp. 3095-98, and Cadel et al., “Pyramid Vector Coding for High Quality Audio Compression,” IEEE Proc. of ICASSP, 1996, may be used to reduce computational complexity. Unfortunately, even the computational cost of these methods is very high.
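The closest-vector search that dominates the cost of vector quantization can be sketched as an exhaustive search. The sketch assumes squared Euclidean distance as the closeness measure and uses a small hypothetical two-dimensional codebook; practical systems may use other distance measures and structured codebooks.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codebook_index(vector, codebook):
    """Exhaustively search the codebook for the vector closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


# Hypothetical 2-dimensional codebook with four entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0), (3.0, 0.0)]
index_a = nearest_codebook_index((0.9, 1.2), codebook)
index_b = nearest_codebook_index((2.6, -0.1), codebook)
```

The search visits every codebook vector and every dimension of each vector, so its cost grows with both the codebook size and the dimension N.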
An example of the second hybrid type combines lossless transform coding with lossy uniform quantization of the transform coefficients and a subsequent lossless encoding of the quantized coefficients using, for example, Huffman encoding. The Huffman encoding technique uses a codebook that is based on a pre-determined probability distribution function (PDF) of input values, and that associates shorter-length codes to the more frequently occurring values. Both scalar-Huffman encoding and multi-dimensional vector-Huffman encoding are possible. This particular example of the second hybrid type can work reasonably well provided the assumed PDF of input values is reasonably close to the actual distribution of values to be encoded. It is well known, however, that Huffman encoding can actually increase information capacity requirements if the assumed PDF is a poor model of the actual value distribution.
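The dependence of Huffman encoding on the assumed PDF can be illustrated with a minimal sketch. It builds scalar Huffman code lengths from an assumed PDF and then measures the average code length under a different actual distribution. The four-symbol alphabet and both distributions are hypothetical values chosen only for illustration.

```python
import heapq
from itertools import count


def huffman_code_lengths(pdf):
    """Return {symbol: code length in bits} for a Huffman code built from `pdf`."""
    tiebreak = count()  # unique counter so the heap never compares dictionaries
    heap = [(p, next(tiebreak), {sym: 0}) for sym, p in pdf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        merged = {sym: n + 1 for sym, n in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


def average_bits(lengths, actual_pdf):
    """Average code length in bits per symbol under the actual distribution."""
    return sum(actual_pdf[sym] * lengths[sym] for sym in actual_pdf)


assumed_pdf = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
actual_pdf = {0: 0.125, 1: 0.125, 2: 0.25, 3: 0.5}   # poorly matched to assumption
lengths = huffman_code_lengths(assumed_pdf)
matched_rate = average_bits(lengths, assumed_pdf)
mismatched_rate = average_bits(lengths, actual_pdf)
```

In this example a fixed-length code needs 2 bits per symbol. The Huffman code averages 1.75 bits per symbol when the assumed PDF is accurate but 2.625 bits per symbol under the mismatched distribution, which is worse than no entropy coding at all.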
Another lossless encoding technique is discussed in International Patent Application Publication No. WO 99/62253 entitled “Scalable Audio Coder and Decoder.” This technique, referred to as Tunstall encoding, is a dual of Huffman encoding in that it uses fixed-length code words to represent variable-length strings of input values. This technique can use a parametric PDF model and can, therefore, select a model from a set of models representing diverse probability statistics. Although many different PDFs can be created using the parametric model, like Huffman encoding, the performance of this technique depends on the accuracy of the PDF.
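The way fixed-length code words come to represent variable-length strings can be sketched with the textbook Tunstall dictionary construction. This is a generic sketch and is not asserted to be the procedure of the cited publication; the two-symbol source, its probabilities, and the function name are hypothetical.

```python
import heapq


def tunstall_dictionary(pdf, code_bits):
    """Build the strings that fixed-length codes of `code_bits` bits represent."""
    max_entries = 1 << code_bits
    # Max-heap on probability, stored as negated values for heapq's min-heap.
    heap = [(-p, (sym,)) for sym, p in pdf.items()]
    heapq.heapify(heap)
    # Expanding one string replaces it with len(pdf) longer strings.
    while len(heap) + len(pdf) - 1 <= max_entries:
        neg_p, string = heapq.heappop(heap)   # most probable string so far
        for sym, p in pdf.items():
            heapq.heappush(heap, (neg_p * p, string + (sym,)))
    return sorted(string for _, string in heap)


# Hypothetical binary source encoded with 2-bit code words.
dictionary = tunstall_dictionary({"a": 0.7, "b": 0.3}, 2)
```

Here the four 2-bit code words represent the strings "aaa", "aab", "ab" and "b", so the most probable runs of input values consume the fewest bits per value. As with Huffman encoding, the dictionary is only as good as the PDF used to build it.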
Yet another lossless encoding technique known as Bit-Sliced Arithmetic Coding (BSAC) is discussed in the MPEG4 standards document ISO/IEC WD 14496-3:1997 v4.0 (E) w1745tf “T/F Core Description,” Section 2.12, pp. 60-63. This technique, which is similar to the compression technique used in the JPEG image coding standard, first aligns quantized transform coefficients that are represented in binary, concatenates bits in each of the coefficients that have the same significance to form vectors of bits, and then arithmetically encodes the resulting vectors. For example, one vector is formed from a concatenation of the least significant bit (LSB) of each coefficient, another vector is formed from the next LSB of each coefficient, and so on. Unfortunately, this technique does not perform very well in perceptual coding systems because it assumes that each coefficient has been quantized with the same number of bits. When the number of significant bits for various coefficients changes across or within a band, for example, the more significant bits in some of the coefficients, which are merely sign extension bits, are needlessly coded.
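The bit-slicing step described above can be sketched as follows. The sketch covers only the formation of bit vectors and omits the arithmetic coding stage; it assumes two's-complement coefficients of a common word length, and the function name and coefficient values are hypothetical.

```python
def bit_planes(coefficients, word_bits):
    """Return one bit vector per significance level, LSB vector first."""
    mask = (1 << word_bits) - 1
    words = [c & mask for c in coefficients]   # two's-complement bit patterns
    return [[(w >> plane) & 1 for w in words] for plane in range(word_bits)]


# Hypothetical quantized coefficients, all stored as 4-bit words.
planes = bit_planes([5, -3, 1, 0], 4)
```

In this example the coefficient 1 needs only two bits including its sign, yet it contributes a bit to all four vectors. The upper bits it contributes are sign extension bits, which illustrates the inefficiency that arises when coefficients have differing numbers of significant bits.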
It is an object of the present invention to provide encoding and decoding techniques that can be used in coding systems, such as perceptual audio and image coding systems for example, to provide a high level of lossless compression without requiring an accurate pre-defined probability distribution of values to be encoded and without imposing high computational complexity.
In accordance with one aspect of the present invention, a signal is encoded by placing signal components into one of a plurality of classifications according to signal component value, each classification having a rank representing a range of values associated with the classification; and for a respective classification, assembling signal components into one or more groups, each group having a number of elements to encode that varies inversely with the rank of the respective classification; and applying an encoding process to each of the groups, wherein the encoding process that is applied to a respective group has a dimension that is proportional to the number of elements in the respective group.
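The classification and grouping just described can be illustrated with a heavily simplified sketch. The passage above does not specify how ranks are assigned, how group sizes are chosen, or which encoding process is applied, so every such detail below is an assumption made only for illustration: rank is taken as the bit length of the component magnitude, group size is taken as four minus the rank with a minimum of one, and each group is packed into a single integer code whose dimension equals the number of elements in the group. The sketch also does not record component positions, which a complete coder would need.

```python
def rank_of(value):
    """Assumed rank: number of bits needed for the magnitude of `value`."""
    return abs(value).bit_length()


def group_size(rank):
    """Assumed rule: group size shrinks as rank grows, never below one."""
    return max(1, 4 - rank)


def classify_and_group(components):
    """Place components into classifications by rank, then split into groups."""
    by_rank = {}
    for v in components:
        by_rank.setdefault(rank_of(v), []).append(v)
    groups = []
    for rank in sorted(by_rank):
        n = group_size(rank)
        members = by_rank[rank]
        # The last group of a classification may be shorter in this sketch.
        groups.extend((rank, members[i:i + n]) for i in range(0, len(members), n))
    return groups


def encode_group(rank, group):
    """Pack a group into one integer; one radix digit per element."""
    radix = (1 << (rank + 1)) - 1      # values -(2**rank - 1) .. 2**rank - 1
    offset = (1 << rank) - 1
    code = 0
    for v in group:
        code = code * radix + (v + offset)
    return code


def decode_group(rank, code, n):
    """Recover the n elements packed by encode_group."""
    radix = (1 << (rank + 1)) - 1
    offset = (1 << rank) - 1
    out = []
    for _ in range(n):
        code, digit = divmod(code, radix)
        out.append(digit - offset)
    return out[::-1]


# Hypothetical signal components.
groups = classify_and_group([0, 1, -1, 0, 3, -2, 0, 7, 1, 0])
codes = [(rank, len(g), encode_group(rank, g)) for rank, g in groups]
```

Under these assumptions, small-valued components are encoded jointly in larger groups while large-valued components are encoded in smaller groups, so the size of each code alphabet stays bounded.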
In accordance with another aspect of the present invention, a signal is encoded by placing some signal components into a first classification according to signal component value; assembling the signal components placed into the first classification into one or more first groups each having a number of elements that is equal to a first number; applying an encoding process to each of the first groups, where the encoding process has a dimension that is proportional to the first number; placing at least some of the signal components not placed into the first classification into a second classification according to signal component value; assembling the signal components placed into the second classification into one or more second groups each having a number of elements that is equal to a second number, where the second number is not equal to the first number; and applying an encoding process to each of the second groups, wherein the encoding process has a dimension that is proportional to the second number.
In accordance with a further aspect of the present invention, an encoded signal is decoded by receiving codes that represent one or more signal components placed into one of a plurality of classifications according to signal component value, each classification having a rank representing a range of values associated with the classification; and for each respective code, identifying the respective classification of the signal components represented by the respective code; and applying a decoding process to the respective code to obtain a group of elements, the group having a number of elements that varies inversely with the rank of the respective classification, wherein the decoding process that is applied to the respective code has a dimension that is proportional to the number of elements in the group of elements; and obtaining the one or more signal components from the group of elements.
In accordance with yet another aspect of the present invention, an encoded signal is decoded by applying a decoding process to a first code to obtain a first group having a first number of elements, wherein the decoding process has a dimension that is proportional to the first number; obtaining from the first group one or more signal components placed into a first classification and having values within a range of values associated with the first classification; applying a decoding process to a second code to obtain a respective second group having a second number of elements, wherein the decoding process has a dimension that is proportional to the second number; and obtaining from the second group one or more signal components placed into a second classification having values within a range of values associated with the second classification.