A computer processes audio or video information as a series of numbers representing samples of the audio or video information. For high quality audio or video, the computer represents a sample of information using a number with many possible values. The more values possible for the sample, the higher the quality because the number can capture more variations in sound or color. Table 1 shows ranges of possible values for several types of audio or video information of different quality levels, along with corresponding bitrate costs.
TABLE 1. Ranges of values and cost per value for different quality audio and video information

| Information type and quality | Number of possible values | Cost |
|---|---|---|
| audio sequence, voice quality | 0–255 per sample | 8 bits (1 byte) |
| audio sequence, CD quality | 0–65,535 per sample | 16 bits (2 bytes) |
| video image, black and white | 0–1 per pixel | 1 bit |
| video image, gray scale | 0–255 per pixel | 8 bits (1 byte) |
| video image, “true” color | 0–16,777,215 per pixel | 24 bits (3 bytes) |
As Table 1 shows, the cost of high quality audio and video information is high bitrate. High quality audio and video information consumes large amounts of computer storage and transmission capacity.
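The relationship between the number of possible values and the cost per value in Table 1 can be checked with a short sketch. The dictionary keys and the helper name `bits_per_value` are illustrative choices, not part of the original description.

```python
# Number of possible values per sample or pixel, taken from Table 1.
RANGES = {
    "audio, voice quality": 256,
    "audio, CD quality": 65_536,
    "video, black and white": 2,
    "video, gray scale": 256,
    "video, true color": 16_777_216,
}

def bits_per_value(num_values):
    """Smallest whole number of bits that can distinguish num_values values."""
    # (n - 1).bit_length() avoids floating-point logarithms.
    return (num_values - 1).bit_length()

for name, count in RANGES.items():
    print(f"{name}: {bits_per_value(count)} bits")
```

Each doubling of the number of possible values adds one bit to the cost of every sample or pixel.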
Compression (also called encoding or coding) decreases the cost of storing and transmitting audio and video information by converting the information into a lower bitrate form. Decompression (also called decoding) extracts a reconstructed version of the original information from the compressed form.
Quantization is a conventional compression technique. Quantization maps ranges of input values to single values. For example, with a quantization factor of 3, a sample with a value anywhere between −1.5 and 1.499999 is mapped to 0, a sample with a value anywhere between 1.5 and 4.499999 is mapped to 1, etc.
To reconstruct the sample, the quantized value is multiplied by the quantization factor. After a value has been quantized, however, the original value cannot be precisely reconstructed. In essence, quantization decreases the quality of the signal in order to decrease the bitrate of the signal. Continuing the example started above, the quantized value 1 reconstructs to 1×3=3; it is impossible to determine where the original value was in the range 1.5 to 4.499999.
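A minimal sketch of the example above, assuming a quantization factor of 3 (implied by 1×3=3) and rounding that sends a value exactly on a range boundary to the higher range (implied by 1.5 mapping to 1). The function names are illustrative.

```python
import math

def quantize(value, factor):
    """Map a value to the index of the range that contains it.

    floor(x + 0.5) rounds halves upward, so 1.5 / 3 = 0.5 maps to 1.
    Python's built-in round() would send 0.5 to 0 instead.
    """
    return math.floor(value / factor + 0.5)

def reconstruct(index, factor):
    """Reconstruct a sample by multiplying by the quantization factor."""
    return index * factor

FACTOR = 3  # assumed from the example in the text

for value in (-1.5, 0.2, 1.499999, 1.5, 3.0, 4.499999):
    index = quantize(value, FACTOR)
    print(value, "->", index, "->", reconstruct(index, FACTOR))
```

Every input from 1.5 up to 4.499999 reconstructs to 3, which is why the original value cannot be recovered.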
Several factors affect quantization. For a continuous, analog signal, a dynamic range sets the boundaries of the quantization. Suppose the range of an analog signal is infinite but most samples are close to zero. The dynamic range of the quantization focuses the quantization on the range most likely to yield real information, for example, around zero. For a signal already in numerical form, the dynamic range is bounded by the lowest and highest possible values.
Within the dynamic range, the number of quantization levels affects how closely the quantized signal tracks the input signal. For example, if a dynamic range has 64 quantization levels, each sample is assigned to one of 64 values. Increasing the number of quantization levels in the same dynamic range increases precision and decreases distortion, but also increases bitrate. Quantization step size Q is a related factor that measures the distance between reconstructed values.
There are many different kinds of quantization. In uniform, scalar quantization, each single sample in a signal is quantized by the same step size Q to produce a quantized value. For example, a uniform scalar quantizer maps a set of real numbers {u} into an integer set {−M/2, . . . , −1,0,1, . . . M/2}, where M is the dynamic range of the quantizer and Q is the real number quantization step size. The quantizer produces quantized output according to the following equation:
$$q(u) = \operatorname{round}\left(\frac{\min\left(\max\left(u,\,-QM/2\right),\,QM/2\right)}{Q}\right), \qquad (1)$$
where round is a function for rounding to the closest integer, and the min and max functions set a number outside of the dynamic range to a range boundary value. Other quantization formulas follow different conventions.
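Equation (1) can be sketched as follows. The text only says that round selects the closest integer, so the tie-breaking rule used here (halves round upward) is an assumption, as are the function name and the sample parameter values.

```python
import math

def quantize_uniform(u, q, m):
    """Uniform scalar quantizer following equation (1).

    u: input value, q: step size Q, m: dynamic range M.
    """
    bound = q * m / 2
    # min/max set values outside the dynamic range to the range boundary.
    clamped = min(max(u, -bound), bound)
    # Round to the closest integer (ties upward, an assumed convention).
    return math.floor(clamped / q + 0.5)

Q, M = 0.5, 8  # hypothetical step size and dynamic range
for u in (-10.0, -0.3, 0.6, 1.9, 10.0):
    print(u, "->", quantize_uniform(u, Q, M))
```

With these values every output falls in the integer set {−4, . . . , 4}, that is, {−M/2, . . . , M/2}.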
The difference between an input value for a sample and its reconstructed value is quantization error. If the input value falls within the dynamic range of the quantizer, quantization error for a sample is no more than Q/2. The larger the quantization step size Q, the greater the potential quantization error. The distortion D is a measure of quantization error for the entire signal, and can be calculated as the square of the differences between the original values and the reconstructed values:

$$D = \left(u - q(u)\,Q\right)^{2}. \qquad (2)$$
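The Q/2 bound and the distortion measure can be illustrated with a small sketch. The sample values are hypothetical, and summing the per-sample squared differences to get a figure for the whole signal is an assumption about how the measure is aggregated.

```python
import math

def quantize(u, q):
    """Round u / q to the closest integer (ties upward, an assumed convention)."""
    return math.floor(u / q + 0.5)

def distortion(u, q):
    """Squared difference between a value and its reconstruction q(u) * Q."""
    return (u - quantize(u, q) * q) ** 2

Q = 0.5
samples = [-1.3, -0.26, 0.0, 0.12, 0.74, 1.49]  # hypothetical input values

errors = [u - quantize(u, Q) * Q for u in samples]
total_distortion = sum(distortion(u, Q) for u in samples)

print("largest error:", max(abs(e) for e in errors), "bound:", Q / 2)
print("total distortion:", total_distortion)
```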
Aside from uniform, scalar quantization, other quantization techniques include non-uniform quantization and vector quantization. Quantization can be non-adaptive or adaptive. For more information about quantization and the factors affecting the results of quantization, see Gibson et al., Digital Compression for Multimedia, “Chapter 4: Quantization,” Morgan Kaufmann Publishers, Inc., pp. 113–138 (1998).
Quantization helps a compressor reduce the bitrate of audio or video information at some cost to quality. The compressor can use various techniques to provide the best possible quality for a given bitrate, as measured by lowest objective or subjective distortion. These techniques include rate control, transform coding, and masking.
With rate control, a compressor adjusts quantization based upon a rate-distortion function that relates distortion (and hence quantization) to bitrate. The compressor dynamically adjusts quantization to utilize available bitrate.
Transform coding techniques convert data into a form that makes it easier to separate perceptually important information from perceptually unimportant information. The less important information can then be quantized heavily, while the more important information is largely preserved, so as to provide the best quality for a given bitrate. Transform coding techniques typically convert data to the frequency (or spectral) domain. For example, a transform coder converts a time series of audio samples into frequency coefficients, or, for video, a transform coder converts pixel data into frequency coefficients. In the frequency domain, low frequency data has greater perceptual importance than high frequency data. Transform coding techniques include discrete cosine transform (“DCT”), modulated lapped transform (“MLT”), Fourier transform, subband coding, and wavelets. In practice, input to transform coding techniques is partitioned into blocks, and each block is transform coded. Blocks may or may not overlap. For more information about transform coding, see Gibson et al., Digital Compression for Multimedia, “Chapter 7: Frequency Domain Coding,” Morgan Kaufmann Publishers, Inc., pp. 227–262 (1998).
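As one illustration of converting a block to frequency coefficients, the sketch below applies an orthonormal DCT-II to a single 8-sample block. The choice of DCT-II, the block length, and the sample values are all assumptions made for the example; the text names the DCT only as one of several possible transforms.

```python
import math

def dct_ii(block):
    """Orthonormal DCT-II of a list of samples (direct O(n^2) form)."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(block)
        )
        coeffs.append(scale * total)
    return coeffs

block = [10, 11, 12, 13, 14, 15, 16, 17]  # hypothetical smooth block
coeffs = dct_ii(block)
energy = [c * c for c in coeffs]

# Share of the block's energy carried by the two lowest-frequency coefficients.
low_share = sum(energy[:2]) / sum(energy)
print([round(c, 3) for c in coeffs])
print("low-frequency energy share:", low_share)
```

For a smooth block like this one, nearly all of the energy lands in the lowest-frequency coefficients, so the remaining coefficients can be quantized heavily at little cost.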
Masking involves processing spectral data to emphasize perceptually important spectral data, and is typically done prior to quantization. This makes the perceptually important spectral data more robust to the subsequent quantization. Masking itself typically involves selective quantization, applying different levels of quantization to different ranges of spectral data, or can be performed as part of non-uniform or vector quantization.
Compression decreases the bitrate of audio and video information, which reduces storage and transmission costs. Different end users have different storage and transmission capacities, however, as well as different quality requirements. Thus, for example, a Web site operator would like to be able to stream an audio clip previously compressed to 128 kilobits/second (“Kb/s”) to certain end users at 64 Kb/s. A particular end user might then recompress the 64 Kb/s audio clip to 32 Kb/s to save local storage space. In addition, different end users can require different compression formats.
Transcoding converts compressed data of one bitrate or format to compressed data of another bitrate (typically lower) or format. Different transcoders use different techniques.
Some transcoders fully decompress the compressed data and then fully recompress the data to the other bitrate or format. Other transcoders partially decompress the compressed data (converting only the decompressed portions) or convert the compressed data itself without decompression.
Heterogeneous transcoders use different formats for decompression and compression, for example, transcoding compressed MPEG 2 data to compressed H.261 data. Between decompression and compression, the data can be resampled or scaled into an acceptable input format for the compression. The resampling or scaling can require extensive processing, and can unnecessarily reduce quality. Moreover, this type of technique works when any of several available codecs can be used in a system, but is impractical or inconvenient for some real world applications. Homogeneous transcoders use the same format for decompression and compression.
For more information about different types of transcoding and transcoders, see Assuncao et al., “A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 8, December 1998, pp. 953–967; Assuncao et al., “Buffer Analysis and Control in CBR Video Transcoding”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, February 2000, pp. 83–92; Werner, “Generic Quantiser for Transcoding of Hybrid Video,” Proceedings of the 1997 Picture Coding Symposium, Berlin, Germany, September 1997; Tudor et al., “Real-Time Transcoding of MPEG-2 Video Bit Streams,” Proceedings of the International Broadcast Convention, Amsterdam, September 1997; and Amir et al., “An Application Level Video Gateway,” ACM Multimedia '95, November 1995.
FIG. 1 shows a generalized prior art transcoder (100) for transcoding audio data. The transcoder (100) is homogeneous—its decompressor (110) and compressor (130) work with the same compression format.
In the decompressor (110), an entropy decoder (112) decodes quantized transform coefficients for the audio data. An inverse quantizer (114) reconstructs the transform coefficients. A buffer (120) stores the reconstructed transform coefficients output by the decompressor (110), which are the input to the compressor (130). In the compressor (130), a quantizer (132) quantizes the reconstructed transform coefficients. To decrease bitrate, the quantizer (132) increases quantization. An entropy encoder (134) then entropy encodes the requantized transform coefficients.
The transcoder (100) can include an inverse transform coder in the decompressor (110) and a transform coder in the compressor (130), in which case the buffer (120) stores a reconstructed time series of audio data. This allows the transcoder (100) to use off-the-shelf decompressor and compressor products.
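The coefficient-domain path through the transcoder (100) can be sketched as inverse quantization followed by requantization. This is a simplified illustration that assumes uniform scalar quantization with step sizes `q1` and `q2`; entropy decoding and encoding are omitted, and the function names and input levels are hypothetical.

```python
import math

def quantize(value, step):
    """Quantize with the given step size (ties round upward, an assumption)."""
    return math.floor(value / step + 0.5)

def inverse_quantize(level, step):
    """Reconstruct a transform coefficient from its quantized level."""
    return level * step

def requantize(levels, q1, q2):
    """Inverse quantize levels coded with step q1, then quantize with step q2."""
    # Stands in for the buffer of reconstructed transform coefficients.
    reconstructed = [inverse_quantize(level, q1) for level in levels]
    return [quantize(value, q2) for value in reconstructed]

levels_q1 = [0, 1, -1, 4, -7, 12]  # hypothetical entropy-decoded levels
levels_q2 = requantize(levels_q1, q1=1.0, q2=3.0)
print(levels_q2)
```

The larger step size yields smaller level magnitudes, which is what lets the entropy encoder spend fewer bits.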
Because the transcoder (100) increases quantization, the transcoder (100) introduces additional distortion into the requantized data. In practice, the requantized data often has much more distortion than the original data directly quantized at the increased level of quantization. This is because, unlike compression of original data, transcoding involves requantization of data that has been quantized in a previous compression. The Assuncao and Werner papers listed above describe this effect in video data.
For transcoded data, the maximum quantization error for a single value is (Q1+Q2)/2: the quantization error after the first quantization is at most Q1/2, and the quantization error due to the second quantization is at most Q2/2. For directly coded data, the maximum is Q2/2. The maximum (Q1+Q2)/2 is much greater than the maximum Q2/2 because Q2 is greater than Q1 (so as to decrease bitrate) and Q1 is significant to start with. For certain values of Q2, however, the quantization error for transcoded data equals the quantization error for directly coded data.
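The two bounds can be checked numerically. The sketch below scans a grid of input values for Q1=1.0 and Q2=2.0 (illustrative choices) and records the worst-case error for direct coding and for transcoding, assuming uniform scalar quantizers whose ties round upward.

```python
import math

def quantize(value, step):
    """Quantize with the given step size (ties round upward, an assumption)."""
    return math.floor(value / step + 0.5)

def direct_error(u, q2):
    """Absolute error when u is quantized once with step q2."""
    return abs(u - quantize(u, q2) * q2)

def transcoded_error(u, q1, q2):
    """Absolute error when u is quantized with q1 and then requantized with q2."""
    first = quantize(u, q1) * q1
    return abs(u - quantize(first, q2) * q2)

Q1, Q2 = 1.0, 2.0
grid = [i / 1000 for i in range(-5000, 5001)]  # inputs from -5.0 to 5.0

worst_direct = max(direct_error(u, Q2) for u in grid)
worst_transcoded = max(transcoded_error(u, Q1, Q2) for u in grid)

print("direct:    ", worst_direct, "bound", Q2 / 2)
print("transcoded:", worst_transcoded, "bound", (Q1 + Q2) / 2)
```

An input of 0.5 reaches the transcoding bound here: it first rounds up to 1.0, and 1.0 then rounds up to 2.0, for an error of 1.5.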
FIG. 2 is a graph (200) showing quantization error of transcoded data for an audio clip (transcoded using the prior art transcoder (100) of FIG. 1) versus quantization error of directly coded data. The graph (200) measures quantization error (220) (summed for samples of the audio clip) as quantization step size Q2 (210) increases. The input source has a Gaussian distribution, and is truncated to avoid overloading the quantizer.
The graph (200) plots transcoded data quantization error (230) for data previously quantized by Q1=1.0 and then requantized by Q2. The graph (200) also plots directly coded data quantization error (240) for data quantized by Q2 without previous quantization by Q1. The area between the transcoded data quantization error (230) and the direct-coded data quantization error (240) is excess requantization error (250).
The transcoded data quantization error (230) and the direct-coded data quantization error (240) are the same for certain integer multiples of Q1 (e.g., Q2=3.0), while for other integer multiples of Q1 (e.g., Q2=2.0) the transcoded data quantization error (230) is much greater than the direct-coded data quantization error (240).
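A rough simulation in the spirit of the graph (200) is sketched below. It is not a reproduction of the figure: the standard deviation, truncation limit, sample count, and random seed are assumptions, and error is measured as summed squared difference between each input and its reconstruction.

```python
import math
import random

def quantize(value, step):
    """Quantize with the given step size (ties round upward, an assumption)."""
    return math.floor(value / step + 0.5)

def direct_error(samples, q2):
    """Summed squared error for data quantized once with step q2."""
    return sum((u - quantize(u, q2) * q2) ** 2 for u in samples)

def transcoded_error(samples, q1, q2):
    """Summed squared error for data quantized with q1, then requantized with q2."""
    total = 0.0
    for u in samples:
        first = quantize(u, q1) * q1
        total += (u - quantize(first, q2) * q2) ** 2
    return total

rng = random.Random(0)  # fixed seed so runs are repeatable
# Gaussian source, truncated to +/-8.0 (hypothetical limit).
samples = [u for u in (rng.gauss(0.0, 2.0) for _ in range(10_000)) if abs(u) <= 8.0]

Q1 = 1.0
for q2 in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0):
    d = direct_error(samples, q2)
    t = transcoded_error(samples, Q1, q2)
    print(f"Q2={q2:.1f}  direct={d:.1f}  transcoded={t:.1f}  excess={t - d:.1f}")
```

Under these assumptions the excess is zero at Q2=3.0 and positive at Q2=2.0, matching the behavior described for the two integer multiples of Q1.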
Previous compression with Q1 causes excess requantization error in transcoding. For example, consider the value 0.5631 transcoded and directly coded with different quantization step sizes as shown in Table 2.
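TABLE 2. Transcoding versus direct coding of a value

| Sample | Q1 | Reconstructed value (after Q1) | Q2 | Reconstructed value (after Q2) | Error |
|---|---|---|---|---|---|
| 0.5631 | 1.0 | 1.0 | 2.0 | 2.0 | −1.4369 |
| 0.5631 | n/a | n/a | 2.0 | 0 | 0.5631 |
| 0.5631 | 1.0 | 1.0 | 3.0 | 0 | 0.5631 |
| 0.5631 | n/a | n/a | 3.0 | 0 | 0.5631 |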
The quantization error when 0.5631 is directly coded with Q2=3.0 is the same as the error when 0.5631 is transcoded with Q1=1.0 and Q2=3.0. This is because the quantization levels for Q1=1.0, { . . . , −1.5,−0.5,0.5,1.5, . . . }, overlap the levels for Q2=3.0, { . . . ,−4.5,−1.5,1.5,4.5, . . . }.
In contrast, the quantization error when 0.5631 is directly coded with Q2=2.0 is much smaller than the error when 0.5631 is transcoded with Q1=1.0 and Q2=2.0. This is because the quantization levels for Q1=1.0 do not overlap the levels for Q2=2.0, { . . . ,−3.0,−1.0,1.0,3.0, . . . }. As a result, rounding of some values by Q1 changes the way Q2 subsequently rounds those values, increasing quantization error for those values.
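The four cases for the value 0.5631 can be traced step by step. The sketch assumes uniform scalar quantizers whose ties round upward, which is what sends the intermediate value 1.0 to 2.0 when Q2=2.0; the function names are illustrative.

```python
import math

def quantize(value, step):
    """Quantize with the given step size (ties round upward, an assumption)."""
    return math.floor(value / step + 0.5)

def code_direct(u, q2):
    """Reconstructed value after a single quantization with step q2."""
    return quantize(u, q2) * q2

def code_transcoded(u, q1, q2):
    """Reconstructed value after quantization with q1 and requantization with q2."""
    first = quantize(u, q1) * q1  # 0.5631 becomes 1.0 when q1 = 1.0
    return quantize(first, q2) * q2

U = 0.5631
Q1 = 1.0
for q2 in (2.0, 3.0):
    direct = code_direct(U, q2)
    transcoded = code_transcoded(U, Q1, q2)
    print(f"Q2={q2}: direct -> {direct} (error {U - direct:.4f}), "
          f"transcoded -> {transcoded} (error {U - transcoded:.4f})")
```

With Q2=2.0 the first rounding moves 0.5631 onto the boundary value 1.0, which the second quantizer then rounds to 2.0; with Q2=3.0 the intermediate value 1.0 still falls in the range that maps to 0.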
Excess requantization error is not a major concern if the first quantization step size is very small and thus introduces little distortion. If Q1 introduces significant distortion, however, excess requantization error can become a problem.
The problem of excess requantization error worsens as Q1 increases, and transcoding becomes impractical. If the transcoder uses certain quantization step sizes, distortion dramatically increases. The transcoder cannot decrease bitrate gradually and gracefully.
The excess requantization error problem is exacerbated when the first stage quantization output is concentrated in a narrow range around 0. For such data, any increase in quantization step size causes an immediate and drastic increase in distortion. Maintaining the quantization step size, however, means maintaining the same bitrate. Audio transcoders can face an extreme example of this dilemma, in which the values of first stage quantization output for a frame are only −1, 0, or 1. Any increase to quantization step size silences the frame, making it impossible to decrease bitrate gradually and gracefully, but keeping the previous quantization step size results in the same bitrate.