Engineers use a variety of techniques to process digital audio efficiently while still maintaining the quality of the digital audio. To understand these techniques, it helps to understand how audio information is represented and processed in a computer.
I. Representing Audio Information in a Computer
A computer processes audio information as a series of numbers representing the audio information. For example, a single number can represent an audio sample, which is an amplitude value at a particular time. Several factors affect the quality of the audio information, including sample depth, sampling rate, and channel mode.
Sample depth (or precision) indicates the range of numbers used to represent a sample. The more values possible for the sample, the higher the quality because the number can capture more subtle variations in amplitude. For example, an 8-bit sample has 256 possible values, while a 16-bit sample has 65,536 possible values.
The sampling rate (usually measured as the number of samples per second) also affects quality. The higher the sampling rate, the higher the quality because more frequencies of sound can be represented. Some common sampling rates are 8,000, 11,025, 22,050, 32,000, 44,100, 48,000, and 96,000 samples/second.
Mono and stereo are two common channel modes for audio. In mono mode, audio information is present in one channel. In stereo mode, audio information is present in two channels usually labeled the left and right channels. Other modes with more channels such as 5.1 channel, 7.1 channel, or 9.1 channel surround sound (the “1” indicates a sub-woofer or low-frequency effects channel) are also possible. Table 1 shows several formats of audio with different quality levels, along with corresponding raw bit rate costs.
TABLE 1Bit rates for different quality audio information.SampleSamplingRaw BitDepthRateRate(bits/(samples/Channel(bits/sample)second)Modesecond)Internet telephony88,000mono64,000Telephone811,025mono88,200CD audio1644,100stereo1,411,200
Surround sound audio typically has even higher raw bit rate. As Table 1 shows, a cost of high quality audio information is high bit rate. High quality audio information consumes large amounts of computer storage and transmission capacity. Companies and consumers increasingly depend on computers, however, to create, distribute, and play back high quality audio content.
II. Processing Audio Information in a Computer
Many computers and computer networks lack the resources to process raw digital audio. Compression (also called encoding or coding) decreases the cost of storing and transmitting audio information by converting the information into a lower bit rate form. Compression can be lossless (in which quality does not suffer) or lossy (in which quality suffers but bit rate reduction from subsequent lossless compression is more dramatic). For example, lossy compression is used to approximate original audio information, and the approximation is then losslessly compressed. Decompression (also called decoding) extracts a reconstructed version of the original information from the compressed form.
One goal of audio compression is to digitally represent audio signals to provide maximum perceived signal quality with the least possible amounts of bits. With this goal as a target, various contemporary audio encoding systems make use of human perceptual models. Encoder and decoder systems include certain versions of Microsoft Corporation's Windows Media Audio (“WMA”) encoder and decoder and WMA Pro encoder and decoder. Other systems are specified by certain versions of the Motion Picture Experts Group, Audio Layer 3 (“MP3”) standard, the Motion Picture Experts Group 2, Advanced Audio Coding (“AAC”) standard, and Dolby AC3. Such systems typically use a combination lossy and lossless compression and decompression.
A. Lossy Compression and Corresponding Decompression.
Conventionally, an audio encoder uses a variety of different lossy compression techniques. These lossy compression techniques typically involve perceptual modeling/weighting and quantization after a frequency transform. The corresponding decompression involves inverse quantization, inverse weighting, and inverse frequency transforms.
Frequency transform techniques convert data into a form that makes it easier to separate perceptually important information from perceptually unimportant information. The less important information can then be subjected to more lossy compression, while the more important information is preserved, so as to provide the best perceived quality for a given bit rate. A frequency transform typically receives audio samples and converts them into data in the frequency domain, sometimes called frequency coefficients or spectral coefficients.
Perceptual modeling involves processing audio data according to a model of the human auditory system to improve the perceived quality of the reconstructed audio signal for a given bit rate. Using the results of the perceptual modeling, an encoder shapes noise (e.g., quantization noise) in the audio data with the goal of minimizing the audibility of the noise for a given bit rate.
Quantization maps ranges of input values to single values, introducing irreversible loss of information but also allowing an encoder to regulate the quality and bit rate of the output. Sometimes, the encoder performs quantization in conjunction with a rate controller that adjusts the quantization to regulate bit rate and/or quality. There are various kinds of quantization, including adaptive and non-adaptive, scalar and vector, uniform and non-uniform. Perceptual weighting can be considered a form of non-uniform quantization.
Inverse quantization and inverse weighting reconstruct the weighted, quantized frequency coefficient data to an approximation of the original frequency coefficient data. An inverse frequency transform then converts the reconstructed frequency coefficient data into reconstructed time domain audio samples.
B. Lossless Compression and Decompression.
Conventionally, an audio encoder uses one or more of a variety of different lossless compression techniques, which are also called entropy coding techniques. In general, lossless compression techniques include run-length encoding, variable length encoding, and arithmetic coding. The corresponding decompression techniques (also called entropy decoding techniques) include run-length decoding, variable length decoding, and arithmetic decoding.
Run-length encoding is a simple, well-known compression technique. In general, run-length encoding replaces a sequence (i.e., run) of consecutive symbols having the same value with the value and the length of the sequence. In run-length decoding, the sequence of consecutive symbols is reconstructed from the run value and run length. Numerous variations of run-length encoding/decoding have been developed.
Run-level encoding is similar to run-length encoding in that runs of consecutive symbols having the same value are replaced with run lengths. The value for the runs is the predominant value (e.g., 0) in the data, and runs are separated by one or more levels having a different value (e.g., a non-zero value).
The results of run-length encoding (e.g., the run values and run lengths) or run-level encoding can be variable length coded to further reduce bit rate. If so, the variable length coded data is variable length decoded before run-length decoding.
Variable length coding is another well-known compression technique. In general, a variable length code [“VLC”] table associates VLCs with unique symbol values (or unique combinations of values). Huffman codes are a common type of VLC. Shorter codes are assigned to more probable symbol values, and longer codes are assigned to less probable symbol values. The probabilities are computed for typical examples of some kind of content. Or, the probabilities are computed for data just encoded or data to be encoded, in which case the VLCs adapt to changing probabilities for the unique symbol values. Compared to static variable length coding, adaptive variable length coding usually reduces the bit rate of compressed data by incorporating more accurate probabilities for the data, but extra information specifying the VLCs may also need to be transmitted.
To encode symbols, a variable length encoder replaces symbol values with the VLCs associated with the symbol values in the VLC table. To decode, a variable length decoder replaces the VLCs with the symbol values associated with the VLCs.
In scalar variable length coding, a VLC table associates a single VLC with one value, for example, a direct level of a quantized data value. In vector variable length coding, a VLC table associates a single VLC with a combination of values, for example, a group of direct levels of quantized data values in a particular order. Vector variable length encoding can lead to better bit rate reduction than scalar variable length encoding (e.g., by allowing the encoder to exploit probabilities fractionally in binary VLCs). On the other hand, the VLC table for vector variable length encoding can be extremely large when single codes represent large groups of symbols or symbols have large ranges of potential values (due to the large number of potential combinations), which consumes memory and processing resources in computing the VLC table and finding VLCs. Numerous variations of variable length encoding/decoding have been developed.
Arithmetic coding is another well-known compression technique. Arithmetic coding is sometimes used in applications where the optimal number of bits to encode a given input symbol is a fractional number of bits, and in cases where a statistical correlation among certain individual input symbols exists. Arithmetic coding generally involves representing an input sequence as a single number within a given range. Typically, the number is a fractional number between 0 and 1. Symbols in the input sequence are associated with ranges occupying portions of the space between 0 and 1. The ranges are calculated based on the probability of the particular symbol occurring in the input sequence. The fractional number used to represent the input sequence is constructed with reference to the ranges. Therefore, probability distributions for input symbols are important in arithmetic coding schemes.
In context-based arithmetic coding, different probability distributions for the input symbols are associated with different contexts. The probability distribution used to encode the input sequence changes when the context changes. The context can be calculated by measuring different factors that are expected to affect the probability of a particular input symbol appearing in an input sequence.
Given the importance of compression and decompression to media processing, it is not surprising that compression and decompression are richly developed fields. Whatever the advantages of prior techniques and systems for lossless compression and decompression, however, they do not have various advantages of the techniques and systems described herein.