Engineers use a variety of techniques to process digital audio efficiently while still maintaining the quality of the digital audio. To understand these techniques, it helps to understand how audio information is represented and processed in a computer.
I. Representing Audio Information in a Computer.
A computer processes audio information as a series of numbers representing the audio information. For example, a single number can represent an audio sample, which is an amplitude value at a particular time. Several factors affect the quality of the audio information, including sample depth, sampling rate, and channel mode.
Sample depth (or precision) indicates the range of numbers used to represent a sample. The more values possible for the sample, the higher the quality because the number can capture more subtle variations in amplitude. For example, an 8-bit sample has 256 possible values, while a 16-bit sample has 65,536 possible values.
The sampling rate (usually measured as the number of samples per second) also affects quality. The higher the sampling rate, the higher the quality because more frequencies of sound can be represented. Some common sampling rates are 8,000, 11,025, 22,050, 32,000, 44,100, 48,000, and 96,000 samples/second.
Mono and stereo are two common channel modes for audio. In mono mode, audio information is present in one channel. In stereo mode, audio information is present in two channels usually labeled the left and right channels. Other modes with more channels such as 5.1 channel, 7.1 channel, or 9.1 channel surround sound (the “1” indicates a sub-woofer or low-frequency effects channel) are also possible. Table 1 shows several formats of audio with different quality levels, along with corresponding raw bit rate costs.
TABLE 1Bit rates for different quality audio information.SampleDepth(bits/Sampling RateChannelRaw Bit Ratesample)(samples/second)Mode(bits/second)Internet telephony88,000mono64,000Telephone811,025mono88,200CD audio1644,100stereo1,411,200
Surround sound audio typically has even higher raw bit rate. As Table 1 shows, a cost of high quality audio information is high bit rate. High quality audio information consumes large amounts of computer storage and transmission capacity. Companies and consumers increasingly depend on computers, however, to create, distribute, and play back high quality audio content.
II. Processing Audio Information in a Computer.
Many computers and computer networks lack the resources to process raw digital audio. Compression (also called encoding or coding) decreases the cost of storing and transmitting audio information by converting the information into a lower bit rate form. Compression can be lossless (in which quality does not suffer) or lossy (in which quality suffers but bit rate reduction from subsequent lossless compression is more dramatic). For example, lossy compression is used to approximate original audio information, and the approximation is then losslessly compressed. Decompression (also called decoding) extracts a reconstructed version of the original information from the compressed form.
One goal of audio compression is to digitally represent audio signals to provide maximum perceived signal quality with the least possible amounts of bits. With this goal as a target, various contemporary audio encoding systems make use of human perceptual models. Encoder and decoder systems include certain versions of Microsoft Corporation's Windows Media Audio (“WMA”) encoder and decoder and WMA Pro encoder and decoder. Other systems are specified by certain versions of the Motion Picture Experts Group, Audio Layer 3 (“MP3”) standard, the Motion Picture Experts Group 2, Advanced Audio Coding (“AAC”) standard, and Dolby AC3.
Conventionally, an audio encoder uses a variety of different lossy compression techniques. These lossy compression techniques typically involve perceptual modeling/weighting and quantization after a frequency transform. The corresponding decompression involves inverse quantization, inverse weighting, and inverse frequency transforms.
Frequency transform techniques convert data into a form that makes it easier to separate perceptually important information from perceptually unimportant information. The less important information can then be subjected to more lossy compression, while the more important information is preserved, so as to provide the best perceived quality for a given bit rate. A frequency transform typically receives audio samples and converts them into data in the frequency domain, sometimes called frequency coefficients or spectral coefficients.
Perceptual modeling involves processing audio data according to a model of the human auditory system to improve the perceived quality of the reconstructed audio signal for a given bit rate. For example, an auditory model typically considers the range of human hearing and critical bands. Using the results of the perceptual modeling, an encoder shapes distortion (e.g., quantization noise) in the audio data with the goal of minimizing the audibility of the distortion for a given bit rate. While the encoder must at times introduce distortion to reduce bit rate, the weighting allows the encoder to put more distortion in bands where it is less audible, and vice versa.
Typically, the perceptual model is used to derive scale factors (also called weighting factors or mask values) for masks (also called quantization matrices). The encoder uses the scale factors to control the distribution of quantization noise. Since the scale factors themselves do not represent the audio waveform, scale factors are sometimes designated as overhead or side information. In many scenarios, a significant portion (10-15%) of the total number of bits used for encoding is used to represent the scale factors.
Quantization maps ranges of input values to single values, introducing irreversible loss of information but also allowing an encoder to regulate the quality and bit rate of the output. Sometimes, the encoder performs quantization in conjunction with a rate controller that adjusts the quantization to regulate bit rate and/or quality. There are various kinds of quantization, including adaptive and non-adaptive, scalar and vector, uniform and non-uniform. Perceptual weighting can be considered a form of non-uniform quantization.
Conventionally, an audio encoder uses one or more of a variety of different lossless compression techniques, which are also called entropy coding techniques. In general, lossless compression techniques include run-length encoding, run-level coding variable length encoding, and arithmetic coding. The corresponding decompression techniques (also called entropy decoding techniques) include run-length decoding, run-level decoding, variable length decoding, and arithmetic decoding.
Inverse quantization and inverse weighting reconstruct the weighted, quantized frequency coefficient data to an approximation of the original frequency coefficient data. An inverse frequency transform then converts the reconstructed frequency coefficient data into reconstructed time domain audio samples.
Given the importance of compression and decompression to media processing, it is not surprising that compression and decompression are richly developed fields. Whatever the advantages of prior techniques and systems for scale factor compression and decompression, however, they do not have various advantages of the techniques and systems described herein.