Engineers use a variety of techniques to process digital audio efficiently while still maintaining the quality of the digital audio. To understand these techniques, it helps to understand how audio information is represented and processed in a computer.
I. Representation of Audio Information in a Computer
A computer processes audio information as a series of numbers that represent it. For example, a single number can represent an audio sample, which is an amplitude value at a particular time. Several factors affect the quality of the audio information, including sample depth, sampling rate, and channel mode.
Sample depth (or precision) indicates the range of numbers used to represent a sample. The more values possible for the sample, the higher the quality, because the number can capture more subtle variations in amplitude. For example, an 8-bit sample has 256 possible values, while a 16-bit sample has 65,536 possible values. The sampling rate (usually measured as the number of samples per second) also affects quality. The higher the sampling rate, the higher the quality, because higher frequencies of sound can be represented: a signal sampled at a given rate can represent frequencies only up to half that rate (the Nyquist frequency). Some common sampling rates are 8,000, 11,025, 22,050, 32,000, 44,100, 48,000, and 96,000 samples/second.
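To make sample depth and sampling rate concrete, the following Python sketch samples a sine tone at a chosen rate and stores each sample as a signed integer of a chosen depth (linear PCM). The helper names (`pcm_encode`, `levels`, `nyquist_hz`) and the mapping of amplitudes in [-1.0, 1.0] onto the signed integer range are illustrative assumptions, not part of any particular format.

```python
import math


def pcm_encode(freq_hz, n_samples, sample_rate, bit_depth):
    """Sample a full-scale sine tone and store each sample as a signed integer.

    Hypothetical helper: amplitudes in [-1.0, 1.0] are scaled to the largest
    positive code for the bit depth and rounded to the nearest integer.
    """
    max_code = 2 ** (bit_depth - 1) - 1  # e.g. 127 for 8-bit, 32767 for 16-bit
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(amplitude * max_code))
    return samples


def levels(bit_depth):
    """Number of distinct values a sample of this depth can take."""
    return 2 ** bit_depth


def nyquist_hz(sample_rate):
    """Highest frequency that this sampling rate can represent."""
    return sample_rate / 2


print(levels(8), levels(16))                  # 256 65536
print(nyquist_hz(8_000), nyquist_hz(44_100))  # 4000.0 22050.0
print(pcm_encode(1_000, 8, 8_000, 8))         # one period of a 1 kHz tone at 8 kHz
```

Doubling the bit depth squares the number of available values, while the sampling rate alone sets the upper frequency limit.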
Mono and stereo are two common channel modes for audio. In mono mode, audio information is present in one channel. In stereo mode, audio information is present in two channels, usually labeled the left and right channels. Other modes with more channels, such as 5.1-channel, 7.1-channel, or 9.1-channel surround sound (the “1” indicates a subwoofer or low-frequency effects channel), are also possible. Table 1 shows several formats of audio with different quality levels, along with the corresponding raw bitrates.
TABLE 1
Bitrates for different quality audio information

Quality               Sample Depth    Sampling Rate       Mode     Raw Bitrate
                      (bits/sample)   (samples/second)             (bits/second)
Internet telephony    8               8,000               mono     64,000
Telephone             8               11,025              mono     88,200
CD audio              16              44,100              stereo   1,411,200

Surround sound audio typically has even higher raw bitrate.
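The raw bitrates in Table 1 follow from multiplying sample depth, sampling rate, and number of channels. The sketch below reproduces the table rows; the extra row uses a hypothetical 5.1-channel configuration (six channels at 16 bits and 48,000 samples/second), chosen only to illustrate why surround sound costs more.

```python
def raw_bitrate(bits_per_sample, samples_per_second, channels):
    """Uncompressed bitrate in bits/second: depth x rate x channels."""
    return bits_per_sample * samples_per_second * channels


formats = [
    ("Internet telephony", 8, 8_000, 1),
    ("Telephone", 8, 11_025, 1),
    ("CD audio", 16, 44_100, 2),
    ("5.1 surround (hypothetical 16-bit, 48 kHz)", 16, 48_000, 6),
]
for name, depth, rate, channels in formats:
    print(f"{name}: {raw_bitrate(depth, rate, channels):,} bits/second")

# One minute of CD audio, in bytes, before any compression.
print(raw_bitrate(16, 44_100, 2) * 60 // 8)  # 10584000
```

At roughly 10.6 million bytes per minute for CD audio, the storage and transmission cost discussed next is easy to see.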
As Table 1 shows, the cost of high quality audio information is high bitrate. High quality audio information consumes large amounts of computer storage and transmission capacity. Companies and consumers increasingly depend on computers, however, to create, distribute, and play back high quality audio content.
II. Processing Audio Information in a Computer
Many computers and computer networks lack the resources to process raw digital audio. Compression (also called encoding or coding) decreases the cost of storing and transmitting audio information by converting the information into a lower bitrate form. Decompression (also called decoding) extracts a reconstructed version of the original information from the compressed form. Examples of encoder and decoder systems include certain versions of Microsoft Corporation's Windows Media Audio (“WMA”) encoder and decoder and WMA Pro encoder and decoder.
Compression can be lossless (in which quality does not suffer) or lossy (in which quality suffers but bitrate reduction from subsequent lossless compression is more dramatic). For example, lossy compression is used to approximate original audio information, and the approximation is then losslessly compressed. Lossless compression techniques (also called entropy coding techniques) include run-length coding, run-level coding, variable length coding, and arithmetic coding. The corresponding decompression techniques (also called entropy decoding techniques) include run-length decoding, run-level decoding, variable length decoding, and arithmetic decoding.
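Run-length and run-level coding can be sketched in a few lines of Python. Both round trips below are exact, which is what makes these techniques lossless. The pair layouts shown (`(value, count)` and `(zero_run, level)`) and passing the original length to the run-level decoder are assumptions for illustration; a practical encoder would typically follow this step with variable length or arithmetic coding of the pairs.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of identical values into a (value, count) pair."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def run_length_decode(pairs):
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def run_level_encode(values):
    """Emit (zero_run, level): zero_run zeros precede each nonzero level.

    Trailing zeros are implied by the original length, returned alongside.
    """
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, len(values)


def run_level_decode(pairs, length):
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))  # restore trailing zeros
    return out


coeffs = [7, 0, 0, -2, 0, 0, 0, 1, 0, 0]
print(run_length_encode(coeffs))  # [(7, 1), (0, 2), (-2, 1), (0, 3), (1, 1), (0, 2)]
print(run_level_encode(coeffs))   # ([(0, 7), (2, -2), (3, 1)], 10)
```

Run-level coding suits data dominated by zeros, such as quantized frequency coefficients, where it turns ten values into three pairs here.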
One goal of audio compression is to represent audio signals digitally so as to provide the maximum perceived signal quality with the fewest possible bits. With this goal in mind, contemporary audio encoding systems use a variety of lossy compression techniques. These lossy compression techniques typically involve perceptual modeling/weighting and quantization after a frequency transform. The corresponding decompression involves inverse quantization, inverse weighting, and inverse frequency transforms.
Frequency transform techniques convert data into a form that makes it easier to separate perceptually important information from perceptually unimportant information. Less important information can then be subjected to more lossy compression, while more important information is preserved, so as to provide the best perceived quality for a given bitrate. A frequency transform typically receives audio samples and converts them from the time domain into data in the frequency domain, sometimes called frequency coefficients or spectral coefficients.
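As a minimal sketch of a frequency transform, the Python below implements an orthonormal DCT-II and its inverse directly from the definition. The DCT is used here only because it is short to write down; the text does not specify which transform any particular encoder uses. A tone aligned with one basis function ends up almost entirely in a single coefficient, the kind of energy compaction that lets an encoder concentrate bits where they matter.

```python
import math


def _scale(k, size):
    """Orthonormal scaling factor for basis function k."""
    return math.sqrt((1 if k == 0 else 2) / size)


def dct_ii(x):
    """Orthonormal DCT-II: time-domain samples -> frequency coefficients."""
    size = len(x)
    return [
        _scale(k, size)
        * sum(x[n] * math.cos(math.pi / size * (n + 0.5) * k) for n in range(size))
        for k in range(size)
    ]


def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III): coefficients -> samples."""
    size = len(coeffs)
    return [
        sum(
            _scale(k, size) * coeffs[k] * math.cos(math.pi / size * (n + 0.5) * k)
            for k in range(size)
        )
        for n in range(size)
    ]


N = 16
# A tone that lines up exactly with DCT basis function k = 3.
block = [math.cos(math.pi / N * (n + 0.5) * 3) for n in range(N)]
spectrum = dct_ii(block)
print([round(c, 3) for c in spectrum])  # nearly all energy lands in spectrum[3]
```

Because the transform is orthonormal, the inverse simply reapplies the same basis functions, and the round trip recovers the input up to floating-point error.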
Perceptual modeling involves processing audio data according to a model of the human auditory system to improve the perceived quality of the reconstructed audio signal for a given bitrate. For example, an auditory model typically considers the range of human hearing and critical bands. Using the results of the perceptual modeling, an encoder shapes distortion (e.g., quantization noise) in the audio data with the goal of minimizing the audibility of the distortion for a given bitrate.
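Critical bands are often expressed on the Bark scale. The sketch below uses the Zwicker and Terhardt approximation of critical-band rate to assign transform coefficients to bands, showing that low bands contain only a few coefficients while high bands contain many. The formula, the 20 Hz to 20 kHz hearing range, and the 512-coefficient, 44,100 samples/second setup are assumptions for illustration; the auditory model of any specific encoder is not described here.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def audible(f_hz, low_hz=20.0, high_hz=20_000.0):
    """Rule-of-thumb range of human hearing."""
    return low_hz <= f_hz <= high_hz


sample_rate = 44_100
n_coeffs = 512                       # assumed number of transform coefficients
bin_hz = sample_rate / 2 / n_coeffs  # bandwidth covered by each coefficient

# Group audible coefficients by the whole-Bark band containing their center frequency.
bands = {}
for k in range(n_coeffs):
    center_hz = (k + 0.5) * bin_hz
    if audible(center_hz):
        bands.setdefault(int(hz_to_bark(center_hz)), []).append(k)

print({band: len(ks) for band, ks in sorted(bands.items())})
```

An encoder working per band can then set one weight for each critical band rather than for each coefficient, matching how hearing resolves frequency.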
Quantization maps ranges of input values to single values, introducing irreversible loss of information but also allowing an encoder to regulate the quality and bitrate of the output. Sometimes, the encoder performs quantization in conjunction with a rate controller that adjusts the quantization to regulate bitrate and/or quality. There are various kinds of quantization, including adaptive and non-adaptive, scalar and vector, uniform and non-uniform. Perceptual weighting can be considered a form of non-uniform quantization. Inverse quantization and inverse weighting reconstruct the weighted, quantized frequency coefficient data to an approximation of the original frequency coefficient data. An inverse frequency transform then converts the reconstructed frequency coefficient data into reconstructed time domain audio samples.
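A minimal sketch of weighted scalar quantization, assuming uniform rounding with a separate step size per coefficient. The step sizes stand in for perceptual weights and are hypothetical. Each reconstructed value differs from the original by at most half its step size, and the coarsely quantized coefficients collapse to zero.

```python
def quantize(coeffs, step_sizes):
    """Uniform scalar quantization with one step size per coefficient.

    Using larger steps where distortion is less audible acts as perceptual
    weighting, so the quantization is non-uniform across frequency.
    """
    return [round(c / step) for c, step in zip(coeffs, step_sizes)]


def dequantize(indices, step_sizes):
    """Inverse quantization: map each integer index back to a reconstruction value."""
    return [q * step for q, step in zip(indices, step_sizes)]


coeffs = [12.7, -3.2, 0.9, 0.4, -0.2, 0.05]
# Hypothetical weights: fine steps for the first coefficients, coarser ones later.
steps = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]

q_indices = quantize(coeffs, steps)
recon = dequantize(q_indices, steps)
print(q_indices)  # [25, -6, 1, 0, 0, 0]
print(recon)
```

A rate controller could, for example, multiply all step sizes by a common factor: larger factors yield more zero indices and fewer bits, at the cost of more distortion.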
Joint coding of audio channels involves coding information from more than one channel together to reduce bitrate. For example, mid/side coding (also called M/S coding or sum-difference coding) involves performing a matrix operation on left and right stereo channels at an encoder, and sending resulting “mid” and “side” channels (normalized sum and difference channels) to a decoder. The decoder reconstructs the actual physical channels from the “mid” and “side” channels. M/S coding is lossless, allowing perfect reconstruction if no other lossy techniques (e.g., quantization) are used in the encoding process.
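The M/S matrix operation and its inverse can be sketched as follows, assuming normalization by 1/2 (some systems normalize by 1/√2 instead). When the channels are similar, the side channel is small, which is what makes it cheap to code. With integer inputs of this size the halves are exact in floating point, so the round trip reproduces the input exactly.

```python
def ms_encode(left, right):
    """Mid/side: normalized sum and difference of the left and right channels."""
    mid = [(lv + rv) / 2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / 2 for lv, rv in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover the physical left and right channels from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1000, 1200, -800, 50]
right = [990, 1210, -790, 40]
mid, side = ms_encode(left, right)
print(side)  # [5.0, -5.0, -5.0, 5.0]: small because the channels are similar
print(ms_decode(mid, side) == (left, right))  # True
```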
Intensity stereo coding is an example of a lossy joint coding technique that can be used at low bitrates. Intensity stereo coding involves summing a left and right channel at an encoder and then scaling information from the sum channel at a decoder during reconstruction of the left and right channels. Typically, intensity stereo coding is performed at higher frequencies where the artifacts introduced by this lossy technique are less noticeable.
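The sketch below illustrates the idea for a single band: the encoder sends one sum channel plus a gain for each output channel, and the decoder scales the sum channel by those gains. Choosing gains that preserve each channel's energy is an assumption made here for illustration, as are the unquantized gains and the single band; the text does not specify how the scale information is computed. A source panned between the channels is reconstructed almost exactly, while uncorrelated channels are not, which is why the technique is lossy.

```python
import math


def energy(xs):
    """Sum of squared values."""
    return sum(x * x for x in xs)


def intensity_encode(left, right):
    """Encoder: send one sum channel plus a gain for each output channel (one band)."""
    total = [lv + rv for lv, rv in zip(left, right)]
    e_sum = energy(total)
    if e_sum == 0:
        return total, 0.0, 0.0  # nothing to scale in this sketch
    # Assumed choice: gains that give each rebuilt channel its original energy.
    gain_l = math.sqrt(energy(left) / e_sum)
    gain_r = math.sqrt(energy(right) / e_sum)
    return total, gain_l, gain_r


def intensity_decode(total, gain_l, gain_r):
    """Decoder: rebuild both channels by scaling the shared sum channel."""
    return [gain_l * s for s in total], [gain_r * s for s in total]


# A source panned 80/20 between the channels survives almost exactly.
x = [0.5, -0.25, 1.0, 0.75]
left, right = [0.8 * v for v in x], [0.2 * v for v in x]
print(intensity_decode(*intensity_encode(left, right)))

# Uncorrelated channels do not; only their energies are preserved.
left_u, right_u = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]
print(intensity_decode(*intensity_encode(left_u, right_u)))
```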
Given the importance of compression and decompression to media processing, it is not surprising that compression and decompression are richly developed fields. Whatever the advantages of prior techniques and systems, however, they do not have various advantages of the techniques and systems described herein.