This invention relates generally to the coding of audio signals and more particularly to a method of lossless compression of audio data for use in the transmission and/or storage of audio information.
Over the past ten to twenty years, the audio industry has seen a major transition from analog formats, such as cassette tapes, FM radio, and records, to new digital formats such as the compact disc (CD), mini-disks (MD), digital versatile disks (DVD), and others. The widespread use of personal computers and the Internet has furthered this trend with the introduction of new electronic music services that allow electronic distribution of music and/or other audio content through a computer and the Internet. Many of these digital audio products and services use various audio compression technologies (e.g., MP3, Dolby AC3, ATRACS, MPEG-AAC, and Windows Media Player) to reduce the bit rate of audio transmissions to the range of 64-256 kbps from the approximately 1411 kbps used on many uncompressed recordings, such as CDs, while maintaining a sufficient quality of high fidelity music reproduction. The use of compression technologies as well as the increased storage capacity of semiconductor (e.g., SRAM, DRAM, and Flash) devices and computer disks has made possible several new products including the RIO portable music player, the AudioRequest music jukebox, the Lansonic™ Digital Audio Server, and other devices.
In a typical digital audio application, an analog audio signal is sampled, for example, at 32, 44.1, or 48 kHz, and then is digitized with 16 or more bits using an analog-to-digital converter. If the audio source is a stereo source, then this process may be repeated for both the right and left channels. New surround sound audio may have six or more channels, each of which may be sampled and digitized. A typical CD contains two channels (stereo), each of which is sampled at 44.1 kHz with 16 bits per sample, resulting in a data rate of approximately 1411.2 kbps. This allows storage of slightly more than 1 hour of music on a 650 MB CD. In a playback application, the digital music samples may be converted to an analog signal using a digital-to-analog converter, and then amplified and played through one or more speakers.
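The CD figures quoted above follow from simple arithmetic. The sketch below reproduces them; the function names are illustrative, and the playing-time estimate assumes that "650 MB" means 650 × 2^20 bytes of raw sample data with no format overhead.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def playing_time_minutes(capacity_bytes, bit_rate_bps):
    """Minutes of audio that fit in capacity_bytes at the given bit rate."""
    return capacity_bytes * 8 / bit_rate_bps / 60


cd_rate = pcm_bit_rate(44_100, 16, 2)          # 1,411,200 bit/s = 1411.2 kbps
cd_minutes = playing_time_minutes(650 * 2**20, cd_rate)
print(f"{cd_rate / 1000:.1f} kbps, about {cd_minutes:.0f} minutes")
```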
Several audio compression techniques may be used to compress a stereo music signal to the range of 64-256 kbps without significantly changing the quality of the audio signal (i.e., while maintaining CD-like quality). The MPEG-1 standard, developed and maintained by a working group of the International Standards Organization (ISO/IEC), describes three audio compression methods, referred to as Layers 1, 2, and 3, for reducing the bit rate of a digital audio signal. The method described under Layer 3, which is commonly known as MP3, is generally considered to achieve acceptable quality at 128 kbps and very good quality at 256 kbps.
These audio compression methods, as well as some other lossy techniques, use frequency domain coding techniques with a complex psychoacoustic model to discard portions of the audio signal that are considered inaudible. The techniques may be used to achieve near-CD quality at compression ratios of about, for example, 5-to-1 (256 kbps) or 11-to-1 (128 kbps). However, psychoacoustic modeling is an inexact process and some approaches may introduce artifacts into the audio signal that may be audible and annoying to some listeners. As a result, lossy compression may be less desirable in some applications requiring very high audio quality.
In the absence of any compression, the storage capacity of current consumer hard drives is quite limited. A large capacity hard drive, such as one with a capacity of 60-80 GB, can only store approximately 95-125 hours of uncompressed CD-quality music. In contrast, a CD changer may hold as many as 400 discs, providing over 400 hours of audio. As a result, some method of significantly increasing the amount of audio that can be stored on a hard drive without increasing cost or adding artifacts is useful.
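The 95-125 hour estimate can be checked in the same way. This sketch assumes decimal gigabytes (10^9 bytes) and CD-quality stereo at 176,400 bytes per second; the `compression_ratio` argument is a hypothetical knob added only to show how capacity scales with compression.

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2   # 44.1 kHz, 2 bytes per sample, 2 channels


def hours_of_audio(capacity_gb, compression_ratio=1.0):
    """Hours of CD-quality audio on a drive of capacity_gb decimal gigabytes."""
    seconds = capacity_gb * 1e9 * compression_ratio / CD_BYTES_PER_SECOND
    return seconds / 3600


for gb in (60, 80):
    print(f"{gb} GB: about {hours_of_audio(gb):.0f} hours uncompressed")
```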
One method of increasing the amount of data that can be stored is to compress the data before storing the data and then to expand the compressed data when needed. In lossy compression methods such as MP3, the expanded data differs slightly from the original data. For audio and video signals, this may be acceptable as long as the differences are not too significant. However, for computer data, any difference may be unacceptable. As a result, lossless compression methods for which the expanded data are identical to the original uncompressed data have been developed. Various lossless or “entropy” coders attempt to remove redundancies from data (for example, after every “q” there is a “u”) and exploit the unequal probability of certain types of data (for example, vowels occur more often than other letters). Computer programs such as “tar” and “ZIP” have been developed to perform lossless compression on documents and other computer files. These algorithms are typically based on methods developed by Ziv and Lempel or use other standard methods such as Huffman coding or arithmetic coding techniques (see, for example, T. Bell et al., “Text Compression”, Prentice-Hall, 1990).
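As an illustration of how an entropy coder exploits unequal symbol probabilities, the following minimal sketch builds Huffman code lengths for a short string. It is a textbook construction rather than the method of any program named above, and the function names are illustrative.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique id, {symbol: depth}); the id breaks ties.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def coded_bits(text):
    """Total bits needed to code text with its own Huffman code."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)
```

For the string `"aaaaaaaabbbc"`, the frequent symbol `a` receives a 1-bit code and the rare symbols receive 2-bit codes, so the text needs fewer bits than a fixed 2 bits per symbol.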
Unfortunately, many lossless coding techniques designed for text or other computer-type data do not perform well on digital audio data. In fact, programs such as “ZIP” actually may enlarge an audio file rather than compressing the file. The problem is that these techniques assume certain features that may be common in text files but are not typically found in audio data.
Methods for lossless compression of audio typically attempt to compress an audio file by exploiting certain redundancies in the audio signal. Generally, these redundancies can be exploited either in the time domain via prediction or in the frequency domain via bit allocation. In addition, entropy coding can be applied to take advantage of the varying probability of different data values by assigning shorter sequences of bits to represent higher probability values and longer sequences of bits to represent lower probability values. The result is a reduction in the average number of bits required to represent all of the data values. These advantages have resulted in the incorporation of lossless compression into the DVD-Audio format (see “Meridian Lossless Packing Enabling High-Resolution Surround on DVD-Audio”, MIX, December 1998).
One technique for lossless compression is to divide the audio signal into segments or frames and then, for each frame, to compute a low-order linear predictor that is quantized and stored for that frame. This predictor then may be applied to all the audio samples in the frame, and the prediction residuals (i.e., the error after prediction) may be coded using some form of entropy-type coder, such as, for example, a Huffman, Golomb, Rice, run-length, or arithmetic coder. In “Optimization of Digital Audio for Internet Transmission” (May 1998), Mat Hans describes the AudioPak lossless audio coder. This coder combines four low-order linear predictors (0, 1st, 2nd, and 3rd order), each having fixed prediction weights corresponding to known polynomials, with Golomb coding. Use of very low order predictors with fixed predictor weights results in a very simple algorithm with low complexity, but at the expense of lower prediction gain and larger file sizes.
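A minimal sketch of fixed polynomial prediction follows. It assumes the commonly used residual definitions (order 1: x[n] − x[n−1]; order 2: x[n] − 2x[n−1] + x[n−2]; order 3: x[n] − 3x[n−1] + 3x[n−2] − x[n−3]) with zero history before the frame, and selects the order by the smallest sum of absolute residuals. The exact weights and selection rule used by AudioPak are not given here, so these choices are assumptions.

```python
def fixed_residuals(samples, order):
    """Residuals of the fixed polynomial predictor of the given order (0..3).

    Applying a first difference `order` times is equivalent to subtracting
    the polynomial prediction, with samples before the frame taken as zero.
    """
    res = list(samples)
    for _ in range(order):
        res = [res[0]] + [res[i] - res[i - 1] for i in range(1, len(res))]
    return res


def reconstruct(residuals, order):
    """Invert fixed_residuals by applying a running sum `order` times."""
    out = list(residuals)
    for _ in range(order):
        for i in range(1, len(out)):
            out[i] += out[i - 1]
    return out


def best_fixed_order(samples):
    """Pick the order whose residuals have the smallest total magnitude."""
    return min(range(4),
               key=lambda p: sum(abs(e) for e in fixed_residuals(samples, p)))
```

Because the weights are fixed, only the chosen order needs to be sent per frame, which is what keeps this approach simple.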
In U.S. Pat. No. 5,839,100, Wegener describes a lossless audio coder that may be used in the MUSICompress system. The Wegener method uses decimation (i.e., selection of every Nth sample) to implement non-linear time domain prediction of an audio signal which is combined with Huffman coding. Decimation introduces aliasing into the predicted signal whereby signal components at the same modulo N frequency are summed. This may distort the signal in a way that prevents accurate prediction of all frequency components, causing lower compression rates.
A paper titled “SHORTEN: Simple lossless and near-lossless waveform compression”, by Tony Robinson (December 1994), and U.S. Pat. No. 6,041,302 by Bruekers describe a lossless audio compression system using linear prediction and Rice coding. Rice coding is a form of Huffman coding optimized for Laplacian distributions. Rice codes form a family of codes parameterized by a single parameter “m” that can be adjusted to reasonably fit the statistics of the audio prediction residuals.
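The following sketch shows a conventional Rice code for signed residuals. It assumes that the single parameter is expressed as an exponent `k` (so the divisor is 2^k) and that signed values are first mapped to non-negative integers by interleaving; bit strings are used instead of packed bytes for readability. These are illustrative choices, not details taken from the cited references.

```python
def zigzag(e):
    """Map a signed integer to a non-negative one: 0,-1,1,-2,... -> 0,1,2,3,..."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k):
    """Encode signed integers as a string of '0'/'1' characters."""
    bits = []
    for e in values:
        u = zigzag(e)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")            # quotient in unary
        if k:
            bits.append(format(r, f"0{k}b"))  # remainder in k plain bits
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` signed integers from a bit string."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                               # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | r))
    return out
```

Small residuals produce short codewords, and a larger `k` trades a longer fixed part for a shorter unary part, which is why the parameter is tuned to the residual statistics.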
Prediction may be used to remove redundancy from the signal prior to coding in a lossless or a lossy system for coding audio signals. In a lossy speech coding application, modest (e.g., 8th to 14th) order adaptive linear predictors may be applied to each frame of speech (for example, 15-30 ms per frame) and predictor coefficients or weights may be computed using the autocorrelation or covariance methods. The predictor weights for this so-called “forward” predictor then may be quantized for passage to the decoder to form part of the side information for a frame. Many methods for efficient quantization of linear predictor coefficients have been devised, including transformation to partial correlation coefficients, reflection coefficients, or line spectral pairs, and using scalar and/or vector quantization.
Many low bit rate speech coders use forward prediction, where predictor coefficients are computed on data that has yet to be processed by the decoder, rather than backward prediction, where predictor coefficients are computed on data already processed by the decoder.
In a backward prediction system, data determining the prediction coefficients are known to both the encoder and decoder, which means that, usually, predictor coefficients are not quantized and extra side information bits are not used. Backward prediction systems that do not use extra bits may be adapted quite rapidly. However, they may be sensitive to bit errors or missing data, and, due to error feedback, they may provide lower overall quality when used in low bit rate lossy speech coding. As a result, backward prediction is generally used only in higher bit rate (≥16 kbps) speech coding applications such as the ITU G.728 LD-CELP speech coding standard.
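To make the backward idea concrete, the sketch below adapts predictor weights sample by sample from data the decoder has already reconstructed, so no coefficients are transmitted. A normalized LMS update is used purely as an example of a backward-adaptive rule; the order, step size `mu`, and function names are assumptions, and this is not the adaptation used in G.728.

```python
def _predict(weights, history):
    """Integer prediction from past samples (most recent first)."""
    return round(sum(w * h for w, h in zip(weights, history)))


def _adapt(weights, history, error, mu):
    """Normalized LMS weight update driven by the prediction error."""
    norm = sum(h * h for h in history) + 1.0
    return [w + mu * error * h / norm for w, h in zip(weights, history)]


def backward_encode(samples, order=4, mu=0.5):
    weights, history, residuals = [0.0] * order, [0] * order, []
    for x in samples:
        e = x - _predict(weights, history)
        residuals.append(e)
        weights = _adapt(weights, history, e, mu)
        history = [x] + history[:-1]
    return residuals


def backward_decode(residuals, order=4, mu=0.5):
    weights, history, samples = [0.0] * order, [0] * order, []
    for e in residuals:
        x = e + _predict(weights, history)   # same prediction as the encoder
        samples.append(x)
        weights = _adapt(weights, history, e, mu)
        history = [x] + history[:-1]
    return samples
```

The decoder repeats exactly the same arithmetic as the encoder, which is what makes the scheme free of side information and also why a single corrupted residual would desynchronize every later sample.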
In a first general aspect, lossless audio coding uses a combined forward and backward predictor for better approximation of an audio signal. Forward prediction is applied as a first stage and backward prediction is applied as a second stage. The overall prediction error is reduced, which results in smaller file sizes with lower complexity than when just forward prediction is used.
In another general aspect, an improved entropy coder more closely fits the statistics of the audio prediction residuals. A modified Golomb coder is parameterized by, for example, two parameters. An effective search procedure is used to find the best parameter values for each frame, resulting in more efficient entropy coding with smaller file sizes than previous techniques.
In one general aspect, digital samples that have been obtained from an audio signal are compressed into output bits that can be used, for example, to transmit and/or store the audio data. The digital samples are compressed by first dividing the samples into one or more frames, where each frame includes multiple samples. Each frame is compressed by computing a first predictor for the digital samples within the frame, with the first predictor being characterized by first prediction coefficients. Then, the first prediction coefficients are quantized to produce first predictor bits. The frames also are divided into one or more subsets, where each subset contains at least one of the digital samples. Next, a subset predictor is computed for a subset using digital samples contained in previous subsets. Error samples are produced using the first predictor bits and the subset predictor. These error samples are entropy coded to produce codeword bits. The first predictor bits and the codeword bits then are used in output bits for decompressing digital information.
Implementations may include one or more of the following features. For example, the first predictor may be a linear predictor, such as a first order linear predictor. Prediction coefficients may be quantized using scalar quantization for some or all of the prediction coefficients. The prediction coefficients also may be quantized using vector quantization. The first prediction coefficients may be computed by windowing digital samples to produce windowed samples. Autocorrelation coefficients may be computed from the windowed samples, and the first prediction coefficients may be computed by solving a system of linear equations using the autocorrelation coefficients.
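The windowing, autocorrelation, and linear-equation steps described above can be sketched as follows. The Hamming window and the Levinson-Durbin recursion are assumed here as one standard way to carry out these steps; the text does not specify the window shape or the solver, and quantization of the resulting coefficients is omitted.

```python
import math


def hamming_window(n):
    """Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    return [sum(x[i] * x[i - lag] for i in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_k a[k] * r[|j - k|] = r[j] for j = 1..order.

    Returns [a1, ..., a_order] so that the prediction of x[n] is
    a1*x[n-1] + ... + a_order*x[n-order].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break                              # degenerate (e.g. silent) frame
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def forward_predictor(frame, order):
    """Window a frame, then compute its linear prediction coefficients."""
    window = hamming_window(len(frame))
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```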
A subset predictor may be used to compute prediction coefficients using only digital samples contained in previous subsets of the frame being computed.
The entropy coding of error samples to produce codeword bits may use at least one code parameter that determines the format of the codeword bits. The value of the code parameter may be encoded into one or more code parameter bits and included in the output bits. The code parameter bits may be determined by comparing two or more possible values of the code parameter and then encoding into the code parameter bits the value of the code parameter which is estimated to yield the smallest number of codeword bits. Also, the code parameter bits may be determined by entropy coding the error samples using two or more possible values of the code parameter and then encoding into the code parameter bits the value of the code parameter that yields the smallest number of codeword bits.
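The second alternative above, trying several parameter values and keeping the one that gives the fewest bits, can be sketched with a plain Rice code as a stand-in for the entropy coder. The codeword length formula, the candidate range, and the function names are assumptions made for illustration.

```python
def rice_length(e, k):
    """Bits used by a Rice codeword for signed value e with parameter k."""
    u = 2 * e if e >= 0 else -2 * e - 1    # signed -> non-negative
    return (u >> k) + 1 + k                # unary quotient, stop bit, remainder


def total_bits(errors, k):
    return sum(rice_length(e, k) for e in errors)


def best_rice_parameter(errors, candidates=range(16)):
    """Return the candidate parameter giving the smallest coded frame size."""
    return min(candidates, key=lambda k: total_bits(errors, k))
```

Because only codeword lengths are needed to compare candidates, the search does not have to produce the actual bit stream for each value.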
Error samples may be produced by first processing the digital samples using the first predictor to produce intermediate samples. The intermediate samples may be processed using the subset predictor to produce the error samples.
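The two-stage structure described above can be sketched as a cascade in which a quantized frame-level predictor produces intermediate samples and a subset predictor, derived only from earlier subsets, produces the error samples. For brevity both stages are first order; the fixed-point precision `SHIFT`, the least-squares rule in `subset_coefficient`, and all function names are hypothetical choices rather than details from the text.

```python
SHIFT = 8  # assumed fixed-point precision of the quantized first coefficient


def subset_coefficient(prev):
    """First-order coefficient estimated from an already-processed subset."""
    num = sum(prev[i] * prev[i - 1] for i in range(1, len(prev)))
    den = sum(v * v for v in prev[:-1])
    return num / den if den else 0.0


def encode_frame(x, c_q, subset_size=24):
    """Return error samples for frame x; c_q is the quantized first coefficient."""
    # Stage 1: frame-level predictor gives intermediate samples.
    v = [x[n] - ((c_q * (x[n - 1] if n else 0)) >> SHIFT) for n in range(len(x))]
    errors = []
    for start in range(0, len(v), subset_size):
        # Stage 2: coefficient comes from the previous subset only.
        b = subset_coefficient(v[max(0, start - subset_size):start])
        for n in range(start, min(start + subset_size, len(v))):
            errors.append(v[n] - round(b * (v[n - 1] if n else 0)))
    return errors


def decode_frame(errors, c_q, subset_size=24):
    """Rebuild the frame from error samples by mirroring encode_frame."""
    v, x = [], []
    for start in range(0, len(errors), subset_size):
        b = subset_coefficient(v[max(0, start - subset_size):start])
        for n in range(start, min(start + subset_size, len(errors))):
            v.append(errors[n] + round(b * (v[n - 1] if n else 0)))
            x.append(v[n] + ((c_q * (x[n - 1] if n else 0)) >> SHIFT))
    return x
```

Only `c_q` has to be transmitted: the decoder recomputes each subset coefficient from intermediate samples it has already rebuilt, so the second stage adds no side information.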
The output bits of the coder are such that they can be used with a suitable decoder to enable a substantially lossless reconstruction of the digital samples.
In one example, the frame contains 1152 digital samples which are divided into 48 subsets each containing 24 digital samples.
In another general aspect, compressing digital samples obtained from an audio source into output bits includes dividing the digital samples into frames, with each frame containing one or more of the digital samples. The digital samples then may be processed to produce error samples. These error samples may be entropy coded to produce codeword bits. The entropy coding uses at least a first code parameter and a second code parameter, with each code parameter varying from frame to frame. The codeword bits may be included in output bits.
Compressing digital samples may include using entropy coding that produces codeword bits as a combination of at least two terms. The first term may include a predetermined number of codeword bits, and the second term may include a variable number of codeword bits. The value of the first term may include information on the least significant bits of an error sample and/or information on the sign of the error sample. The number of codeword bits in the second term may be greater for an error sample with large magnitude and smaller for an error sample with small magnitude. The number of codeword bits in the first term may depend, at least in part, on the first code parameter, and the number of codeword bits in the second term may depend, at least in part, on the second code parameter.
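One hypothetical way to build a codeword from a fixed-length first term and a variable-length second term, each governed by its own parameter, is sketched below. The first term carries the sign and the `k` least significant bits of the magnitude; the second term codes the remaining high-order part with an order-`j` Exp-Golomb code. This pairing is an assumption chosen for illustration and is not the modified Golomb code of the described coder, whose exact format is not given in this section.

```python
def encode_sample(e, k, j):
    """Two-term codeword for signed integer e, as a '0'/'1' string."""
    mag = abs(e)
    # First term: 1 sign bit plus k least significant magnitude bits.
    first = "1" if e < 0 else "0"
    if k:
        first += format(mag & ((1 << k) - 1), f"0{k}b")
    # Second term: high-order part as an order-j Exp-Golomb code.
    body = format((mag >> k) + (1 << j), "b")
    second = "0" * (len(body) - j - 1) + body
    return first + second


def decode_samples(bits, k, j, count):
    """Decode `count` samples produced by encode_sample."""
    out, pos = [], 0
    for _ in range(count):
        negative = bits[pos] == "1"
        pos += 1
        low = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        zeros = 0
        while bits[pos] == "0":
            zeros += 1
            pos += 1
        width = zeros + j + 1
        high = int(bits[pos:pos + width], 2) - (1 << j)
        pos += width
        mag = (high << k) | low
        out.append(-mag if negative else mag)
    return out
```

In this sketch the first term always has `k + 1` bits, while the second term grows with the magnitude of the error sample at a rate set by `j`.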
The first code parameter for a frame may be encoded with the first code parameter bits, and the second code parameter for a frame may be encoded with the second code parameter bits. The first and second code parameter bits may be included in the output bits.
Error samples may be produced by computing one or more predictors for a frame and using the predictors to produce error samples from the digital samples. The digital samples also may include first channel samples from a first channel of the audio source and second channel samples from a second channel of the audio source. The digital samples may be processed to produce error samples. The processing may include predicting the second channel samples from the first channel samples.
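A minimal sketch of inter-channel prediction follows, assuming the simplest possible predictor: each second channel sample is predicted to equal the first channel sample at the same time index, and only the difference is kept. The unit prediction gain and the function names are assumptions; the text does not specify the form of the inter-channel predictor.

```python
def encode_stereo(first, second):
    """Keep the first channel and the second channel's prediction error."""
    difference = [s - f for f, s in zip(first, second)]
    return list(first), difference


def decode_stereo(first, difference):
    """Rebuild the second channel from the first channel and the errors."""
    second = [d + f for f, d in zip(first, difference)]
    return list(first), second
```

When the two channels are strongly correlated, the difference values are much smaller than the second channel samples themselves and therefore cheaper to entropy code. The decoder must reconstruct the first channel before it can rebuild the second.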
Error samples may be processed for a frame by computing a first predictor for the digital samples in a frame, with the first predictor having first prediction coefficients. The first prediction coefficients may be quantized to produce first predictor bits. The digital samples in a frame may be divided into one or more subsets. Each subset may contain one or more digital samples. A subset predictor may be computed for at least one of the subsets, using the digital samples contained in previous subsets. Error samples may be produced by processing the digital samples in a frame using both the first predictor and the subset predictor. The first predictor bits may be included in the output of the coder.
In another general aspect, audio data is reconstructed from output bits generated by an audio coder. Output bits, generated by an audio coder, are received and codeword bits, a first code parameter, a second code parameter, and predictor bits are obtained from the output bits. Error samples are reconstructed from the codeword bits using the first code parameter and the second code parameter. An error signal is computed from the reconstructed error samples. Error samples may be reconstructed by entropy decoding the codeword bits. Also, prediction coefficients are reconstructed using the predictor bits that were previously generated by quantizing the prediction coefficients.
The codeword bits may be a combination of at least two terms, including a first term that includes a predetermined number of codeword bits, and a second term including a variable number of codeword bits. The number of codeword bits in the second term may generally be greater for an error sample with large magnitude and generally smaller for an error sample with small magnitude. The value of the first term may include information on the least significant bits of an error sample.
The number of codeword bits in the first term may depend at least in part on the first code parameter and the number of codeword bits in the second term may depend at least in part on the second code parameter.
Audio data may be reconstructed using the prediction coefficients and the error samples by dividing the error samples for a frame into one or more subsets. Each subset may contain at least one of the error samples for the frame. A subset predictor is then computed for at least one of the subsets using information from previous subsets. The audio data may then be reconstructed using the prediction coefficients, the subset predictor, and the error samples.
The audio data may include first audio data for a first audio channel and second audio data for a second audio channel. In this case, audio data may be reconstructed by reconstructing the first audio data, and then reconstructing the second audio data using the first audio data.
Other features and advantages will be apparent from the description and drawings, and from the claims.