In a typical audio coding environment, data is represented as a long sequence of symbols that is input to an encoder. The encoder encodes the input data, which is then transmitted over a communication channel (or simply stored) and decoded by a decoder. During encoding, the input is pre-processed, sampled, converted, compressed, or otherwise manipulated into a form suitable for transmission or storage. After transmission or storage, the decoder attempts to reconstruct the original input.
Audio coding techniques can generally be categorized into two classes: time-domain techniques and frequency-domain techniques. Time-domain techniques, e.g., ADPCM and LPC, operate directly on the time-domain signal, while frequency-domain techniques transform the audio signal into the frequency domain, where compression is performed. Frequency-domain codecs (compressors/decompressors) can be further divided into sub-band coders and transform coders, although the distinction between the two is not always clear. Sub-band coders typically use bandpass filters to divide an input signal into a small number (e.g., four) of sub-bands, whereas transform coders typically have many sub-bands (and therefore a correspondingly large number of transform coefficients). Processing an audio signal in the frequency domain is motivated by both classical signal processing theory and psychoacoustic models of human perception.
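As a rough illustration of why a transform can help, the following sketch applies a naive discrete Fourier transform to a single sinusoid and shows that almost all of its energy lands in two coefficients. The DFT is used here only as a convenient stand-in for whatever transform a real codec would use; the function name `dft`, the block size of 32 samples, and the test tone are assumptions chosen for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical test signal: 32 samples of a cosine completing 3 cycles.
N = 32
signal = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]

coeffs = dft(signal)

# Indices of the coefficients that carry non-negligible energy.
significant = [k for k, c in enumerate(coeffs) if abs(c) > 1e-6]
print(significant)
```

In the time domain all 32 samples are needed to describe the tone, whereas in the frequency domain only the bins listed in `significant` are non-negligible, which is the kind of compaction a transform coder would try to exploit.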
Psychoacoustic models take advantage of known properties of the listener in order to reduce information content. For example, the inner ear, specifically the basilar membrane, behaves like a spectral analyzer and transforms the audio signal into spectral data before further neural processing proceeds. Frequency-domain audio codecs often take advantage of the auditory masking that occurs in the human hearing system by modifying an original signal to eliminate information redundancies. Since human ears are incapable of perceiving these modifications, one can achieve efficient compression without perceptible distortion.
Masking analysis is usually conducted in conjunction with quantization so that quantization noise can be conveniently "masked." In modern audio coding techniques, the quantized spectral data are usually further compressed by applying entropy coding, e.g., Huffman coding. Compression is required because communication channels usually have limited available capacity or bandwidth. It is frequently necessary to reduce the information content of input data for it to be transmitted reliably, or at all, over the communication channel.
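The quantization step mentioned above can be sketched as a uniform scalar quantizer. This is a minimal sketch, not any particular codec's quantizer: the names `quantize` and `dequantize`, the sample coefficient values, and the fixed step size of 0.25 are all assumptions. In a perceptual codec the step size would typically be chosen per frequency band by the masking analysis, so that the resulting error stays below the masking threshold.

```python
def quantize(values, step):
    """Map each value to the index of the nearest multiple of `step`."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Reconstruct approximate values from quantization indices."""
    return [i * step for i in indices]

# Hypothetical spectral coefficients and a hypothetical step size.
coefficients = [0.12, -0.47, 0.80, 0.03]
step = 0.25

indices = quantize(coefficients, step)
reconstructed = dequantize(indices, step)

# Quantization noise: the difference between original and reconstruction.
noise = [c - r for c, r in zip(coefficients, reconstructed)]
print(indices, reconstructed, noise)
```

The quantization noise of a uniform quantizer is bounded by half the step size, so a larger step yields smaller integer indices (cheaper to entropy code) at the cost of more noise.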
Tremendous effort has been invested in developing lossless and lossy compression techniques for reducing the size of data to transmit or store. One popular lossless technique is Huffman encoding, which is a particular form of entropy encoding. Entropy coding assigns code words to different input sequences, and stores all input sequences in a code book. The complexity of entropy encoding depends on the number m of possible values an input sequence X may take. For small m, there are few possible input combinations, and therefore the code book for the messages can be very small (e.g., only a few bits are needed to unambiguously represent all possible input sequences). For digital applications, the code alphabet is most likely the set of binary digits {0, 1}, and code word lengths are measured in bits.
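To make the code book idea concrete, the following is a minimal sketch of textbook Huffman coding over single symbols, assuming the symbol frequencies are taken from the message itself. The function names `huffman_code`, `encode`, and `decode` and the sample message are illustrative choices, not part of any specific codec.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a code book (symbol -> bit string) from an iterable of symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code word.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, partial code book).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out

message = "aaaabbc"
code = huffman_code(message)
bits = encode(message, code)
print(code, bits, len(bits))
```

For this message the frequent symbol `a` receives a one-bit code word and the rarer symbols two bits each, so the message takes 10 bits instead of the 14 bits a fixed two-bit code would need.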
If it is known that the input is composed of symbols having equal probability of occurring, an optimal encoding is to use equal-length code words. It is not typical, however, for every message in an input stream to be equally likely. In practice, certain messages are more likely than others, and entropy encoders take advantage of such unequal probabilities to minimize the average length of code words among expected inputs. Traditionally, fixed-length input sequences are assigned variable-length codes (or conversely, variable-length sequences are assigned fixed-length codes).
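The effect of unequal probabilities on average code word length can be checked numerically. The sketch below compares the Shannon entropy of a source, which lower-bounds the average length of any uniquely decodable code, with the average length of a fixed-length and a variable-length code. The helper names and the two four-symbol probability distributions are assumptions made up for the example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Expected code word length in bits per symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical four-symbol sources.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.125, 0.125]

fixed_lengths = [2, 2, 2, 2]     # equal-length code words
variable_lengths = [1, 2, 3, 3]  # shorter words for likelier symbols

h_uniform = entropy_bits(uniform)
h_skewed = entropy_bits(skewed)
avg_fixed = average_length(skewed, fixed_lengths)
avg_variable = average_length(skewed, variable_lengths)
print(h_uniform, h_skewed, avg_fixed, avg_variable)
```

For the uniform source the entropy equals the fixed code word length of 2 bits, so equal-length code words are already optimal. For the skewed source the entropy is 1.75 bits, and the variable-length code reaches that value while the fixed-length code still spends 2 bits per symbol.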
By their nature, however, most compression techniques for audiovisual data are lossy processes. The level of quality and fidelity delivered in sound and video files depends primarily on how much bandwidth is available and whether the compressor/decompressor (codec) is optimized to prepare output for the available bandwidth.