An encoder is a device, circuitry, or computer program that is capable of analyzing a signal, such as an audio signal, and outputting that signal in an encoded form. The resulting signal is often used for transmission, storage, and/or encryption purposes. A decoder, conversely, is a device, circuitry, or computer program that inverts the encoder operation: it receives the encoded signal and outputs a decoded signal.
In most state-of-the-art encoders, such as audio encoders, each frame of the input signal is analyzed and transformed from the time domain to the frequency domain. The result of this analysis is quantized, encoded, and then transmitted or stored, depending on the application. At the receiving side (or when the stored encoded signal is used), a corresponding decoding procedure followed by a synthesis procedure restores the signal in the time domain.
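The frame-based analysis, quantization, and synthesis chain described above can be illustrated with a minimal sketch. It assumes non-overlapping frames, an orthonormal DCT-II as the time-to-frequency transform, and a single uniform quantizer step for all coefficients. The names `dct_matrix`, `encode`, `decode`, `frame_len`, and `step` are hypothetical, and no entropy coding or psychoacoustic weighting is included.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # rescale the DC row so that m @ m.T == I
    return m

def encode(signal, frame_len=32, step=0.05):
    """Split into frames, transform each frame to the frequency domain, quantize."""
    c = dct_matrix(frame_len)
    n_frames = len(signal) // frame_len  # hypothetical: trailing samples are dropped
    frames = np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)
    coeffs = frames @ c.T                # time -> frequency, one row per frame
    return np.round(coeffs / step).astype(np.int64)  # integer indices to store/send

def decode(indices, frame_len=32, step=0.05):
    """De-quantize, then apply the inverse transform (synthesis) frame by frame."""
    c = dct_matrix(frame_len)
    coeffs = indices * step
    frames = coeffs @ c                  # inverse of an orthonormal matrix is its transpose
    return frames.reshape(-1)

# Usage: round-trip a random test signal through the sketch codec.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x_hat = decode(encode(x))
print(np.max(np.abs(x - x_hat)))  # small; grows with the quantizer step
```

Because the transform is orthonormal, the energy of the reconstruction error in each frame equals the energy of the coefficient quantization error, so a smaller `step` trades more bits for lower distortion.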
Codecs (encoder-decoder pairs) are often employed for compression/decompression of information such as audio and video data for efficient transmission over bandwidth-limited communication channels.
So-called transform coders, or more generally transform codecs, are normally built around a time-to-frequency transform such as the DCT (Discrete Cosine Transform), the Modified Discrete Cosine Transform (MDCT), or another lapped transform, which allows better coding efficiency with respect to the properties of the human auditory system. A common characteristic of transform codecs is that they operate on overlapping blocks of samples, i.e., overlapping frames. The coefficients resulting from the transform analysis (or an equivalent sub-band analysis) of each frame are normally quantized and stored, or transmitted to the receiving side as a bit-stream. Upon reception of the bit-stream, the decoder performs de-quantization and the inverse transform in order to reconstruct the signal frames.
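To make the idea of a lapped transform on overlapping frames concrete, here is a minimal MDCT sketch with 50% overlap, a sine window, and overlap-add synthesis. It assumes blocks of 2N samples with a hop of N, a signal length that is a multiple of N, and a 2/N synthesis scaling chosen so that the sine window (which satisfies w[n]^2 + w[n+N]^2 = 1) gives perfect reconstruction when no quantization is applied. The function names are hypothetical, and practical codecs typically compute the MDCT with FFT-based algorithms rather than an explicit cosine matrix.

```python
import numpy as np

def mdct_basis(n_half: int) -> np.ndarray:
    """N x 2N cosine basis: N coefficients from each block of 2N samples."""
    n = np.arange(2 * n_half)[None, :]
    k = np.arange(n_half)[:, None]
    return np.cos(np.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def sine_window(n_half: int) -> np.ndarray:
    """Symmetric sine window; satisfies w[n]^2 + w[n + N]^2 = 1."""
    n = np.arange(2 * n_half)
    return np.sin(np.pi * (n + 0.5) / (2 * n_half))

def mdct_frames(x, n_half):
    """Analyse windowed, 50%-overlapping blocks of 2N samples (hop N)."""
    assert len(x) % n_half == 0, "sketch assumes len(x) is a multiple of N"
    basis, win = mdct_basis(n_half), sine_window(n_half)
    padded = np.concatenate([np.zeros(n_half), x, np.zeros(n_half)])
    n_blocks = len(padded) // n_half - 1
    blocks = np.stack([padded[i * n_half: i * n_half + 2 * n_half]
                       for i in range(n_blocks)])
    return (blocks * win) @ basis.T      # shape (n_blocks, N)

def imdct_overlap_add(coeffs, n_half, length):
    """Inverse MDCT, synthesis window, then overlap-add to cancel aliasing."""
    basis, win = mdct_basis(n_half), sine_window(n_half)
    blocks = (2.0 / n_half) * (coeffs @ basis) * win
    out = np.zeros((len(coeffs) + 1) * n_half)
    for i, b in enumerate(blocks):
        out[i * n_half: i * n_half + 2 * n_half] += b  # hop of N samples
    return out[n_half: n_half + length]  # drop the zero-padding at both ends

# Usage: analysis followed by synthesis, with no quantization in between.
rng = np.random.default_rng(1)
x = rng.standard_normal(128)
coeffs = mdct_frames(x, 16)
x_rec = imdct_overlap_add(coeffs, 16, len(x))
```

The time-domain aliasing produced by each block cancels when adjacent windowed blocks are added, which is why every block of 2N samples contributes only N coefficients without any loss of information (critical sampling).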
So-called perceptual encoders use a lossy coding model of the receiving destination, i.e., the human auditory system, rather than a model of the source signal. Perceptual audio encoding thus incorporates psychoacoustical knowledge of the auditory system in order to reduce the number of bits necessary to reproduce the original audio signal faithfully. In addition, perceptual encoding attempts to remove (i.e., not transmit) or approximate parts of the signal that the human listener would not perceive; it is therefore lossy coding, as opposed to lossless coding of the source signal. This model is typically referred to as the psychoacoustical model. In general, a perceptual coder will have a lower signal-to-noise ratio (SNR) than a waveform coder, but a higher perceived quality than a lossless coder operating at an equivalent bit rate.
A perceptual encoder uses the masking pattern of the stimulus to determine the minimum number of bits necessary to encode (i.e., quantize) each frequency sub-band without introducing audible quantization noise.
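The per-sub-band decision can be illustrated with a simplified allocation rule. Assuming a uniform quantizer, each additional bit lowers the quantization noise by about 6.02 dB, so a sub-band needs roughly ceil(SMR / 6.02) bits for its noise to fall below the masking threshold, where SMR is the signal-to-mask ratio in dB. This closed-form rule, the `max_bits` cap, and the function name are illustrative assumptions; practical encoders typically use iterative allocation loops that also respect the available bit budget.

```python
import math

# Halving the quantizer step lowers noise power by 4x, i.e. about 6.02 dB per bit.
DB_PER_BIT = 20.0 * math.log10(2.0)

def bits_per_subband(signal_db, mask_db, max_bits=16):
    """Smallest bit count whose noise reduction covers each sub-band's SMR (sketch)."""
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m                      # signal-to-mask ratio in dB
        if smr <= 0:
            bits.append(0)               # sub-band fully masked: nothing to transmit
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits

# Usage: three sub-bands; the second lies below its masking threshold.
print(bits_per_subband([60, 40, 30], [30, 45, 18]))  # [5, 0, 2]
```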
Existing perceptual coders operating in the frequency domain usually combine the so-called Absolute Threshold of Hearing (ATH) with both tonal and noise-like spreading of masking in order to compute the so-called Masking Threshold (MT) [1]. Based on this instantaneous masking threshold, existing psychoacoustical models compute scale factors that are used to shape the original spectrum so that the coding noise is masked by high-energy components, i.e., so that the noise introduced by the coder is inaudible [2].
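A compact illustration of combining the ATH with spread masking follows. It is a sketch in the style of classic models such as Johnston's, not the method of any particular coder. It assumes Zwicker's Bark-scale approximation, Terhardt's ATH formula, the Schroeder spreading function, a single tonality value in [0, 1] for the whole frame that sets the masking offset (14.5 + z dB for tonal, 5.5 dB for noise-like signals), and spectral levels already expressed on the same dB SPL scale as the ATH. All function names are hypothetical.

```python
import numpy as np

def bark(f_hz):
    """Zwicker's approximation of critical-band rate (Hz -> Bark)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def ath_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = np.asarray(f_hz, dtype=float) / 1000.0  # frequency in kHz, must be > 0
    return 3.64 * k ** -0.8 - 6.5 * np.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4

def spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee Bark minus masker Bark."""
    t = np.asarray(dz, dtype=float) + 0.474
    return 15.81 + 7.5 * t - 17.5 * np.sqrt(1.0 + t ** 2)

def masking_threshold_db(freqs_hz, level_db, tonality):
    """Masking threshold per spectral line: max(ATH, spread masking - offset)."""
    freqs = np.asarray(freqs_hz, dtype=float)
    levels = np.asarray(level_db, dtype=float)
    z = bark(freqs)
    # Level each masker (column) contributes at each maskee (row), in dB.
    spread = levels[None, :] + spreading_db(z[:, None] - z[None, :])
    # Add the contributions as powers, then convert back to dB.
    excitation = 10.0 * np.log10(np.sum(10.0 ** (spread / 10.0), axis=1))
    # Johnston-style offset: tonal signals mask less than noise-like ones.
    offset = tonality * (14.5 + z) + (1.0 - tonality) * 5.5
    return np.maximum(excitation - offset, ath_db(freqs))

# Usage: one loud tone near 1 kHz on an otherwise near-silent spectrum.
freqs = np.linspace(100.0, 8000.0, 64)
levels = np.full(64, -50.0)
i = int(np.argmin(np.abs(freqs - 1000.0)))
levels[i] = 80.0
thr = masking_threshold_db(freqs, levels, tonality=1.0)
```

Near the tone the threshold rises well above the ATH, while far from it the threshold falls back to the ATH; components below this curve are candidates for coarser quantization.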
Perceptual modeling has been used extensively in high-bit-rate audio coding. Standardized coders such as MPEG-1 Layer III [3] and MPEG-2 Advanced Audio Coding [4] achieve “CD quality” for wideband audio at 128 kbps and 64 kbps, respectively. Nevertheless, these codecs are by definition forced to underestimate the amount of masking to ensure that distortion remains inaudible. Moreover, wideband audio coders usually rely on a high-complexity auditory (psychoacoustical) model, which is not very reliable at low bit rates (below 64 kbps).