1. Field of the Invention
This invention relates to the field of audio compression, and in particular to an audio decoder with programmable downmix coefficients and reconfigurable downmix and windowing operations.
2. Description of the Related Art
The digital audio coding used on Compact Discs (16-bit PCM) yields a total range of 96 dB from the loudest sound to the noise floor. This is achieved by taking 16-bit samples 44,100 times per second for each channel, an amount of data often too immense to store or transmit economically, especially when multiple channels are required. As a result, new forms of digital audio coding have been developed to allow the use of lower data rates with a minimum of perceived degradation of sound quality.
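As a rough illustration of the figures above, the following sketch computes the theoretical dynamic range of 16-bit PCM and the resulting uncompressed data rate. The function names are hypothetical, and the two-channel case is an assumed example rather than a value stated in the text.

```python
import math

BITS_PER_SAMPLE = 16      # Compact Disc PCM word length
SAMPLE_RATE_HZ = 44_100   # Compact Disc sampling rate


def dynamic_range_db(bits: int) -> float:
    """Ratio of the largest to the smallest quantization level, in dB."""
    return 20.0 * math.log10(2 ** bits)


def pcm_bit_rate(bits: int, sample_rate_hz: int, channels: int) -> int:
    """Uncompressed PCM data rate in bits per second."""
    return bits * sample_rate_hz * channels


if __name__ == "__main__":
    print(round(dynamic_range_db(BITS_PER_SAMPLE), 2), "dB")
    # Assumed example: a two-channel (stereo) program.
    print(pcm_bit_rate(BITS_PER_SAMPLE, SAMPLE_RATE_HZ, 2), "bits per second")
```

Under these assumptions, two channels of uncompressed audio require 1,411,200 bits per second, and the rate grows linearly with the number of channels.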
Lossy audio compression uses fewer bits to represent each sample, but a trade-off in quality occurs since the fewer the bits used to describe an audio signal, the greater the noise. To minimize the trade-off, compression algorithms take advantage of psychoacoustic phenomena such as auditory masking and the frequency dependence of perceived loudness. Consequently, noise is lowered when no audio signal is present, but effectively masked when strong audio signals are present. Since audio signals can only mask noise that occurs at nearby frequencies, when audio signals are present in only some parts of the audio spectrum some compression algorithms reduce the noise in the other parts of the spectrum.
Typically, the audio spectrum of each channel is divided into narrow frequency bands of different sizes optimized with respect to the frequency selectivity of human hearing. This makes it possible to sharply filter coding noise so that it is forced to stay very close in frequency to the frequency components of the audio signal being coded. By reducing or eliminating coding noise wherever there are no audio signals to mask it, the sound quality of the original signal can be subjectively preserved.
Often, coding bits are allocated among the filter bands as needed by the particular frequency spectrum or dynamic nature of the program. A built-in model of auditory masking may allow the coder to alter its frequency selectivity (as well as time resolution) to make sure that a sufficient number of bits are used to describe the audio signal in each band, thus ensuring noise is fully masked. On a higher level, the audio compression algorithm may also decide how to allocate coding bits among the various channels from a common bit pool. This technique allows channels with greater frequency content to demand more data than sparsely occupied channels, for example, or strong sounds in one channel to provide masking for noise in other channels.
Thus, the algorithms which employ "perceptual subband/transform coding" analyze the spectral components of the audio signal by calculating a transform and apply a psychoacoustic model to estimate the just-noticeable noise-level. In a subsequent quantization and coding stage, the algorithms try to allocate the available number of data bits in a way to meet both the bitrate and masking requirements. Typical 16-bit audio sampling frequencies include 32, 44.1, and 48 kHz. The final bitrate of the bitstream may range from 32 kbps to 448 kbps (kilo-bits per second).
The audio data in the bitstream is presented in audio frames, where each frame represents audio signal information for a given time interval. For example, an AC-3 audio frame consists of six audio blocks, each audio block containing 256 samples of audio data per channel. Similarly, each MPEG audio frame can be considered to be made of 12 blocks (for MPEG-1) or 36 blocks (for MPEG-2), with each block comprising 32 samples per audio channel. To prevent audio signal discontinuities, each audio block includes audio information which overlaps into the time interval for the next audio block. The audio signals from each audio block are combined together at the overlap, with the contributions from each being scaled so that a smooth transition from one audio block to the next occurs. This technique is referred to as "windowing". FIG. 1 shows a block of windowing coefficients 10 and audio signals from four sequential audio blocks 12, 14, 16, 18. A sequence of windowed audio data 20 is shown divided into four time intervals 22, 24, 26, 28. In the first interval 22, the audio data 20 is generated from the audio signals from the first audio block by multiplying these signals with the appropriate windowing coefficients, i.e. $A_i = W_i S_i$ for $0 < i \leq N/2$. Thereafter, the audio data 20 is found by combining the audio signals from overlapping audio blocks, using the windowing coefficients, i.e. $A_{i+N/2} = W_i S_i \big|_{\text{current}} + W_{i+N/2} S_{i+N/2} \big|_{\text{previous}}$ for interval 24. The weighted averaging of the overlapped audio signals provides for smooth transitions from one audio frame to the next.
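The overlap-and-scale step described above can be sketched as follows. This is a minimal illustration and not a decoder implementation: it assumes each audio block and the window both hold N values, uses zero-based indices in place of the one-based indices in the text, and uses a hypothetical linear window whose two halves sum to one. Actual AC-3 and MPEG windows are defined by their respective standards.

```python
from typing import List, Optional, Sequence


def linear_window(n: int) -> List[float]:
    """Hypothetical window: rises over the first half, falls over the second."""
    half = n // 2
    rising = [(i + 0.5) / half for i in range(half)]
    return rising + [1.0 - w for w in rising]


def window_blocks(blocks: Sequence[Sequence[float]],
                  window: Sequence[float]) -> List[float]:
    """Produce N/2 output samples per block by overlapping adjacent blocks."""
    half = len(window) // 2
    out: List[float] = []
    previous: Optional[Sequence[float]] = None
    for current in blocks:
        for i in range(half):
            # First half of the current block, scaled by the rising window.
            sample = window[i] * current[i]
            if previous is not None:
                # Second half of the previous block, scaled by the falling window.
                sample += window[i + half] * previous[i + half]
            out.append(sample)
        previous = current
    return out


if __name__ == "__main__":
    w = linear_window(8)
    constant_blocks = [[1.0] * 8, [1.0] * 8, [1.0] * 8]
    print(window_blocks(constant_blocks, w))
```

With a constant input, the first interval ramps up because only one block contributes, and every later interval reproduces the input exactly because the two window contributions sum to one.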
The components of a typical audio frame are the header, CRC, the audio data and the auxiliary data. The header contains parameters such as sampling frequency and data rate that govern the rest of the frame. The CRC is an error detection code which may be optional and have its presence/absence specified in the header. The audio data consists of the actual compressed sound. The auxiliary data may be a user-defined field. The length of this field may be variable in order to obtain the overall frame length specified by the standard.
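The frame layout described above can be modeled with a small data structure. The sketch below is purely illustrative: the class names, field names, and byte counts are hypothetical, and the actual field layouts are defined by the AC-3 and MPEG standards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameHeader:
    sample_rate_hz: int   # sampling frequency governing the frame
    bit_rate_bps: int     # data rate governing the frame
    crc_present: bool     # whether the optional CRC follows the header


@dataclass
class AudioFrame:
    header: FrameHeader
    crc: Optional[int]    # error detection code, None when absent
    audio_data: bytes     # the compressed sound
    aux_data: bytes = b""  # user-defined, variable-length field


def aux_length(frame_bytes: int, header_bytes: int,
               crc_bytes: int, audio_bytes: int) -> int:
    """Bytes left for auxiliary data once the other fields are placed."""
    remaining = frame_bytes - header_bytes - crc_bytes - audio_bytes
    if remaining < 0:
        raise ValueError("fields exceed the overall frame length")
    return remaining


if __name__ == "__main__":
    # Hypothetical sizes chosen only to exercise the arithmetic.
    pad = aux_length(frame_bytes=100, header_bytes=4, crc_bytes=2, audio_bytes=90)
    frame = AudioFrame(FrameHeader(48_000, 384_000, True), crc=0x1234,
                       audio_data=bytes(90), aux_data=bytes(pad))
    print(len(frame.aux_data))
```

Treating the auxiliary field as whatever space remains reflects the statement that its length may vary so that the frame reaches the overall length specified by the standard.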
Within a single AC-3 or MPEG-2 compliant audio bitstream, up to five compressed audio channels and an uncompressed Low Frequency Effects (LFE) channel may be included. However, fewer channels are commonly employed. MPEG-1 bitstreams have only one or two audio channels, and for backwards compatibility, MPEG-2 bitstreams sometimes employ "downmixing" to fold the information from the five channels into two channels so that all the audio information is available to MPEG-1 decoders. In this approach, the left audio channel L may include mixed-in center (C) and left-surround (LS) channels, and the right audio channel R may include mixed-in center (C) and right-surround (RS) channels. The mixing coefficients and the C, LS, and RS channels are then included in the bitstream so that MPEG-2 decoders can reproduce the five channels individually.
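The five-to-two downmix described above can be sketched as a weighted sum per output channel. The function name and the gain values below are hypothetical; in practice the mixing coefficients are carried in the bitstream or specified by the applicable standard.

```python
from typing import List, Sequence, Tuple


def downmix_to_stereo(left: Sequence[float], right: Sequence[float],
                      center: Sequence[float],
                      left_surround: Sequence[float],
                      right_surround: Sequence[float],
                      center_gain: float,
                      surround_gain: float) -> Tuple[List[float], List[float]]:
    """Mix C and LS into the left output, and C and RS into the right output."""
    n = len(left)
    left_out = [left[i] + center_gain * center[i]
                + surround_gain * left_surround[i] for i in range(n)]
    right_out = [right[i] + center_gain * center[i]
                 + surround_gain * right_surround[i] for i in range(n)]
    return left_out, right_out


if __name__ == "__main__":
    # One sample per channel, with assumed gains of 0.5 and 0.25.
    lo, ro = downmix_to_stereo([1.0], [2.0], [4.0], [8.0], [16.0],
                               center_gain=0.5, surround_gain=0.25)
    print(lo, ro)
```

A decoder that also receives C, LS, RS, and the coefficients can subtract the scaled contributions from the two mixed channels to recover the original left and right channels.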
Audio reproduction systems do not necessarily have the same number of loudspeakers as the number of encoded source audio channels, and consequently audio downmixing is necessary to reproduce the complete effect of all audio channels over systems with different speaker configurations. Both Dolby Labs and the ISO/IEC MPEG Audio Standards Committee have published standards specifying sets of downmixing equations for audio decoding to ensure that acceptable quality audio output is reproduced on different speaker configurations.
It is, however, desirable to produce a single, minimal common set of downmixing equations which may be used to decode audio bitstreams encoded according to the Dolby AC-3 and MPEG standards, and which may be further used to reconstruct a fully programmable, user-specified number of output audio channels. It is also desirable to provide an audio decoder with reduced memory requirements and reduced computational requirements.