1. Field of the Invention
This invention relates generally to audio decoders. More particularly, the present invention relates to multi-channel audio compression decoders with downmixing capabilities.
2. Description of the Related Art
An audio decoder generally comprises two basic parts: a demultiplexing portion, whose main function is to unpack a serial bit stream of encoded data (in this case, frequency-domain data); and a time-domain signal processing portion, which converts the demultiplexed signal back to the time-domain. A multi-channel output section may be provided to support multiple output formats. If fewer channels are required at the decoder output than are encoded in the bit stream, downmixing is required. Present decoders usually perform downmixing in the time-domain. However, since the inverse frequency-domain transform is a linear operation, it is also possible to downmix in the frequency-domain prior to the inverse transformation.
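The linearity argument can be checked numerically. The sketch below is a minimal illustration, assuming a plain inverse DFT as a stand-in for the codec's actual inverse transform and a hypothetical P×M gain matrix (here M = 3 input channels, P = 2 outputs); it downmixes either after or before the inverse transform and compares the results. The names `inverse_dft` and `downmix` and all numeric values are illustrative only.

```python
import cmath

def inverse_dft(coeffs):
    """Naive O(N^2) inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(2j*pi*k*n/N)."""
    size = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * n / size) for k, c in enumerate(coeffs)) / size
        for n in range(size)
    ]

def downmix(channels, matrix):
    """Apply a P x M downmix matrix to M equal-length channels, giving P channels."""
    if any(len(row) != len(channels) for row in matrix):
        raise ValueError("each matrix row needs one gain per input channel")
    length = len(channels[0])
    return [
        [sum(gain * ch[i] for gain, ch in zip(row, channels)) for i in range(length)]
        for row in matrix
    ]

# Hypothetical frequency-domain coefficients for M = 3 channels (left, right, centre)
channels = [
    [1.0, 0.5, -0.25, 0.0],
    [0.0, -1.0, 0.5, 0.75],
    [0.3, 0.3, 0.3, 0.3],
]
# Illustrative 2 x 3 matrix (P = 2); the gains are assumptions, not taken from any standard
matrix = [
    [1.0, 0.0, 0.707],
    [0.0, 1.0, 0.707],
]

# Time-domain downmix: M = 3 inverse transforms, then mix
time_first = downmix([inverse_dft(ch) for ch in channels], matrix)
# Frequency-domain downmix: mix first, then only P = 2 inverse transforms
freq_first = [inverse_dft(ch) for ch in downmix(channels, matrix)]

max_error = max(
    abs(a - b) for out_t, out_f in zip(time_first, freq_first) for a, b in zip(out_t, out_f)
)
```

Both paths agree to within floating-point rounding, but the frequency-domain path needs only P inverse transforms instead of M.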
The encoded data representing the audio signals may convey from one to multiple full-bandwidth channels, along with a low-frequency channel. The encoded data is organized into synchronization frames. How the demultiplexing and time-domain signal processing portions interact depends on the information available in a synchronization frame. Each frame contains several coded audio blocks, each of which represents a series of audio samples. In addition, each frame contains a synchronization information header to facilitate synchronization of the decoder, bit stream information that informs the decoder of the transmission mode and options, and an auxiliary data field which may include user data or dummy data. For example, for an AC-3 audio decoder from Dolby Laboratories of San Francisco, Calif., the encoder adjusts the auxiliary data field so that the cyclic redundancy check word falls on the last word of the frame. One cyclic redundancy check word is checked after more than half of the frame has been received, and another is checked after the complete frame has been received, as described in Advanced Television Systems Committee, Digital Audio Compression Standard (AC-3), 20 Dec. 1995. Another example is the MPEG-1 standard audio decoder, where the cyclic redundancy check word is optional for normal operation. However, if the MPEG-2 extension is required, the cyclic redundancy check word is compulsory.
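As an illustration of the frame check, the sketch below computes a generic 16-bit CRC, assuming the generator polynomial x^16 + x^15 + x^2 + 1 (0x8005), processed most-significant bit first with a zero initial value and no final XOR. Because the check word is placed at the end of the covered data, recomputing the CRC over the data plus the check word yields zero for an error-free frame. The helper names and the single covered region are hypothetical; which portion of the frame each check word actually covers is defined by the respective AC-3 and MPEG standards.

```python
def crc16(data, poly=0x8005, init=0x0000):
    """Bitwise MSB-first CRC-16 with no final XOR (polynomial is an assumption)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def append_check_word(payload):
    """Encoder side (hypothetical): place the CRC as the last big-endian word of the frame."""
    return payload + crc16(payload).to_bytes(2, "big")

def frame_is_valid(frame):
    """Decoder side: with the check word at the end, an intact frame leaves a zero remainder."""
    return crc16(frame) == 0

frame = append_check_word(b"hypothetical frame payload")
corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]  # flip all bits of the first byte
```

A single corrupted byte is a burst error shorter than 16 bits, which a CRC-16 of this form always detects.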
An audio block also contains information on whether the block was split into two or more sub-blocks during the transformation from the time-domain to the frequency-domain. A long block length allows the use of a long transform length, which is more suitable for input signals whose spectrum remains stationary or quasi-stationary. This provides greater frequency resolution, improved coding performance and a reduction in the computing power required. Two or more short-length transforms, used for short block lengths, enable greater time resolution and are more desirable for signals whose spectrum changes rapidly with time. The computing power required for two or more short transforms is ordinarily higher than if only one transform is required. This adaptive approach is very similar to behavior known to occur in human hearing.
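The choice between one long transform and several short ones is made on the encoding side. The sketch below shows one simple way such a decision could be made, assuming a hypothetical energy-based transient detector: the block is divided into segments, and a sharp rise in energy from one segment to the next requests short transforms. The function name, segment count and threshold are illustrative assumptions, not the detector specified for AC-3 or any other codec.

```python
import math

def choose_block_switch(samples, segments=4, threshold=4.0):
    """Return True (use short transforms) if segment energy jumps sharply, else False."""
    seg_len = len(samples) // segments
    energies = [
        sum(x * x for x in samples[i * seg_len:(i + 1) * seg_len])
        for i in range(segments)
    ]
    # Compare each segment with the one before it; a large rise suggests a transient.
    # The small floor avoids dividing by zero after silence.
    for previous, current in zip(energies, energies[1:]):
        if current > threshold * max(previous, 1e-12):
            return True
    return False

# Usage: a steady tone keeps the long transform, a sudden onset requests short ones
steady = [math.sin(2 * math.pi * t / 32) for t in range(512)]
onset = [0.0] * 384 + [1.0] * 128
```

Because of the small energy floor, any onset after complete silence triggers short blocks; a practical detector would typically also high-pass filter the input and compare against a noise-dependent floor.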
Again as an example, in the Dolby AC-3 audio decoder mentioned above, each block also contains dither, dynamic range, coupling, channel exponent, bit allocation, gain, channel mantissa and other parameters. However, these are represented in a compressed format, so unpacking, table set-up, decoding, expansion and computation must all be performed before the pulse code modulation (PCM) audio samples can be recovered.
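One example of the expansion step is the decoding of differentially coded exponents. The sketch below is loosely modelled on the grouped exponent coding used in AC-3, assuming that each 7-bit group value g packs three deltas as g = 25·m1 + 5·m2 + m3, where each m in 0..4 stands for a delta of m − 2, applied cumulatively to an initial absolute exponent. The function name is hypothetical, and the exact packing and any exponent range limits should be checked against the standard; no range clamping is applied here.

```python
def ungroup_exponents(absolute_exp, groups):
    """Expand grouped, differentially coded exponents into absolute exponents.

    Assumed packing: each group value g in 0..124 holds three deltas as
    g = 25*m1 + 5*m2 + m3, with each m in 0..4 meaning a delta of m - 2.
    """
    exponents = [absolute_exp]
    for g in groups:
        if not 0 <= g <= 124:
            raise ValueError(f"group value {g} out of range 0..124")
        for m in (g // 25, (g % 25) // 5, g % 5):
            exponents.append(exponents[-1] + (m - 2))
    return exponents

# Usage: the group value 62 encodes three zero deltas, 124 encodes three +2 deltas
flat = ungroup_exponents(10, [62])        # [10, 10, 10, 10]
rising = ungroup_exponents(5, [62, 124])  # [5, 5, 5, 5, 7, 9, 11]
```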
The input bit stream for a decoder will typically come from a transmission system (such as HDTV or CTV) or a storage system (e.g. CD, DAT, DVD). Such data can be transmitted continuously or in bursts. The demultiplexing and bit decoding portion of the decoder synchronises to the frame and stores more than half of the frame data before processing starts. The synchronisation word and bit stream information are unpacked only once per frame. The audio blocks are unpacked one by one, and at this stage the blocks containing the new audio samples may differ in length (i.e. the number of bits in each block may differ). However, once the audio blocks are decoded, each audio block will have the same length. The first audio block contains not only new PCM audio samples but also extra information concerning the complete frame. The remaining audio blocks may contain fewer bits. The bit decoding section performs an unpacking and decoding function, the final product of which is the frequency transform coefficients of each channel involved, in either a floating-point format (exponents and mantissas) or a fixed-point format.
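The two output formats can be related with a short sketch. It assumes that a floating-point coefficient is reconstructed as mantissa × 2^(−exponent), so that a larger exponent means a smaller value, and that the fixed-point format is 16-bit Q15. Both conventions, the function names and the sample values are assumptions made for illustration.

```python
import math

def coefficient_from_float(mantissa, exponent):
    """Scale a mantissa by 2**(-exponent) to recover a transform coefficient."""
    return math.ldexp(mantissa, -exponent)

def to_q15(value):
    """Convert a value nominally in [-1, 1) to 16-bit Q15, saturating at the limits."""
    return max(-32768, min(32767, round(value * 32768)))

# Hypothetical decoded (mantissa, exponent) pairs for one channel
pairs = [(0.5, 3), (-0.75, 1), (0.25, 0)]
coeffs = [coefficient_from_float(m, e) for m, e in pairs]  # [0.0625, -0.375, 0.25]
fixed = [to_q15(c) for c in coeffs]                        # [2048, -12288, 8192]
```

Saturation matters at the top of the range: +1.0 cannot be represented in Q15, so it is clipped to 32767.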
The time-domain signal processing (TDSP) section first receives the transform coefficients one block at a time. In normal operation, when the signal spectra are relatively stationary and have been frequency-domain transformed using a long transform length, a block-switch flag is disabled, and the TDSP uses a 2N-point inverse fast Fourier transform (IFFT) of corresponding long length to obtain N time-domain samples. When fast-changing signals are considered, the block-switch flag is enabled and the signals are frequency-domain transformed differently, although the same number of coefficients, N, is transmitted. The TDSP then uses a short-length inverse transform.
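The effect of the block-switch flag on the inverse transform can be sketched as follows. The sketch assumes an inverse modified discrete cosine transform (IMDCT), evaluated directly in O(N²), as the inverse transform: it maps N coefficients to 2N samples, which after windowing and overlap-add (omitted here) yield N new time-domain samples. A practical decoder would typically compute it with an FFT-based fast algorithm. The split of the N coefficients into two contiguous halves for the short transforms is an assumption; real codecs may interleave them instead. All names are hypothetical.

```python
import math

def imdct(coeffs):
    """Direct O(N^2) inverse MDCT: N coefficients -> 2N samples (no window, no overlap-add)."""
    n = len(coeffs)
    return [
        sum(
            x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, x in enumerate(coeffs)
        )
        for t in range(2 * n)
    ]

def inverse_transform_block(coeffs, block_switch):
    """Choose one long or two short inverse transforms for one block of N coefficients."""
    if not block_switch:
        return [imdct(coeffs)]  # one long transform: N -> 2N samples
    if len(coeffs) % 2:
        raise ValueError("short transforms need an even number of coefficients")
    half = len(coeffs) // 2
    # Assumption: contiguous halves; a real codec may interleave the coefficients
    return [imdct(coeffs[:half]), imdct(coeffs[half:])]  # two short transforms: N/2 -> N each

# Usage with N = 16 hypothetical coefficients
block = [1.0] + [0.0] * 15
long_out = inverse_transform_block(block, block_switch=False)
short_out = inverse_transform_block(block, block_switch=True)
```

In both cases the same N coefficients are consumed; only the time/frequency resolution of the reconstruction changes.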
Where the audio decoder receives M input channels (M an integer) and produces P output channels, where M>P and P>0, the audio decoder must perform M inverse frequency-domain transformations. Since only P output channels are required, a downmixing process is then performed, reducing the number of channels from M to P: