1. Field of the Invention
This invention relates to multi-channel digital audio decoders for digital storage media and transmission media.
2. Description of the Related Art
Efficient multi-channel digital audio signal coding methods have been developed for storage or transmission applications such as the digital video disc (DVD) player and the high definition digital TV receiver (set-top box). A description of one such method can be found in the ATSC Standard, “Digital Audio Compression (AC-3) Standard”, Document A/52, 20 Dec. 1995. The standard defines a coding method for up to six channels of multi-channel audio, that is, left, right, centre, surround left, surround right, and the low frequency effects (LFE) channel. Techniques of this type can be applied in general to code any number of channels of related or even unrelated audio data into single or multiple representations (bitstreams).
In the ATSC (AC-3) method, the input multi-channel digital audio source is compressed block by block at the encoder by first transforming each block of time domain audio samples into frequency coefficients using an analysis filter bank, then quantizing the resulting frequency coefficients into quantized coefficients with a determined bit allocation strategy, and finally formatting and packing the quantized coefficients and bit allocation information into a bitstream for storage or transmission.
Furthermore, depending upon the spectral and temporal characteristics of each channel in the audio source, the transformation of each audio channel block may be performed adaptively at the encoder to optimize the frequency/time resolution. This is achieved by adaptive switching between two transformations, one with a long transform block length and one with a shorter transform block length. The long transform block length, which has good frequency resolution, is used for improved coding performance, and the shorter transform block length, which has greater time resolution, is used for audio input signals which change rapidly in time.
At the decoder, each audio block is decompressed from the bitstreams by first determining the bit allocation information, then unpacking and de-quantizing the quantized coefficients, and inverse transforming the resulting frequency coefficients based on determined long or shorter transform length to output time domain audio PCM data. The decoding processes are performed for each channel in the multi-channel audio data.
For reasons such as an overall system cost constraint or a physical limitation such as the number of output loudspeakers that can be used, downmixing of the decoded multi-channel audio may be performed so that the number of output channels at the decoder is reduced. Basically, downmixing is performed such that the multi-channel audio information is fully or partially preserved while the number of output channels is reduced. For example, multi-channel coded audio bitstreams may be decoded and mixed down to two output channels, the left and right channels, suitable for conventional stereo audio amplifier and loudspeaker systems. One method of downmixing may be described as:

$$A_i = \sum_{j=0}^{m} \left( a_{ij} \times CH_j \right)$$
where
i: the selected output audio channel number
j: input audio channel number
m: the total number of input audio channels
$A_i$: i-th output audio channel
$CH_j$: j-th input audio channel
$a_{ij}$: downmixing coefficient for the i-th output and j-th input audio channel
The downmixing method or coefficients may be designed such that the original decoded multi-channel signals, or an approximation of them, may be derived from the mixed down channels.
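The downmixing sum above can be illustrated with a minimal sketch. The function name `downmix`, the three-channel layout and the 0.5 centre gain are assumptions made for illustration only; they are not coefficients taken from this description or from the AC-3 standard.

```python
from typing import List, Sequence


def downmix(channels: Sequence[Sequence[float]],
            coeffs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return output channels A_i = sum over j of a_ij * CH_j, sample by sample."""
    n_samples = len(channels[0])
    outputs = []
    for row in coeffs:                      # one row of a_ij per output channel i
        if len(row) != len(channels):
            raise ValueError("one coefficient per input channel is required")
        out = [0.0] * n_samples
        for a_ij, ch_j in zip(row, channels):
            for n, sample in enumerate(ch_j):
                out[n] += a_ij * sample
        outputs.append(out)
    return outputs


# Hypothetical 3-to-2 downmix: left, right, centre -> stereo.
# The 0.5 centre gain is an illustrative value, not taken from the text.
COEFFS = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.5]]
left, right, centre = [1.0, 2.0], [3.0, 4.0], [2.0, 2.0]
stereo = downmix([left, right, centre], COEFFS)
```

Each row of `COEFFS` holds the coefficients for one output channel, so reducing the number of rows reduces the number of output channels.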
The complexity or cost of decoding for such a prior-art multi-channel audio decoder is more or less proportional to the number of coded audio channels within the input bitstream. In particular, the inverse transform process, which is computationally the most intensive module of the audio decoder and incurs a much higher cost to implement compared to other processes within the audio decoder, is performed on every block of audio in every audio channel. For example, a six channel audio decoder would have about three times the complexity or cost of decoding compared to a stereo (two channel) audio decoder with the same decoding process for each audio channel.
It is an object of this invention to provide a method and apparatus for decoding a bitstream of transform coded multi-channel audio data which will overcome, or at least ameliorate, the foregoing disadvantages of the prior art.
One factor that affects the complexity or implementation cost of the mentioned inverse transform is the arithmetic precision used within the process. The precision adopted in this module has a direct relation to the cost (in terms of the amount of RAM/ROM required) and complexity in implementation. Also, the inverse transform is the most demanding stage in terms of introduction of round off noise. Generally, the higher the precision used within the inverse transform process, the higher the implementation cost and the output quality; and vice versa, the lower the precision used within the inverse transform process, the lower the implementation cost and the output quality.
Arithmetic precision considerations in the Inverse Transform involve the word size of the frequency coefficients and the twiddle factors used in each stage, as well as the intermediate data retained between stages. The frequency coefficients generated by the data decoding stage are retained to the degree of accuracy defined by the precision required.
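The link between word size and round-off noise can be sketched numerically. The example below is only an illustration under stated assumptions: it uses a naive inverse DCT as a stand-in for the decoder's actual inverse transform, rounds the coefficients, the twiddle factors and each product to a fixed-point grid, and does not model saturation. The names `quantize`, `inverse_transform` and `max_error`, and the test coefficients, are hypothetical.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round x to a signed fixed-point grid with `bits` total bits (1 sign bit)."""
    scale = 1 << (bits - 1)
    return round(x * scale) / scale


def inverse_transform(coeffs, bits=None):
    """Naive inverse DCT; if bits is given, inputs, twiddles and products are rounded."""
    n = len(coeffs)
    q = (lambda v: quantize(v, bits)) if bits else (lambda v: v)
    out = []
    for t in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            twiddle = q(math.cos(math.pi * (t + 0.5) * k / n))
            acc += q(q(c) * twiddle)
        out.append(acc)
    return out


def max_error(bits, coeffs):
    """Largest deviation of the reduced-precision output from the float reference."""
    ref = inverse_transform(coeffs)
    approx = inverse_transform(coeffs, bits)
    return max(abs(a - b) for a, b in zip(ref, approx))


# Illustrative coefficient block (not real decoder data).
coeffs = [0.3 * math.sin(0.7 * k + 0.2) / (k + 1) for k in range(32)]
err16 = max_error(16, coeffs)   # lower precision word size
err32 = max_error(32, coeffs)   # higher precision word size
```

In this sketch the 16-bit path accumulates a visibly larger error than the 32-bit path, which is the trade-off between implementation cost and output quality described above.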
On the other hand, the audio channels represented within the multi-channel audio bitstream may have different perceptual importance relative to the actual audio contents. For example, a surround effect channel may have relatively less perceptual importance compared to a main channel, or an audio block with the shorter transform block length, whose audio signals change rapidly in time, may have a lower frequency resolution requirement compared to an audio block with the long transform block length.
By matching different precision for the inverse transform process within the multi-channel audio decoder with the audio contents within the coded multi-channel audio bitstream, the overall complexity or implementation cost of the decoder can be optimized.
According to a first aspect, this invention provides a method for decoding a bitstream of transform coded multi-channel audio data comprising the steps of:
(a) subjecting said bitstream to a block decoding process to obtain for each input audio channel within said multi-channel audio data a corresponding block of frequency coefficients;
(b) assigning to each said block of frequency coefficients a higher precision inverse transform or a lower precision inverse transform according to predetermined characteristics of said audio data represented by the block;
(c) subjecting each said block of frequency coefficients to the assigned higher precision inverse transform process or lower precision inverse transform process;
(d) generating a respective output audio signal in response to each said higher precision inverse transform process and each said lower precision inverse transform process.
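Steps (b) to (d) can be sketched as follows, assuming step (a) has already produced de-quantized coefficient blocks. The `Block` type, the `MAIN_CHANNELS` set, the selection rule and the 32/16-bit word lengths are all hypothetical choices for illustration; the inverse transform is a naive inverse DCT standing in for the decoder's actual transform.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    channel: str            # e.g. "L", "R", "C", "SL", "SR", "LFE"
    short_block: bool       # True if coded with the shorter transform length
    coeffs: List[float]     # de-quantized frequency coefficients from step (a)


# Hypothetical rule: which channels count as perceptually important.
MAIN_CHANNELS = {"L", "R", "C"}


def assign_precision(block: Block) -> int:
    """Step (b): return a word length in bits for the inverse transform."""
    if block.channel in MAIN_CHANNELS and not block.short_block:
        return 32           # higher precision
    return 16               # lower precision


def inverse_transform(coeffs, bits):
    """Step (c): naive inverse DCT with each product rounded to `bits` bits."""
    scale = 1 << (bits - 1)
    n = len(coeffs)
    return [sum(round(c * math.cos(math.pi * (t + 0.5) * k / n) * scale) / scale
                for k, c in enumerate(coeffs))
            for t in range(n)]


def decode(blocks):
    """Steps (b)-(d): one time-domain output block per input block."""
    return {b.channel: inverse_transform(b.coeffs, assign_precision(b))
            for b in blocks}


blocks = [Block("L", False, [0.5, 0.25, 0.0, 0.0]),
          Block("SL", False, [0.5, 0.25, 0.0, 0.0]),
          Block("C", True, [0.1, 0.0, 0.0, 0.0])]
pcm = decode(blocks)
```

Here the surround block and the short centre block take the lower precision path, while the long left block takes the higher precision path.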
In a second aspect, this invention provides an apparatus for decoding a bitstream of transform coded multi-channel audio data comprising:
(a) block decoding means to produce for each input audio channel within the said multi-channel audio data a corresponding block of frequency coefficients;
(b) means for assigning to each said block of frequency coefficients a higher precision inverse transform or a lower precision inverse transform according to predetermined characteristics of said audio data represented by the block;
(c) means for subjecting each said block of frequency coefficients to said assigned higher precision inverse transform process or lower precision inverse transform process;
(d) means for generating a respective output audio signal in response to each said higher precision inverse transform process and lower precision inverse transform process.
Preferably, the blocks of frequency coefficients of all the input audio channels are downmixed in the frequency domain to a reduced number of intermediate blocks of frequency coefficients; and each intermediate block of frequency coefficients is assigned a higher precision inverse transform or a lower precision inverse transform according to predetermined characteristics of the audio data represented by the block.
Alternatively, the blocks of frequency coefficients of all input audio channels coded adaptively with long or shorter transform block length can be downmixed partially in the frequency domain to a reduced number of intermediate blocks of frequency coefficients; and each intermediate block is assigned a higher precision inverse transform or a lower precision inverse transform according to predetermined characteristics of the audio data represented by the block.
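Frequency-domain downmixing relies on the inverse transform being linear: mixing coefficient blocks first and transforming once gives the same result as transforming every channel and mixing afterwards, provided the blocks share the same transform length. The sketch below checks this with a naive inverse DCT as a stand-in transform; the block values and gains are illustrative assumptions.

```python
import math


def inverse_transform(coeffs):
    """Naive inverse DCT used as a stand-in for the decoder's transform."""
    n = len(coeffs)
    return [sum(c * math.cos(math.pi * (t + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for t in range(n)]


def mix(blocks, gains):
    """Weighted sum of equal-length blocks, element by element."""
    return [sum(g * blk[i] for g, blk in zip(gains, blocks))
            for i in range(len(blocks[0]))]


# Frequency blocks of three input channels (illustrative values).
freq_blocks = [[0.5, 0.1, 0.0, -0.2],
               [0.3, -0.4, 0.2, 0.0],
               [0.1, 0.0, 0.0, 0.05]]
gains = [1.0, 0.0, 0.5]     # hypothetical downmix coefficients for one output

# Route 1: three inverse transforms, then mix in the time domain.
time_then_mix = mix([inverse_transform(b) for b in freq_blocks], gains)
# Route 2: mix in the frequency domain, then a single inverse transform.
mix_then_time = inverse_transform(mix(freq_blocks, gains))

max_diff = max(abs(a - b) for a, b in zip(time_then_mix, mix_then_time))
```

Route 2 needs one inverse transform per output channel instead of one per input channel, which is where the saving comes from.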
The block decoding preferably involves:
(a) parsing said bitstream to obtain bit allocation information of each input audio channel;
(b) unpacking quantized frequency coefficients from said bitstream using said bit allocation information;
(c) de-quantizing said quantized frequency coefficients to obtain said block of frequency coefficients using said bit allocation information.
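The unpacking and de-quantizing steps can be sketched with a toy format. This is not the AC-3 bitstream syntax: the bit string, the per-coefficient bit widths and the uniform midpoint de-quantizer are assumptions chosen only to show how bit allocation information drives both steps. Step (a) is assumed to have already produced the `allocation` list.

```python
from typing import List


def unpack(bits: str, allocation: List[int]) -> List[int]:
    """Step (b): read one unsigned code per coefficient, allocation[k] bits wide."""
    codes, pos = [], 0
    for width in allocation:
        codes.append(int(bits[pos:pos + width], 2) if width else 0)
        pos += width
    return codes


def dequantize(codes: List[int], allocation: List[int]) -> List[float]:
    """Step (c): map each code to the midpoint of its uniform interval in [-1, 1)."""
    out = []
    for code, width in zip(codes, allocation):
        if width == 0:
            out.append(0.0)             # no bits allocated: coefficient is zero
        else:
            levels = 1 << width
            out.append((2 * code + 1) / levels - 1.0)
    return out


allocation = [4, 2, 0, 1]               # step (a) result, assumed already parsed
payload = "1100" "01" "" "1"            # 4 + 2 + 0 + 1 bits of toy payload
codes = unpack(payload, allocation)
coeffs = dequantize(codes, allocation)
```

The same allocation list is used twice: once to find where each code sits in the payload, and once to scale the code back to a coefficient value.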
Preferably, the higher precision inverse transform process applies a frequency-domain to time-domain transform to the respective block of frequency coefficients using higher precision arithmetic parameters and operations, and the lower precision inverse transform process applies a frequency-domain to time-domain transform to the respective block of frequency coefficients using lower precision arithmetic parameters and operations.
In an alternative, the higher precision inverse transform process applies subband synthesis filter bank to the respective block of frequency coefficients using higher precision arithmetic parameters and operations, and the lower precision inverse transform process applies subband synthesis filter bank to the respective block of frequency coefficients using lower precision arithmetic parameters and operations.
Preferably, the higher precision inverse transform uses a digital signal processor with double precision wordlength and the lower precision inverse transform uses the same digital signal processor with single precision wordlength. The digital signal processor is preferably a 16-bit processor.
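One reason double precision costs more on a 16-bit processor is that a wide multiply has to be assembled from several native multiplies. The sketch below shows the schoolbook decomposition for unsigned operands; sign handling, rounding and any instruction a particular digital signal processor may provide are deliberately left out, and the function names are hypothetical.

```python
def mul16(a: int, b: int) -> int:
    """One single-precision multiply: 16-bit x 16-bit -> 32-bit product."""
    assert 0 <= a < (1 << 16) and 0 <= b < (1 << 16)
    return a * b


def mul32_from_16(a: int, b: int):
    """Unsigned 32x32 -> 64-bit product built from four 16-bit multiplies."""
    a_hi, a_lo = a >> 16, a & 0xFFFF
    b_hi, b_lo = b >> 16, b & 0xFFFF
    partials = [(mul16(a_hi, b_hi), 32),    # high x high
                (mul16(a_hi, b_lo), 16),    # cross terms
                (mul16(a_lo, b_hi), 16),
                (mul16(a_lo, b_lo), 0)]     # low x low
    product = sum(p << shift for p, shift in partials)
    return product, len(partials)


a, b = 0x12345678, 0x9ABCDEF0
product, native_multiplies = mul32_from_16(a, b)
```

In this decomposition each double-precision multiply takes four native multiplies plus the shifts and additions, compared with one multiply in single precision.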
In an embodiment of the present invention, the de-quantized frequency coefficients of each coded audio channel within a block, obtained by deformatting the input multi-channel audio bitstream, are passed to selection means which determine whether the higher or the lower precision inverse transform is used to inverse transform them, such that the decoding complexity is reduced without introducing significant artefacts in the overall output audio quality.
Preferably, the de-quantized coefficients of all coded audio channels can be mixed down in the frequency domain such that the total number of inverse transforms is reduced to the number of output audio channels required. The de-quantized frequency coefficients of the audio channel blocks which were coded adaptively with long or shorter transform block length can preferably be mixed down partially in the frequency domain, according to whether the long or the shorter transform block length was used, so that the total number of inverse transforms, higher and lower precision, is reduced to an intermediate number, and the final output audio channels are generated by combining the results of the inverse transforms in the time domain.
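Partial downmixing can be sketched by grouping blocks by transform block length, mixing each group in the frequency domain, inverse transforming each group once, and summing the results in the time domain. The example below is a simplified illustration: a short block is modelled as two half-length naive inverse DCTs, and windowing and overlap-add are ignored. All names, block values and gains are assumptions.

```python
import math


def idct(coeffs):
    """Naive inverse DCT used as a stand-in for the decoder's transform."""
    n = len(coeffs)
    return [sum(c * math.cos(math.pi * (t + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for t in range(n)]


def inverse(coeffs, short):
    """Long block: one transform. Short block: two half-length transforms."""
    if not short:
        return idct(coeffs)
    half = len(coeffs) // 2
    return idct(coeffs[:half]) + idct(coeffs[half:])


def add(x, y):
    return [a + b for a, b in zip(x, y)]


def scale(x, g):
    return [g * a for a in x]


def partial_downmix(blocks, gains):
    """blocks: list of (coeffs, short_flag); returns (output, transform passes)."""
    n = len(blocks[0][0])
    mixed = {}                                  # short_flag -> mixed coefficients
    for (coeffs, short), g in zip(blocks, gains):
        mixed[short] = add(mixed.get(short, [0.0] * n), scale(coeffs, g))
    out = [0.0] * n
    for short, coeffs in mixed.items():         # at most two passes per output
        out = add(out, inverse(coeffs, short))  # combine in the time domain
    return out, len(mixed)


blocks = [([0.5, 0.1, 0.0, -0.2], False), ([0.3, -0.4, 0.2, 0.0], False),
          ([0.1, 0.0, 0.05, 0.0], True), ([0.0, 0.2, 0.0, 0.1], True)]
gains = [1.0, 0.5, 0.7, 0.7]                    # hypothetical coefficients
output, passes = partial_downmix(blocks, gains)

# Reference route: transform every channel, then mix in the time domain.
reference = [0.0] * 4
for (coeffs, short), g in zip(blocks, gains):
    reference = add(reference, scale(inverse(coeffs, short), g))
max_diff = max(abs(a - b) for a, b in zip(output, reference))
```

With four input blocks the reference route runs four transform passes, while the partial downmix runs two, one for the long group and one for the short group.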
The means for assigning higher or lower precision inverse transform processes is preferably implemented in such a way that the decoding complexity is maintained while the output audio quality is improved. Parameters which may be used include number of coded audio channels, audio content information, long or shorter transform block switching information, output channel information, complexity required, and/or output audio quality required.
It will be apparent that with the addition of a relatively simple selector for higher or lower precision inverse transform, the overall complexity or implementation cost of the multi-channel audio decoder is reduced or optimized. An intelligent selector may be designed for multi-channel audio applications in such a way that the perceptual importance of each audio channel is used to determine the precision of the inverse transform process, so that the overall subjective quality of the output audio channels is maintained. Simplification of the precision requirements for the inverse transform process for certain audio channels significantly benefits low cost multi-channel audio decoder implementations and applications.
Two embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings.