The present invention relates to the mixing and encoding of audio signals and, in particular, to the mixing and encoding of AC-3 signals.
Recently, the capture, transmission and processing of digital audio has become increasingly popular. Often, in order to save bandwidth and storage space, the signals are transmitted in a compressed form. Two extremely popular forms of audio compression are the Dolby AC-3™ transmission format and the MPEG2-level 3 transmission format.
Extensive discussion of the technical aspects of the Dolby transmission format can be found at the Dolby website. In particular, reference is made to:
Steve Vernon, "Design and implementation of AC-3 coders," IEEE Trans. Consumer Electronics, Vol. 41, No. 3, August 1995.
Mark F. Davis, "The AC-3 Multichannel Coder," Presented at the 95th Convention of the Audio Engineering Society, Oct. 7-10, 1993.
Craig C. Todd, Grant A. Davidson, Mark F. Davis, "AC-3: Flexible Perceptual Coding for Audio Transmission," Presented at the 96th Convention of the Audio Engineering Society, Feb. 26-Mar. 01, 1994.
Turning to FIG. 1, there is illustrated the standard AC-3 encoding process taken from one of the aforementioned references. In the AC-3 process 1, input samples are first transformed into the frequency domain 2 utilizing a modified discrete cosine transform with a fifty percent overlap. The output is then forwarded to a floating point conversion process which divides the transform coefficients into exponent and mantissa pairs. The mantissas are then quantised 5 with a variable number of bits based on a parametric bit allocation model 6. The exponents and mantissas are packed into a bit stream 7 before being output 8 in an AC-3 format. In a decoding process, the steps are performed in reverse so as to produce output samples.
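By way of illustration only, the transform and floating point conversion stages described above can be sketched as follows. This is a minimal sketch and not the AC-3 reference implementation: the function names `frames_with_overlap`, `mdct` and `split_exponent_mantissa` are hypothetical, the sine window is an assumption chosen for brevity rather than the window specified by AC-3, and the exponent limit of 24 is likewise an assumed parameter.

```python
import math

def frames_with_overlap(samples, n):
    """Cut samples into 2N-sample blocks that advance by N (fifty percent overlap)."""
    return [samples[s:s + 2 * n] for s in range(0, len(samples) - 2 * n + 1, n)]

def mdct(block):
    """Direct O(N^2) MDCT: a 2N-sample block yields N transform coefficients."""
    two_n = len(block)
    n = two_n // 2
    # Sine window (assumption of this sketch, not the AC-3 window).
    windowed = [block[i] * math.sin(math.pi * (i + 0.5) / two_n)
                for i in range(two_n)]
    coeffs = []
    for k in range(n):
        acc = 0.0
        for i in range(two_n):
            acc += windowed[i] * math.cos(
                math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
        coeffs.append(acc)
    return coeffs

def split_exponent_mantissa(coeff, max_exp=24):
    """Split a coefficient (assumed |coeff| < 1) into an exponent and a mantissa.

    The coefficient is recovered as mantissa * 2**(-exponent).
    """
    if coeff == 0.0:
        return max_exp, 0.0
    _, e = math.frexp(coeff)            # coeff = m * 2**e with 0.5 <= |m| < 1
    exponent = min(max(-e, 0), max_exp)  # number of leading binary zeros, clamped
    mantissa = coeff * 2.0 ** exponent
    return exponent, mantissa
```

For example, `split_exponent_mantissa(0.3)` gives an exponent of 1 and a mantissa of 0.6, since 0.6 × 2⁻¹ = 0.3. Only the mantissa is subsequently quantised with a variable number of bits.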
When it is desired to mix multiple signals together so as to create new output audio signals, the lengthy process of decoding must be undertaken each time with the signals transformed into the time domain and then transformed back into the frequency domain.
It would be desirable to provide a system having lower computational requirements when mixing signals whilst maintaining significant advantages in efficiency of utilisation.
In accordance with the first aspect of the present invention, there is provided a method of creating an audio output signal from a series of input audio signals, comprising: (a) for each of said series of input audio signals, precomputing corresponding transform domain input audio signals and associated psychoacoustic masking curves for said input audio signals; (b) mixing together said transform domain input signals in the transform domain to produce an output transform domain signal; (c) mixing together said masking curves in the transform domain to produce an output transform domain masking curve; (d) quantizing said output transform domain signal with said output transform domain masking curve; and (e) outputting said quantized output transform domain signal.
In element (b), said mixing together of said transform domain input signals can include fading one or more of said transform domain input signals, wherein said fading includes suppressing noise associated with said fading process. The suppressing preferably can include a first order compensation for said noise.
The system can also include transforming in real-time a real-time audio stream and mixing said real-time audio stream with said transform domain input signals in element (b).
The quantized output transform domain signal can be in the format of AC3 encoded data or MPEG audio encoded data.
The audio output signal is created as a series of blocks of data output one at a time and the method preferably can include adaptively determining compression parameters for the output blocks.
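Elements (b) to (d) of the first aspect can be sketched as follows for a single block, under stated assumptions. The function names are hypothetical, the inputs stand for the precomputed transform coefficients and masking curves of element (a), the power-additive combination of masking curves weighted by squared gains is an assumption of this sketch rather than a rule given in the text, and the uniform quantizer is a stand-in for the format-specific mantissa quantization.

```python
import math

def mix_coefficients(signals, gains):
    """Element (b): weighted sum of transform coefficients, bin by bin.

    Because the transform is linear, this equals the transform of the
    time domain mix, so no decode and re-encode is needed.
    """
    n = len(signals[0])
    return [sum(g * s[k] for s, g in zip(signals, gains)) for k in range(n)]

def mix_masking_curves(curves, gains):
    """Element (c): combine per-bin masking thresholds given in power units.

    Power addition with squared gains is an assumption of this sketch.
    """
    n = len(curves[0])
    return [sum(g * g * c[k] for c, g in zip(curves, gains)) for k in range(n)]

def quantize(coeffs, mask, floor=1e-12):
    """Element (d): uniform quantizer whose step follows the masking curve.

    A uniform quantizer of step d has noise power d*d/12, so choosing
    d = sqrt(12 * mask) places the quantization noise at the threshold.
    Returns (integer level, step) pairs.
    """
    out = []
    for c, m in zip(coeffs, mask):
        step = math.sqrt(12.0 * max(m, floor))
        out.append((round(c / step), step))
    return out

# Two precomputed inputs (illustrative values only).
music = [0.50, -0.25, 0.10, 0.02]
voice = [0.10, 0.40, -0.05, 0.01]
music_mask = [1e-4, 1e-4, 4e-4, 4e-4]
voice_mask = [2e-4, 1e-4, 1e-4, 1e-4]
gains = [0.5, 1.0]   # the music is faded to half amplitude

mixed = mix_coefficients([music, voice], gains)
mask = mix_masking_curves([music_mask, voice_mask], gains)
quantized = quantize(mixed, mask)
```

A gain below one on an input, as for `music` here, corresponds to the fading of element (b). The noise suppression associated with fading is not modelled in this sketch.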
In accordance with a further aspect of the present invention, there is provided a method of creating a compressed audio output signal from a series of input audio signals comprising, for each of said input audio signals: a) precomputing a transform corresponding to the desired compression format of said output audio signal; b) precomputing ancillary information relating to the compression of the transformed input audio; c) mixing together said transformed input signals in the transform domain to produce an output transform domain signal; d) algorithmically combining together said precomputed ancillary information to determine a suitable decompression strategy; and e) outputting compressed audio data comprising said output transform domain signal and said combined ancillary information.
The ancillary information comprises at least one of the following: signal banded power spectrum, exponent groupings or psychoacoustic masking curves. Element (d) preferably can include determining desirable quantization levels of said output transform domain signal.
In accordance with a further aspect of the present invention, there is provided a method of creating a compressed audio output signal from a series of input audio signals comprising, for each of said input audio signals: a) mixing together a series of transformed input signals in the transform domain to produce an output transform domain signal; b) algorithmically combining together precomputed ancillary information to determine a suitable decompression strategy; and c) outputting compressed audio data comprising said output transform domain signal and said combined ancillary information.
In accordance with a further aspect of the present invention, there is provided a single pass AC3 encoder having adaptive processing capabilities. The single pass encoder can efficiently produce an AC3 output in real time. This is to be contrasted with an iterative encoder. The single pass encoder calls upon information from different sources. For example:
Efficiency of compression of previous blocks.
Precomputed bit allocation information and suggested exponent strategies.
Precomputed audio signal statistics to determine when to change strategies.
Simple real time audio signal statistics to identify change points.
Precomputed information from future audio blocks to estimate future bit allocation demand.
Using this information, an algorithm is provided which can estimate the strategies and masking curve parameters for the audio data that come close to using all the available bandwidth without exceeding it. Preferably the system provides for balancing the bit allocation load across the 6 audio blocks in a frame and provides different ways to immediately sacrifice some bit allocations to ensure the available bandwidth is not exceeded.
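One possible way of balancing the bit allocation load across the audio blocks of a frame in a single pass is sketched below. This is an illustrative assumption and not the algorithm of the invention: the function name `single_pass_allocate`, the proportional sharing rule and the numeric values are all hypothetical. The estimated demands stand for the precomputed information from current and future audio blocks.

```python
def single_pass_allocate(demands, frame_bits):
    """Grant bits to each block in order, never exceeding frame_bits in total.

    demands    -- estimated bit demand per audio block (integers)
    frame_bits -- bits available for the whole frame
    """
    remaining = frame_bits
    grants = []
    for i, demand in enumerate(demands):
        # Demand still to be served, including the current block.
        future = sum(demands[i:])
        if future <= remaining:
            grant = demand                        # enough bits: no sacrifice needed
        else:
            grant = demand * remaining // future  # sacrifice bits proportionally
        grants.append(grant)
        remaining -= grant
    return grants

# Six audio blocks whose estimated demand exceeds the frame budget.
demands = [900, 1200, 800, 1500, 700, 900]
grants = single_pass_allocate(demands, 5000)
```

Each block is decided once and never revisited, in contrast to an iterative encoder. Because the share is recomputed from the bits actually remaining, a block that is granted less than its proportional share leaves more bits for later blocks, and when demand exceeds the budget the final block receives all of the bits that are left.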