Recent years have witnessed an unprecedented increase in the use of psycho-acoustic models for the design of audio coders. This has led to high compression ratios while keeping audible degradation in the compressed signal to a minimum. A description of one such method, which is the focus of the present discussion, can be found in the ATSC Standard, “Digital Audio Compression (AC-3) Standard”, Document A/52, 20 Dec. 1995.
In the AC-3 encoder the input time domain signal is sectioned into frames, each frame comprising six audio blocks. Since AC-3 is a transform coder, the time domain signal in each block is converted to the frequency domain using a bank of filters. The frequency domain coefficients thus generated are next converted to floating point representation. In floating point syntax, each coefficient is represented as a mantissa and an exponent. The bulk of the compressed bitstream transmitted to the decoder comprises these exponents and mantissas.
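The exponent and mantissa split described above can be illustrated with a minimal sketch. It assumes coefficients normalized to a magnitude below 1 and an exponent limited to the range 0 to 24, so that the exponent counts the leading zeros of the coefficient; the function name `to_exp_mantissa` and the constant `MAX_EXP` are hypothetical and are not taken from the standard.

```python
import math

MAX_EXP = 24  # assumed upper limit of the exponent range


def to_exp_mantissa(coeff):
    """Split coeff (|coeff| < 1 assumed) so that coeff == mantissa * 2**-exponent."""
    if coeff == 0.0:
        return MAX_EXP, 0.0
    # math.frexp returns (m, e) with coeff == m * 2**e and 0.5 <= |m| < 1
    _, e = math.frexp(coeff)
    # Clamp to the allowed range; very small values keep a small mantissa
    exponent = max(0, min(-e, MAX_EXP))
    mantissa = coeff * 2.0 ** exponent
    return exponent, mantissa


exponent, mantissa = to_exp_mantissa(0.15625)
print(exponent, mantissa)
```

The exponent acts as a scale factor shared between encoder and decoder, while the mantissa carries the remaining precision of the coefficient.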
Each mantissa must be truncated to a fixed or variable number of bits. The number of bits to be used for coding each mantissa is obtained from a bit allocation algorithm, which may be based on the masking property of the human auditory system. A lower number of bits results in a higher compression ratio because less space is required to transmit the coefficients. However, this may cause high quantization error, leading to audible distortion. A good distribution of the available bits among the mantissas forms the core of advanced audio coders.
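The trade-off between bit count and quantization error can be seen in a short sketch. It uses a generic uniform rounding quantizer, which is an assumption made for illustration and is not the quantizer defined by the AC-3 Standard; the function name `quantize_mantissa` is hypothetical.

```python
import math


def quantize_mantissa(mantissa, bits):
    """Round a mantissa in [-1, 1) to a signed value that fits in `bits` bits."""
    scale = 1 << (bits - 1)                    # number of steps per unit
    code = math.floor(mantissa * scale + 0.5)  # round to the nearest step
    code = max(-scale, min(scale - 1, code))   # saturate to the signed range
    return code / scale


for bits in (3, 5, 8):
    q = quantize_mantissa(0.7, bits)
    print(bits, q, abs(q - 0.7))
```

Away from saturation, the error of this quantizer is at most half a step, that is 2 to the power of minus `bits`, so each additional bit allocated to a mantissa halves the worst-case error.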
Further compression can be obtained in AC-3 by use of a technique called coupling. Coupling takes advantage of the way the human ear determines directionality for very high frequency signals, in order to allow a reduction in the amount of data necessary to code audio signals. At high audio frequencies (above approximately 2 kHz), the human ear is physically unable to detect individual cycles of an audio waveform, and instead responds to the envelope of the waveform. Consequently, the coder combines the high frequency coefficients of the individual channels to form a common coupling channel. The original channels combined to form the coupling channel are referred to as coupled channels.
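A minimal sketch of forming a coupling channel follows. It assumes that the coupling channel is the plain average of the coupled channels' coefficients above a starting bin, and that one scale factor per channel (here called a coupling coordinate) is kept to restore each channel's energy; a single band is used for brevity. The averaging strategy, the function name `couple` and the parameter `start_bin` are assumptions for illustration, since an encoder is free to combine the channels in other ways.

```python
import math


def couple(channels, start_bin):
    """Combine the coefficients at and above start_bin into one coupling channel.

    channels: list of equal-length lists of frequency coefficients.
    Returns (coupling_channel, coords), one coordinate per coupled channel.
    """
    n = len(channels)
    high = [ch[start_bin:] for ch in channels]
    # Average the channels bin by bin to form the common coupling channel
    coupling = [sum(vals) / n for vals in zip(*high)]
    cpl_energy = sum(c * c for c in coupling)
    coords = []
    for band in high:
        energy = sum(c * c for c in band)
        # Ratio of channel energy to coupling channel energy, as an amplitude
        coords.append(math.sqrt(energy / cpl_energy) if cpl_energy > 0 else 0.0)
    return coupling, coords


left = [0.5, 0.4, 0.4, 0.2]
right = [0.3, 0.1, 0.0, 0.0]
coupling, coords = couple([left, right], start_bin=2)
print(coupling, coords)
```

In this sketch a decoder would approximate each coupled channel by multiplying the coupling channel by that channel's coordinate, so only one set of high frequency coefficients needs to be transmitted instead of one set per channel.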
The translation of the AC-3 Encoder Standard onto the firmware of a DSP-Core involves several phases. Firstly, the essential compression algorithm blocks for the AC-3 Encoder have to be designed. Secondly, after the individual blocks are completed, they are integrated into an encoding system which receives a PCM (pulse code modulated) stream, processes the signal by applying signal processing techniques such as transient detection, frequency transformation and psychoacoustic analysis (coupling and bit allocation), and produces a compressed stream in the format of the AC-3 Standard.
The coded stream should be capable of being decompressed by any standard AC-3 Decoder, and the PCM stream generated thereby should be comparable in audio quality to the original music stream. If the original stream and the decompressed stream are indistinguishable in audible quality (at a reasonable level of compression), the development moves to the third phase. If the quality is not transparent (that is, the decompressed stream can be distinguished from the original), algorithm development and improvement continue.
In the third phase the algorithms are implemented using the word-length specifications of the target DSP-Core. Most commercial DSP-Cores allow only fixed point arithmetic (since a floating point engine is costly in terms of area). Consequently, the algorithm is translated to a fixed point solution. The word-length used is usually dictated by the ALU (arithmetic-logic unit) capabilities and bus-width of the target core. For example, an AC-3 Encoder on Motorola's 56000 would use 24-bit precision since it is a 24-bit Core. Similarly, for implementation on Zoran's ZR38000, which has a 20-bit data path, 20-bit precision would be used [4].
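The effect of the word-length on precision can be sketched as follows. The example assumes a signed fractional format in which one bit holds the sign and the remaining bits hold the fraction; the helper names `to_fixed` and `from_fixed` are hypothetical, and the sketch does not model any particular DSP-Core.

```python
import math


def to_fixed(x, word_bits):
    """Convert x in [-1, 1) to a signed integer of word_bits bits, saturating."""
    scale = 1 << (word_bits - 1)
    code = math.floor(x * scale + 0.5)       # round to the nearest step
    return max(-scale, min(scale - 1, code))


def from_fixed(code, word_bits):
    """Convert a fixed point integer back to a fraction."""
    return code / (1 << (word_bits - 1))


for bits in (16, 20, 24):
    err = abs(from_fixed(to_fixed(0.3, bits), bits) - 0.3)
    print(bits, err)
```

Away from saturation, the rounding error is bounded by half of the least significant bit, so under these assumptions a 24-bit word resolves values 256 times more finely than a 16-bit word.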
If, for example, 20-bit precision is found to provide an unacceptable level of sound quality, the provision to use double precision always exists. In this case each piece of data is stored and processed as two segments, lower and upper words, each of 20-bit length. The precision of the implementation is doubled, but the computational cost increases as well (a double precision multiplication could require 6 or more cycles, while a single precision multiplication and addition (MAC) requires only a single cycle).
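The two-word storage and the extra work of a double precision multiplication can be sketched as below. The sketch assumes a 20-bit word, a signed upper word and an unsigned lower word, and relies on Python's unbounded integers to hold the intermediate results; the names `split`, `join` and `mul_double` are hypothetical, and the code is not a cycle-accurate model of any DSP-Core.

```python
W = 20                 # assumed word length in bits
MASK = (1 << W) - 1    # selects the lower word


def split(x):
    """Split a signed double-word integer into (upper, lower) words."""
    return x >> W, x & MASK


def join(hi, lo):
    """Rebuild the double-word integer from its two words."""
    return (hi << W) + lo


def mul_double(a, b):
    """Multiply two double-word integers using only word-sized operands."""
    ah, al = split(a)
    bh, bl = split(b)
    # Four partial products and their additions replace one single multiply
    high = (ah * bh) << (2 * W)
    middle = (ah * bl + al * bh) << W
    low = al * bl
    return high + middle + low


a, b = -123456789, 987654321
print(split(a), mul_double(a, b) == a * b)
```

The four partial products, together with the shifts and additions needed to combine them, indicate why a double precision multiplication takes several cycles on hardware whose multiplier accepts only single words.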
Twenty-four-bit AC-3 Encoders are known to provide sufficient quality. However, the quality of a 16-bit single precision AC-3 Encoder is viewed as very poor. Consequently, few if any attempts to use a 16-bit Core for an AC-3 Encoder have been made to date (at least, none has been published).
Coupling is one of the most difficult algorithms to implement on a fixed-point processor, and it becomes even more so when attempted on a 16-bit processor. It can be computationally demanding and, if not implemented carefully, can lower the accuracy of the represented signal, thereby affecting the final quality of the reproduced (decoded) signal.
A single precision 16-bit implementation of an AC-3 Encoder is generally considered unacceptable in quality, and such a product would be at a distinct disadvantage in the consumer market. A double precision implementation is too computationally costly. It has been estimated that such an implementation would require over 120 MIPS (million instructions per second). This exceeds what most commercial DSPs can provide (moreover, extra MIPS are always needed for system software and value-added features). One of the most difficult sections of AC-3 for a 16-bit processor is the coupling. So the question is: is it possible to implement high quality AC-3 Encoder coupling on a 16-bit DSP with reasonable computational requirements?