1. Field of the Invention
The present invention relates generally to the field of encoding and decoding audio information and particularly to the encoders and decoders employing the MPEG standard for audio information.
2. Description of the Prior Art
In modern communication systems there is an increasing demand for transfer and dissemination of greater quantities of information at faster speeds. In order to transfer greater quantities of information at ever increasing speeds without sacrificing accuracy, data compression is performed at the point of origination and data system. Compression and decompression result in a simpler format for the information to be transmitted thereby increasing the speed and efficiency of the transmission process.
Data compression is effected by employing a variety of encoding techniques presently available. Each of the encoding techniques results in a specific format for the compressed data. When the encoded information is transferred to the destination point, data decompression is performed by decoding the transmitted data in order to retrieve the original information. The process of encoding and decoding must be fast enough to allow for real-time presentation of data in such cases as in the transmission of audio and video information.
Digital audio is a basic component of any video or multimedia application. Due to the large bandwidth occupied by digital audio in any such application, compression of the audio data is an important part of the encoding process. Audio compression is generally performed by taking into consideration the characteristics of the audio signal and the human perception system as embodied in a psychoacoustic model. There are two main high-fidelity audio compression techniques: the Motion Picture Expert Group (MPEG) audio standard and the Dolby Digital audio compression algorithms developed by the Dolby Laboratories.
FIG. 1(a) shows a block diagram of an MPEG encoder for a single audio channel. In multichannel systems the same process is repeated for each channel. The audio input 12 consisting of pulse code modulated (PCM) samples, each having a precision of 16 to 24 bits, is shown to constitute the input to the encoder 10. The PCM samples are sampled at 32, 44.1 or 48 KHz frequency. The first stage of the encoder 10 is the analysis filterbank 14 which maps the input signal from the time domain into the frequency domain. The analysis filterbank 14 consists of 32 band-pass filters each of which is a 512-tap band-pass filter.
In addition, based on the frequency characteristics of the input signal and the desired bit rate of the compressed signal, the perceptual model 20 estimates the masking thresholds. Masking threshold is a sound pressure level below which the human ear is less sensitive so that any noise or distortion introduced by the encoder becomes almost imperceptible. For example, in the frequency domain a faint signal may be completely masked if it is in the vicinity of louder signals with similar frequency content. The masking thresholds are used in the quantization and coding step 16 as described hereinbelow.
The output of each subband filter is normalized by the scaling factors that will be transmitted as part of the compressed bitstream. Scaling factors correspond to the maximum absolute value of every twelve consecutive output values in each subband. The output of the analysis filterbank 14 is quantized in the quantization and coding step 16 in such a way that all quantization noise is below the masking thresholds thereby being almost imperceptible to the human ear. Finally, the quantized subband samples, the scaling factors and the bit-allocation information are multiplexed in the bitstream encoding step 18 and transmitted as the compressed stream output 22.
FIG. 1(b) shows a block diagram of an MPEG decoder 30 used in recovering the PCM audio samples from the encoded data. The encoded bitstream 24 is shown in FIG. 1(b) as input to the decoder 30. At the step frame unpacking 26 of decoding the encoded bitstream 24 is parsed and various pieces of coding information such as scaling factors and bit allocation information are demultiplexed. Subsequently, at the reconstruction step 28 the bit allocation information is decoded and the scaling factors are extracted. The bit allocation information is decoded and the scaling factors are used to requantize the coded samples. Finally, at the step inverse mapping 34 the mapped samples are transformed back into the PCM output 32 corresponding to the input signal of the encoder 10.
Some of the steps used in the encoding process are computationally intensive. For example, the analysis filterbank step 14 and the perceptual model step 20 in the encoder flowchart 10 require intensive computations commonly performed by a fixed-point digital signal processor (DSP). Performing intensive computations requires considerable amount of time severely limiting the performance of the encoder during real-time transmission of audio signals.
One of the quantities to be computed in the perceptual model step 20 is the masking threshold as discussed hereinabove. According to the MPEG audio coding standard ISO/IEC 11172-3, xe2x80x9ccoding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/sxe2x80x94part 3: Audio,xe2x80x9d ISO/IEC JTC 1/SC29, May 20, 1993, hereinafter referred to as the MPEG Standard, calculating masking threshold entails evaluating such trigonometric function as sine, cosine and inverse tangent which represents a computationally intensive task for a DSP. Evaluating such trigonometric function is needed in computing the unpredictability measure, which is in turn used in determining the masking threshold as described in detail in the MPEG Standard.
Another difficulty currently encountered in the perceptual model step 20 lies in the huge dynamic range of the input data. The MPEG Standard calls for a coverage of about 101 dB (xe2x88x925 dB to 96 dB) in dynamic range. Every bit covers 3 dB so that the MPEG Standard requires 34 or more bits of digital representation. However, most fixed-point DSP chips for audio are 16 or 24 bits in data width. Although floating-point DSP chips can accommodate higher data widths, fixed-point DSP chips are by far more prevalent due to their smaller size and lower cost. According, the input data has to be scaled in order to fall within the dynamic range of the DSP architecture.
Scaling factors are used to scale down the large input signals in order to avoid clipping. i.e., cutting off an input signal whose sound energy level extends beyond the dynamic range of the DSP. Once the input data has been scaled down, a particular table in the MPEG Standard is used to determine the absolute threshold value used in computing the masking threshold. However, as the input data is consistently scaled down, too few bits may be assigned to represent the weak signal resulting in the problem of underflow, i.e., losing some of the information carried in the weaker signals.
Moreover, there are limitations currently associated with the decoder 30 in FIG. 1(b). One such limitation is in the reconstruction step 28 of the decoding process wherein the coded samples have to be requantized so that a specific number of bits are allocated to each coded sample. Requantization is performed by determining the requantization step from a set of four 16 by 32 tables provided in the MPEG Standard. The four different tables correspond to four different bit rates and sampling frequencies. To each entry in the tables corresponds a set of four number. One of the numbers indicates the number of bits per sample and the rest of the numbers are used in the subsequent inverse mapping step 34. Thus the total number of entries stored in the memory of the decoder corresponds to four 16 by 32 by 4 tables. Thus, considerable memory space has to be devoted to the reconstruction step of the decoding process rendering the decoder less efficient and more expensive.
In light of the above, it is desirable to improve upon the MPEG encoder/decoder by making the various steps in the encoding and decoding process more efficient without sacrificing audio quality. The present invention improves upon various steps in the compression/decompression process by providing more efficient approaches while preserving the audio quality.
Briefly, a communication system includes an encoder circuit responsive to an audio signal for performing compression on the audio signal and adaptive to generate an audio output signal based upon the compressed audio signal, the encoder circuit for sampling the audio signal to generated sampled signals, each sampled signals having a real and an imaginary component associated therewith, each sampled signal having an energy and a phase defined within a current block and each sampled signal being transformed to have a real and an imaginary component, a previous block preceding the current block and a block preceding the previous block, the encoder circuit for calculating the phase of the samples of the current block using the real and the imaginary components of the samples of the previous block and the block preceding the previous block, wherein calculations for determining the unpredictability measure is reduced by avoiding trigonometric calculations of the sampled signals of the current block thereby improving system performance.