In order to transmit audio signals such as speech or music via digital transmission systems, the signals must first be digitised. That is to say, the audio signal must be represented in digital form. A simple form of digital representation is Pulse Code Modulation (PCM). In PCM the amplitude of an audio signal is sampled at discrete time intervals, and each amplitude sample is represented as a digital word. However, since a digital word can only represent discrete levels, for example 32 levels for a 5-bit digital word, each amplitude sample is quantised to one of these 32 levels. As a result, the digital sample values differ from the sampled signal. The difference is known as the quantisation error since it arises out of the quantisation process.
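By way of illustration only, the quantisation step described above may be sketched as follows. The function name `quantise`, the full-scale range of ±1.0, the mid-interval reconstruction, and the 1 kHz tone sampled at 8 kHz are all assumptions made for this sketch and are not taken from any particular PCM system.

```python
import math

def quantise(sample, bits=5, full_scale=1.0):
    """Map a sample in [-full_scale, full_scale] to one of 2**bits levels.

    Returns (code, reconstructed_value). Hypothetical helper for illustration.
    """
    levels = 2 ** bits                  # 32 levels for a 5-bit word
    step = 2.0 * full_scale / levels    # width of one quantisation interval
    # Index of the interval containing the sample, clamped to the valid range.
    code = int(math.floor((sample + full_scale) / step))
    code = max(0, min(levels - 1, code))
    # Reconstruct at the centre of that interval.
    value = -full_scale + (code + 0.5) * step
    return code, value

# Illustrative input: a 1 kHz sine sampled at 8 kHz (assumed values).
fs, f0 = 8000, 1000
samples = [0.9 * math.sin(2 * math.pi * f0 * n / fs) for n in range(16)]
coded = [quantise(s) for s in samples]

# Quantisation error: sampled value minus the value the digital word represents.
errors = [s - value for s, (_, value) in zip(samples, coded)]
max_error = max(abs(e) for e in errors)
step = 2.0 / 32
print(f"largest quantisation error: {max_error:.5f} (half a step is {step / 2:.5f})")
```

For inputs that stay within the full-scale range, the magnitude of the quantisation error in this sketch does not exceed half of one quantisation step.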
The minimum rate at which a signal needs to be sampled in order to be correctly represented is twice the frequency of the highest frequency component in the signal. This is known as the Nyquist rate. For human audio applications, where the highest frequency of interest is typically 20-24 kHz, the Nyquist rate is typically 40-48 kHz.
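The consequence of sampling below the Nyquist rate can be illustrated with a short sketch. The 48 kHz sampling rate and the two tone frequencies below are assumed purely for illustration. A tone above half the sampling rate produces exactly the same samples as a lower-frequency tone, so the two cannot be distinguished after sampling.

```python
import math

fs = 48_000           # sampling rate in Hz (assumed for illustration)
f_low = 18_000        # below fs / 2, so it is represented correctly
f_high = fs - f_low   # 30 kHz, above fs / 2

def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` samples of a unit cosine of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

low = sample_cosine(f_low, fs, 32)
high = sample_cosine(f_high, fs, 32)

# The two sample sequences agree to within floating-point rounding.
largest_difference = max(abs(a - b) for a, b in zip(low, high))
print(f"largest difference between the two sample sets: {largest_difference:.2e}")
```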
To achieve acceptable quantisation noise levels for typical human audio, a data rate of 700 kbps is conventionally used. Such a data rate requires wide band transmission channels, which are expensive or hard to obtain. This is a particular problem in radio or wireless communication channels, where the bandwidth of a communication channel is a trade-off between data rate requirements, available spectrum and compatibility with Integrated Services Digital Network (ISDN) or other land line communication systems. Typically, the available data rate is 64 kbps. Additionally, wire or cable links comprising both audio and video channels may have limited available bandwidth, in order to accommodate all the channels.
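The text does not state how the figure of approximately 700 kbps arises. One combination that yields a rate of that order, assumed here for illustration only, is a single channel sampled at 44.1 kHz with 16 bits per sample. The helper name `pcm_bit_rate` is hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Assumed parameters: 44.1 kHz sampling, 16-bit words, one channel.
pcm_rate = pcm_bit_rate(44_100, 16)

# Ratio between the raw PCM rate and a 64 kbps channel.
channel_rate = 64_000
reduction_needed = pcm_rate / channel_rate

print(f"raw PCM rate: {pcm_rate / 1000:.1f} kbps")
print(f"reduction needed for a 64 kbps channel: about {reduction_needed:.1f} to 1")
```

Under these assumptions the raw rate exceeds the 64 kbps channel rate by roughly an order of magnitude.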
Since the storage and transmission of high quality audio data can be technically or economically prohibitive in many applications, particularly consumer applications, and existing communication channels such as ISDN are limited to low bit rates (64 kbps), efficient bit rate reduction techniques are necessary. Bit rate reduction is achieved by compressing the signal in some manner.
There are two basic principles of signal compression: removing the statistical or deterministic redundancies in the source signal; and matching the quantising system (PCM) to the properties of human perception. In compressing audio signals, redundancy in the signal is reduced as much as possible using prediction and transform coding techniques. Perceptual coding (noise shaping) techniques, based on human audio perception, are also used to reduce redundancy.
During the last few years, the approach most suited for achieving the required data compression for high quality audio applications has utilised the masking properties of the human auditory system. This approach uses filterbanks or transform coding to separate audio signals into frequency bands (sub-bands). Each sub-band is analysed and data irrelevancy is removed from acoustic signals without any noticeable effect to the listener. The masking properties are psychoacoustical in that the masking mechanism occurs in the inner ear and results in noise components being inaudible provided that they coexist with other components of stronger amplitude. Audio coders utilise this phenomenon and shape quantisation noise components to be below a masking threshold of the signal. The ISO (International Organization for Standardization) MPEG (Moving Picture Experts Group) audio coding standard and other audio coding standards were developed based on the above principles.
For further reductions in data rate, e.g. down to 64 kbps, additional coding techniques are necessary. Some such coding techniques are based on adaptive prediction. Adaptive prediction is based on using previous signal samples to predict what a current sample will be, and comparing the predicted value with the current sample value to determine a difference or error between them. The error signal is then transmitted, together with coefficients representing the predicted signal (or without coefficients in the case of backward prediction), such that the sample can be reconstructed at a decoder. The number of bits that need to be transmitted using predictive coding is substantially less than that required for the original signals. This gives what is known as a "coding gain", that is, the reduction in transmitted signal power for coded signals compared to the transmitted signal power required for the original signals.
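The principle of backward adaptive prediction may be sketched as follows. This is a minimal illustration and not the method of any of the cited works: the normalised LMS update, the predictor order of 2, the step size `mu`, the quantiser step and the test tone are all assumptions, and the names `BackwardPredictor`, `encode` and `decode` are hypothetical. Because the predictor adapts only from previously reconstructed samples, the decoder can repeat the same adaptation and no coefficients need to be transmitted.

```python
import math

class BackwardPredictor:
    """Linear predictor adapted with a normalised LMS rule (assumed here)."""

    def __init__(self, order=2, mu=0.5):
        self.weights = [0.0] * order
        self.history = [0.0] * order   # reconstructed samples, newest first
        self.mu = mu

    def predict(self):
        return sum(w * x for w, x in zip(self.weights, self.history))

    def update(self, error, reconstructed):
        # Adapt using only quantities that the decoder also has.
        energy = sum(x * x for x in self.history) + 1e-6
        gain = self.mu * error / energy
        self.weights = [w + gain * x for w, x in zip(self.weights, self.history)]
        self.history = [reconstructed] + self.history[:-1]

def encode(samples, step):
    """Return the quantised prediction errors (the only data transmitted)."""
    predictor, codes = BackwardPredictor(), []
    for sample in samples:
        prediction = predictor.predict()
        code = round((sample - prediction) / step)   # quantised error
        codes.append(code)
        predictor.update(code * step, prediction + code * step)
    return codes

def decode(codes, step):
    """Rebuild the samples by mirroring the encoder's predictor."""
    predictor, output = BackwardPredictor(), []
    for code in codes:
        prediction = predictor.predict()
        reconstructed = prediction + code * step
        output.append(reconstructed)
        predictor.update(code * step, reconstructed)
    return output

# Illustrative input: a 400 Hz tone sampled at 8 kHz (assumed values).
step = 0.01
signal = [math.sin(2 * math.pi * 400 * n / 8000) for n in range(800)]
codes = encode(signal, step)
decoded = decode(codes, step)

signal_power = sum(s * s for s in signal) / len(signal)
error_power = sum((c * step) ** 2 for c in codes) / len(codes)
prediction_gain_db = 10 * math.log10(signal_power / error_power)
max_reconstruction_error = max(abs(s - d) for s, d in zip(signal, decoded))
print(f"signal power / error power: {prediction_gain_db:.1f} dB")
```

In this sketch the transmitted error signal has lower power than the original signal, which is the effect the text refers to as coding gain, and the decoded samples differ from the input by at most half a quantiser step.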
It is known to use backward linear prediction techniques for decreasing the redundancy of audio signals. Mahieux et al, "Transform Coding of Audio Using Correlation Between Successive Transform Blocks", Proc ICASSP '89, pp 2021-2024, describes using a fixed linear predictor to remove inter-frame redundancy. Also, techniques have been described in which only audible differences between successive frames are encoded; see Paraskevas et al, "A Differential Perceptual Audio Coding Method With Reduced Bitrate Requirements", IEEE Trans. on Speech and Audio Processing, vol. 3, No. 6, November 1995.
Due to the non-stationary nature of audio signals, particularly music audio, adaptive predictive coding techniques have been used. Fuchs et al, "Improving MPEG Audio Coding by Backward Adaptive Linear Stereo Prediction", AES convention, New York, Preprint No 4086, October 1995, describes a lattice structured adaptive predictor using predictor switching of different orders applied to an MPEG audio codec. However, these methods have drawbacks and problems such as instability and slow convergence after switch on or recovery from transients. Additionally, side information needs to be transmitted to indicate which predictor order is in use. The level of side information transmitted depends on the number of predictors with different prediction orders, and the number of transmitted sub-bands. Fuchs et al used seven predictors requiring four bits of side information. For 20 sub-bands having non-zero bit allocation this results in 80 bits per frame, or 10 kbit/s for MPEG-1 Layer I and 3.3 kbit/s for MPEG-1 Layer II. Such bit rates are negligible for a high bit rate audio codec, but have a severe impact on low bit rate codecs.
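The side information figures quoted above can be reproduced with the following arithmetic sketch. The frame lengths of 384 samples for MPEG-1 Layer I and 1152 samples for MPEG-1 Layer II are those of the MPEG-1 audio standard; the 48 kHz sampling rate is an assumption, chosen because it yields the quoted rates. The helper name `side_info_rate` is hypothetical.

```python
def side_info_rate(sub_bands, bits_per_sub_band, samples_per_frame, sample_rate_hz):
    """Return (side-information bits per frame, side-information bits per second)."""
    bits_per_frame = sub_bands * bits_per_sub_band
    # Frames per second is sample_rate_hz / samples_per_frame.
    bits_per_second = bits_per_frame * sample_rate_hz / samples_per_frame
    return bits_per_frame, bits_per_second

# 20 sub-bands with 4 bits each, at an assumed 48 kHz sampling rate.
layer1_bits, layer1_rate = side_info_rate(20, 4, 384, 48_000)
layer2_bits, layer2_rate = side_info_rate(20, 4, 1152, 48_000)

print(f"Layer I:  {layer1_bits} bits per frame, {layer1_rate / 1000:.1f} kbit/s")
print(f"Layer II: {layer2_bits} bits per frame, {layer2_rate / 1000:.1f} kbit/s")
```

Set against a 64 kbit/s channel, 10 kbit/s of side information would occupy roughly 16% of the available rate, which illustrates why such overhead is significant for low bit rate codecs.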