The invention relates in general to high-quality low bit-rate encoding and decoding of signals carrying information intended for human perception such as audio signals, and more particularly music signals.
There is considerable interest among those in the field of signal processing to discover methods which minimize the amount of information required to represent adequately a given signal. By reducing required information, signals may be transmitted over communication channels with lower bandwidth, or stored in less space. With respect to digital techniques, minimal informational requirements are synonymous with minimal binary bit requirements.
Two factors limit the reduction of bit requirements:
(1) A signal of bandwidth W may be accurately represented by a series of samples taken at a frequency no less than 2·W. This is the Nyquist sampling rate. Therefore, a signal T seconds in length with a bandwidth W requires at least 2·W·T samples for accurate representation.
(2) Quantization of signal samples which may assume any of a continuous range of values introduces inaccuracies in the representation of the signal which are proportional to the quantizing step size or resolution. These inaccuracies are called quantization errors. These errors are inversely proportional to the number of bits available to represent the signal sample quantization.
(a) The recovered signal interval or block may be multiplied by an inverse window, one whose weighting factors are the reciprocal of those for the analysis window. A disadvantage of this technique is that it clearly requires that the analysis window not go to zero at the edges.
(b) Consecutive input signal blocks may be overlapped. By carefully designing the analysis window such that two adjacent windows add to unity across the overlap, the effects of the window will be exactly compensated. (But see the following paragraph.) When used with certain types of transforms such as the Discrete Fourier Transform (DFT), this technique increases the number of bits required to represent the signal since the portion of the signal in the overlap interval must be transformed and transmitted twice. For these types of transforms, it is desirable to design the window with an overlap interval as small as possible.
(c) Signal synthesis or decoding performed in a decoder may also require synthesis filtering. As discussed in Crochiere, "A Weighted Overlap-Add Method of Short-Time Fourier Analysis/Synthesis," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-28, February, 1980, pp. 99-102, synthesis interpolation filtering can be implemented more efficiently by a synthesis-window weighted overlap-add method. Thus, some subband coders implemented with transforms, including one used in an embodiment discussed in more detail below, use synthesis windowing with overlap-add. Further, quantizing errors may cause the inverse transform to produce a time-domain signal which does not go to zero at the edges of the finite time interval. Left alone, these errors may distort the recovered time-domain signal most strongly within the window overlap interval. A synthesis window can be used to shape each synthesized signal block at its edges. In this case, the signal will be subjected to an analysis and a synthesis window, i.e., the signal will be weighted by the product of the two windows. Therefore, both windows must be designed such that the product of the two will sum to unity across the overlap. See the discussion in the previous paragraph.
(1) Prediction: a periodic or predictable characteristic of a signal permits a receiver to anticipate some component based upon current or previous signal characteristics.
(2) Entropy coding: components with a high probability of occurrence may be represented by abbreviated codes. Both the transmitter and receiver must have the same code book. Entropy coding and prediction have the disadvantages that they increase computational complexity and processing delay. Also, they inherently provide a variable rate output, thus requiring buffering if used in a constant bit-rate system.
(3) Nonuniform coding: representations by logarithms or nonuniform quantizing steps allow coding of large signal values with fewer bits at the expense of greater quantizing errors.
(4) Floating point: floating-point representation may reduce bit requirements at the expense of lost precision. Block-floating-point representation uses one scale factor or exponent for a block of floating-point mantissas, and is commonly used in coding time-domain signals. Floating point is a special case of nonuniform coding.
(5) Bit allocation: the receiver's demand for accuracy may vary with time, signal content, strength, or frequency. For example, lower frequency components of speech are usually more important for comprehension and speaker recognition, and therefore should be transmitted with greater accuracy than higher frequency components. Different criteria apply with respect to music signals. Some general bit-allocation criteria are:
If coding techniques are applied to the full bandwidth, all quantizing errors, which manifest themselves as noise, are spread uniformly across the bandwidth. Split-band techniques which may be applied to selected portions of the spectrum can limit the spectral spread of quantizing noise. Two known split-band techniques, subband coding and transform coding, are discussed in Tribolet and Crochiere, "Frequency Domain Coding of Speech," IEEE Trans. on Acoust., Speech, Signal Proc., vol. ASSP-27, October, 1979, pp. 512-30. By using subband coding or transform coding, quantizing errors can be reduced in particular frequency bands where quantizing noise is especially objectionable by quantizing that band with a smaller step size.
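The relationship between quantizing step size and quantizing error in a single band can be illustrated with a minimal sketch. The uniform rounding quantizer, the coefficient values, and the two step sizes below are assumptions chosen only for illustration; they are not taken from any particular coder.

```python
def quantize(x, step):
    """Uniform quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)


def max_abs_error(values, step):
    """Largest quantizing error over a band for a given step size."""
    return max(abs(v - quantize(v, step)) for v in values)


# Hypothetical values standing in for the signal content of one band.
band = [0.137, -0.482, 0.905, -0.051, 0.333]

coarse_step = 0.25      # assumed step for a band where noise is tolerable
fine_step = 0.03125     # assumed smaller step for a noise-sensitive band

coarse_err = max_abs_error(band, coarse_step)
fine_err = max_abs_error(band, fine_step)

print(f"coarse step {coarse_step}: max error {coarse_err:.4f}")
print(f"fine step   {fine_step}: max error {fine_err:.4f}")
```

With a rounding quantizer the error in each value is bounded by half the step size, so assigning a smaller step to a band lowers the quantizing noise confined to that band, at the cost of more bits for that band.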
Subband coding may be implemented by a bank of digital bandpass filters. Transform coding may be implemented by any of several time-domain to frequency-domain transforms which simulate a bank of digital bandpass filters. Although transforms are easier to implement and require less computational power and hardware than digital filters, they have less design flexibility in the sense that each bandpass filter "frequency bin" represented by a transform coefficient has a uniform bandwidth. By contrast, a bank of digital bandpass filters can be designed to have different subband bandwidths. Transform coefficients can, however, be grouped together to define "subbands" having bandwidths which are multiples of a single transform coefficient bandwidth. The term "subband" is used hereinafter to refer to selected portions of the total signal bandwidth, whether implemented by a subband coder or a transform coder. The term is used in this manner because, as discussed by Tribolet and Crochiere, the mathematical bases of subband coders and transform coders are interchangeable; therefore the two coding methods are potentially capable of duplicating each other. A subband as implemented by a transform coder is defined by a set of one or more adjacent transform coefficients or frequency bins. The bandwidth of a transform coder frequency bin depends upon the coder's sampling rate and the number of samples in each signal sample block (the transform length).
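The dependence of bin bandwidth on sampling rate and transform length, and the grouping of adjacent bins into subbands, can be sketched as follows. The sketch assumes a DFT-style transform whose bins are spaced uniformly at the sampling rate divided by the transform length; the function names and the example figures (48 kHz, 512 samples) are hypothetical.

```python
def bin_bandwidth_hz(sample_rate_hz, transform_length):
    """Uniform bin spacing, assuming DFT-style bins at sample_rate / length."""
    return sample_rate_hz / transform_length


def subband_bandwidth_hz(sample_rate_hz, transform_length, bins_in_subband):
    """A subband of grouped bins is a whole multiple of one bin bandwidth."""
    return bins_in_subband * bin_bandwidth_hz(sample_rate_hz, transform_length)


def group_bins(num_bins, group_sizes):
    """Partition bins 0..num_bins-1 into subbands of adjacent bins."""
    if sum(group_sizes) != num_bins:
        raise ValueError("group sizes must account for every bin exactly once")
    subbands = []
    start = 0
    for size in group_sizes:
        subbands.append(list(range(start, start + size)))
        start += size
    return subbands


# Hypothetical example: 48 kHz sampling, 512-sample blocks.
print(bin_bandwidth_hz(48000, 512))          # width of one frequency bin
print(subband_bandwidth_hz(48000, 512, 4))   # width of a 4-bin subband
print(group_bins(8, [1, 1, 2, 4]))           # narrow subbands low, wide high
```

Grouping more bins per subband at higher frequencies is one way a transform coder can approximate the nonuniform subband widths available to a filter bank.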
Tribolet and Crochiere observed that two characteristics of subband bandpass filters are particularly critical to the performance of subband coder systems because they affect the amount of signal leakage between subbands. The first is the bandwidth of the regions between the filter passband and stopbands (the transition bands). The second is the attenuation level in the stopbands. As used herein, the measure of filter "selectivity" is the steepness of the filter response curve within the transition bands (steepness of transition band rolloff), and the level of attenuation in the stopbands (depth of stopband rejection).
It is known from Tribolet and Crochiere that reducing leakage between subbands is important to subband coder performance because such leakage distorts the results of spectral analysis, and therefore adversely affects coding decisions made in response to the derived spectral shape. Such leakage can also cause frequency-domain aliasing. These effects are discussed in more detail below.
The two filter characteristics, steepness of transition band rolloff and depth of stopband rejection, are also critical because the human auditory system displays frequency-analysis properties resembling those of highly asymmetrical tuned filters having variable center frequencies. The ability of the human auditory system to detect distinct tones generally increases as the difference in frequency between the tones increases; however, the frequency resolution of the human auditory system remains substantially constant for frequency differences less than the bandwidth of the above mentioned filters. The effective bandwidth of these filters, which is referred to as a critical band, varies throughout the audio spectrum. A dominant signal within a critical band is more likely to mask or render inaudible other signals anywhere within that critical band than other signals at frequencies outside that critical band. A dominant signal may mask other signals which occur not only at the same time as the masking signal, but also which occur before and after the masking signal. The duration of pre- and post-masking effects within a critical band depends upon the magnitude of the masking signal, but pre-masking effects are usually of much shorter duration than post-masking effects. See generally, the Audio Engineering Handbook, K. Blair Benson ed., McGraw-Hill, San Francisco, 1988, pages 1.40-1.42 and 4.8-4.10.
Psychoacoustic masking is more easily accomplished by subband and transform coders if the subband bandwidth throughout the audible spectrum is less than the critical bandwidth of the human auditory system in the same portions of the spectrum. This is because the critical bands of the human auditory system have variable center frequencies that adapt to auditory stimuli, whereas subband and transform coders typically have fixed subband center frequencies. To optimize the opportunity to utilize psychoacoustic-masking effects, any distortion artifacts resulting from the presence of a dominant signal should be limited to the subband containing the dominant signal. If the subband bandwidth is about half or less than half of the critical band (and if the transition band rolloff is sufficiently steep and the stopband rejection is sufficiently deep), the most effective masking of the undesired distortion products is likely to occur even for signals whose frequency is near the edge of the subband passband bandwidth. If the subband bandwidth is more than half a critical band, there is the possibility that the dominant signal will cause the human auditory system's critical band to be offset from the coder's subband so that some of the undesired distortion products outside the critical bandwidth are not masked. These effects are most objectionable at low frequencies where the critical band is narrower.
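The half-critical-band guideline can be checked numerically with a rough sketch. The text above gives no formula for critical bandwidth, so the sketch assumes the widely used Zwicker and Terhardt approximation; that formula, the function names, and the example subband widths are all assumptions introduced here for illustration.

```python
def critical_bandwidth_hz(center_hz):
    """Approximate critical bandwidth (assumed Zwicker-Terhardt formula)."""
    f_khz = center_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def subband_fits_half_critical_band(subband_bw_hz, center_hz):
    """True if the subband is no wider than half the local critical band."""
    return subband_bw_hz <= 0.5 * critical_bandwidth_hz(center_hz)


# Hypothetical subband widths tested at a low and a high center frequency.
for bw_hz, center_hz in [(46.875, 100.0), (93.75, 100.0), (93.75, 5000.0)]:
    ok = subband_fits_half_critical_band(bw_hz, center_hz)
    print(f"{bw_hz} Hz subband at {center_hz} Hz: within half band = {ok}")
```

Under this approximation the critical band is narrowest at low frequencies, so a subband width that is comfortably within half a critical band at several kilohertz can exceed it near 100 Hz, which is consistent with the observation that the effects are most objectionable at low frequencies.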
Transform coding performance depends upon several factors, including the signal sample block length, transform coding errors, and aliasing cancellation.