A variety of techniques exist for digitally encoding audio or speech signals at bit rates considerably lower than those required for pulse-code modulation (PCM). In sub-band coding (SBC), a filter bank divides the frequency band of the audio signal into a plurality of sub-bands, and the signal is not formed into frames along the time axis prior to coding. In transform coding, a frame of digital signals representing the audio signal on the time axis is converted by an orthogonal transform into a block of spectral coefficients representing the audio signal on the frequency axis.
In a combination of sub-band coding and transform coding, digital signals representing the audio signal are divided into a plurality of frequency ranges by sub-band coding, and transform coding is independently applied to each of the frequency ranges.
Known filters for dividing a frequency spectrum into a plurality of frequency ranges include the Quadrature Mirror Filter (QMF), as discussed in, for example, R. E. Crochiere, Digital Coding of Speech in Subbands, 55 BELL SYST. TECH. J., No. 8, (1976). The technique of dividing a frequency spectrum into equal-width frequency ranges is discussed in Joseph H. Rothweiler, Polyphase Quadrature Filters--A New Subband Coding Technique, ICASSP 83 BOSTON.
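As a minimal illustration of the filter-bank idea, the sketch below splits a signal into a low band and a high band using the two-tap Haar filter pair, which is the simplest pair satisfying the quadrature mirror relation. It is not the filter design of the cited references. The function names `qmf_analysis` and `qmf_synthesis` are hypothetical, and practical coders use much longer filters.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)  # normalizes the two-tap Haar filters


def qmf_analysis(signal):
    """Split an even-length signal into low and high bands, each decimated by 2.

    Low-pass taps are [1, 1]/sqrt(2); the high-pass taps [1, -1]/sqrt(2) are
    the mirror image of the low-pass taps (illustrative choice, an assumption).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low = [_SCALE * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [_SCALE * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two decimated bands into the full-rate signal."""
    out = []
    for lo, hi in zip(low, high):
        out.append(_SCALE * (lo + hi))  # even-indexed sample
        out.append(_SCALE * (lo - hi))  # odd-indexed sample
    return out
```

Applying `qmf_analysis` again to each output band yields four equal-width ranges, which is one way a tree of two-band splits can cover the spectrum.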
Known orthogonal transform techniques include dividing the digital input audio signal into frames of a predetermined time duration and processing the resulting frames using a fast Fourier transform (FFT), discrete cosine transform (DCT), or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. The MDCT is discussed in J. P. Princen and A. B. Bradley, Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation, ICASSP 1987.
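The framing and transform step can be sketched as follows, assuming a sine window, frames that overlap by half their length, and a direct evaluation of the MDCT sum rather than a fast algorithm. The function names, the window choice, and the 2/N inverse scaling are assumptions made for this illustration and are not taken from the cited paper.

```python
import math


def sine_window(n_coef):
    """Sine window of length 2*n_coef; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef)) for n in range(2 * n_coef)]


def _kernel(n, k, n_coef):
    """MDCT cosine kernel for time index n and frequency index k."""
    return math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))


def mdct(frame, window):
    """Windowed MDCT: 2N time samples in, N spectral coefficients out."""
    n_coef = len(frame) // 2
    return [
        sum(window[n] * frame[n] * _kernel(n, k, n_coef) for n in range(2 * n_coef))
        for k in range(n_coef)
    ]


def imdct(coefs, window):
    """Windowed inverse MDCT: N coefficients in, 2N aliased time samples out."""
    n_coef = len(coefs)
    return [
        window[n] * (2.0 / n_coef)
        * sum(coefs[k] * _kernel(n, k, n_coef) for k in range(n_coef))
        for n in range(2 * n_coef)
    ]


def analyze_synthesize(signal, n_coef):
    """Transform overlapping frames and overlap-add the inverse transforms."""
    if len(signal) % n_coef != 0:
        raise ValueError("signal length must be a multiple of n_coef")
    window = sine_window(n_coef)
    # Zero-pad by one hop on each side so every sample lies in two frames.
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    recon = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        frame = padded[start:start + 2 * n_coef]
        for n, value in enumerate(imdct(mdct(frame, window), window)):
            recon[start + n] += value  # aliasing cancels between neighbours
    return recon[n_coef:n_coef + len(signal)]
```

Each frame of 2N samples yields only N coefficients, so the overlap does not increase the number of values to be coded. In this sketch the aliasing introduced in one frame is cancelled by its neighbours during overlap-add.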
In a technique of quantizing the spectral coefficients resulting from an orthogonal transform, it is known to use sub-bands that take advantage of the psychoacoustic characteristics of the human auditory system. In this technique, spectral coefficients representing an audio signal on the frequency axis may be divided into a plurality of critical frequency bands. The width of the critical bands increases with increasing frequency. Normally, about 25 critical bands are used to cover the audio frequency spectrum of 0 Hz to 20 kHz. In such a quantizing system, bits are adaptively allocated among the various critical bands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from an MDCT, the spectral coefficient data generated by the MDCT within each of the critical bands is quantized using an adaptively-allocated number of bits.
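The grouping of spectral coefficients into critical bands might look like the sketch below. The band edges are the commonly tabulated Bark-scale edges with a final band extended to 20 kHz, giving 25 bands. This table is an assumption, since the text does not specify the edges, and the function names are hypothetical.

```python
import bisect

# Assumed critical band edges in Hz (commonly tabulated Bark-scale values,
# with the last band extended to 20 kHz). Not specified by the source text.
BAND_EDGES_HZ = [
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000,
    15500, 20000,
]


def band_index(freq_hz):
    """Return the index of the critical band containing freq_hz."""
    idx = bisect.bisect_right(BAND_EDGES_HZ, freq_hz) - 1
    # Clamp so that frequencies outside the table fall into an end band.
    return max(0, min(idx, len(BAND_EDGES_HZ) - 2))


def group_coefficients(coefs, sample_rate_hz):
    """Assign each spectral coefficient to a band by its centre frequency.

    Coefficient k of an N-coefficient block is assumed to be centred at
    (k + 0.5) * sample_rate_hz / (2 * N), as for an MDCT.
    """
    bin_width = sample_rate_hz / (2.0 * len(coefs))
    bands = [[] for _ in range(len(BAND_EDGES_HZ) - 1)]
    for k, coef in enumerate(coefs):
        bands[band_index((k + 0.5) * bin_width)].append(coef)
    return bands


def band_powers(bands):
    """Mean power of the coefficients in each band (0.0 for an empty band)."""
    return [sum(c * c for c in band) / len(band) if band else 0.0 for band in bands]
```

Because the bands widen with frequency, a block of uniformly spaced coefficients places only a few coefficients in the lowest bands and many in the highest ones.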
Known adaptive bit allocation techniques include that described in IEEE TRANS. ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Vol. ASSP-25, No. 4 (August 1977), in which bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes noise energy, but the noise level perceived by the listener is not optimum because the technique does not effectively exploit the psychoacoustic masking effect.
In the bit allocation technique described in M. A. Krassner, The Critical Band Encoder--Digital Encoding of the Perceptual Requirements of the Auditory System, ICASSP 1980, the psychoacoustic masking mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if the signal-to-noise ratio of such a system is measured using a strongly tonal signal, for example, a 1 kHz sine wave, non-optimum results are obtained because of the fixed allocation of bits among the critical bands.
It is also known that, to optimize the perceived noise level using the amplitude-based bit allocation technique discussed above, the spectrum of the quantizing noise can be adapted to the human auditory sense by using a fixed noise shaping factor. Bit allocation is carried out in accordance with the following formula:

$$b(k) = \delta + \frac{1}{2}\log_2\left[\frac{\sigma^{2(1+\gamma)}(k)}{D}\right] \tag{1}$$

where $b(k)$ is the word length of the quantized spectral coefficients in the $k$-th critical band, $\delta$ is an optimum bias, $\sigma^2(k)$ is the signal power in the $k$-th critical band, $D$ is the mean quantization error power over the entire frequency spectrum, and $\gamma$ is the noise shaping factor. To find the optimum value of $b(k)$ for each critical band, the value of $\delta$ is changed so that the sum of the $b(k)$ values for all the critical bands is equal to, or just less than, the total number of bits available for quantization.
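A minimal sketch of formula (1) and the search for the bias follows. It assumes that word lengths are rounded down to whole bits and clamped to the range 0 to `max_bits`, that D is a constant supplied by the caller as `noise_power`, and that the bias is found by bisection. None of these details is stated in the text, and the function names are hypothetical.

```python
import math


def word_lengths(band_powers, delta, gamma, noise_power=1.0, max_bits=16):
    """Evaluate formula (1) per band, then floor and clamp to [0, max_bits]."""
    lengths = []
    for power in band_powers:
        if power <= 0.0:
            lengths.append(0)  # a silent band receives no bits
            continue
        # 0.5 * log2(power**(1 + gamma) / D), written with logs for stability
        b = delta + 0.5 * ((1.0 + gamma) * math.log2(power) - math.log2(noise_power))
        lengths.append(max(0, min(max_bits, math.floor(b))))
    return lengths


def allocate_bits(band_powers, total_bits, gamma=0.0, noise_power=1.0, max_bits=16):
    """Find the largest bias whose word lengths sum to at most total_bits."""
    lo, hi = -64.0, 64.0  # assumed search range for the bias delta
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        used = sum(word_lengths(band_powers, mid, gamma, noise_power, max_bits))
        if used <= total_bits:
            lo = mid  # budget respected, so try a larger bias
        else:
            hi = mid  # budget exceeded, so reduce the bias
    return word_lengths(band_powers, lo, gamma, noise_power, max_bits)
```

With `gamma=0.0` the word lengths follow the band powers alone. With `gamma=-1.0` the exponent in formula (1) vanishes and every band receives the same word length. Intermediate values trade off between these two behaviours.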
This technique does not allow bits to be concentrated sufficiently within a single critical band, so unsatisfactory results are obtained when the signal-to-noise ratio is measured using a highly tonal signal, such as a 1 kHz sine wave.