1. Field of the Invention
This invention relates to a method and a device for high efficiency compression and expansion of digital audio signals, that is, a method and a device for compressing the digital audio signal by high efficiency encoding, transmitting or recording the compressed signal on a recording medium and expanding the transmitted or recorded signal.
2. Description of the Related Art
There are a variety of methods known for high efficiency encoding of audio signals to effect compression. For example, sub-band coding (SBC) is known. Sub-band coding is a form of non-time-block-forming, frequency-band-dividing system in which an audio signal in the time domain is divided into plural frequency ranges, and the signal in each frequency range is encoded without being divided in time into blocks.
Among the prior-art techniques known to the present inventors, there are U.S. Pat. Nos. 4,972,484 and 5,109,417 which disclose a bit allocation method responsive to the input signal. However, in these prior-art methods, bit allocation is achieved in a manner dependent solely on the energy of the input signal.
There is also known a time-block-forming, frequency-band-dividing system, or a transform encoding system, in which the digital audio signal in the time domain is divided in time into blocks, and each block is orthogonally transformed to generate spectral components in the frequency domain. The spectral components resulting from transforming each block are divided into frequency bands in which they are encoded. There is also known a high efficiency encoding method which consists of a combination of sub-band coding and transform coding. With this method, the audio signal in the time domain is divided into plural frequency ranges by SBC, and the resulting signals in the respective frequency ranges are divided in time into blocks, and each block is orthogonally transformed to generate spectral components in the frequency domain. The spectral components resulting from transforming each block are divided into frequency bands in which they are encoded.
An example of the filters useful for dividing a digital audio input signal in the time domain into frequency ranges is the quadrature mirror filter (QMF), which is described in detail in R. E. Crochiere, Digital Coding of Speech in Sub-bands, BELL SYST. TECH. J., Vol. 55, No. 8, 1976.
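By way of illustration only, the two-band split performed by a quadrature mirror filter pair may be sketched as follows. The sketch assumes the shortest possible pair, the 2-tap Haar filters, in which the high-pass filter is the low-pass filter with alternating signs; practical coders use considerably longer filters. The function names qmf_analysis and qmf_synthesis are hypothetical and do not appear in the cited reference.

```python
import math

def qmf_analysis(x):
    """Split an even-length signal into low and high bands at half the rate.

    Assumes the 2-tap Haar pair h0 = [s, s], h1 = [s, -s] with s = 1/sqrt(2),
    followed by decimation by two.
    """
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into the full-rate signal."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for lo_sample, hi_sample in zip(low, high):
        out.append(s * (lo_sample + hi_sample))  # even-indexed sample
        out.append(s * (lo_sample - hi_sample))  # odd-indexed sample
    return out

if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, -1.0, 0.5]
    low_band, high_band = qmf_analysis(signal)
    print(qmf_synthesis(low_band, high_band))
```

With this particular pair the synthesis stage recovers the input to within rounding error, and a constant input yields an all-zero high band.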
The technique of dividing the digital audio input signal in frequency into frequency ranges of an equal width is discussed in Joseph H. Rothweiler, Polyphase Quadrature Filters-a New Sub-band Coding Technique, ICASSP 83, BOSTON (1983).
In performing the above-mentioned orthogonal transform, a digital audio input signal is divided into blocks at an interval of a predetermined time period called a frame, and an orthogonal transform is executed on each of the blocks. Examples of the orthogonal transform are the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the modified discrete cosine transform (MDCT). In all of these transforms, a signal in the time domain is transformed into spectral components in the frequency domain. The MDCT is described in J. P. Princen and A. B. Bradley, Sub-band/Transform Coding Using Filter Bank Designs Based on Time-Domain Aliasing Cancellation, ICASSP 1987.
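As a non-limiting sketch of the block transform described above, the MDCT and its inverse may be written directly from their defining sums. The sketch assumes frames of 2N samples advanced by N samples, a sine window (one of several windows that satisfy the condition for time-domain aliasing cancellation), and a direct O(N^2) evaluation rather than a fast algorithm. All function names are hypothetical.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, window):
    """Transform 2N windowed time samples into N spectral components."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2.0
    return [
        sum(window[n] * frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs, window):
    """Transform N spectral components back into 2N aliased, windowed samples."""
    N = len(coeffs)
    n0 = 0.5 + N / 2.0
    return [
        (2.0 / N) * window[n]
        * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]

def analyze_synthesize(x, N):
    """Run MDCT then IMDCT on 50%-overlapped frames and overlap-add the result."""
    w = sine_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        y = imdct(mdct(x[start:start + 2 * N], w), w)
        for n in range(2 * N):
            out[start + n] += y[n]
    return out

if __name__ == "__main__":
    N = 8
    x = [math.sin(0.3 * n) for n in range(4 * N)]
    y = analyze_synthesize(x, N)
    # Samples covered by two overlapping frames are recovered.
    print(max(abs(x[n] - y[n]) for n in range(N, 3 * N)))
```

Each frame on its own yields an aliased output; the aliasing cancels only where two consecutive frames overlap, so the first and last N samples of the signal are not reconstructed in this sketch.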
The widths of the frequency bands into which the spectral components are divided for encoding are set to correspond to critical bands, which take account of the frequency resolution characteristics of the human auditory sense. In this case, the audio frequency range is divided into a plurality of bands, for example 25 bands, such that the bandwidth of a band becomes broader as the frequency of the band becomes higher.
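The widening of the bands with frequency may be illustrated with a short sketch. It assumes the Zwicker and Terhardt approximation of the critical-band (Bark) scale and, purely for illustration, divides the range from 0 Hz to an assumed upper limit of 20 kHz into 25 bands of equal width on that scale. Neither the approximation nor the upper limit is taken from the present description, and the function names are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_to_hz(z, f_max=20000.0):
    """Invert hz_to_bark by bisection on [0, f_max]; the mapping is monotonic."""
    lo, hi = 0.0, f_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def band_edges(num_bands=25, f_max=20000.0):
    """Band edges in Hz for bands of equal width on the Bark scale."""
    z_max = hz_to_bark(f_max)
    return [bark_to_hz(i * z_max / num_bands, f_max) for i in range(num_bands + 1)]

if __name__ == "__main__":
    edges = band_edges()
    for i in range(len(edges) - 1):
        print(f"band {i:2d}: {edges[i]:8.1f} Hz to {edges[i + 1]:8.1f} Hz")
```

Under these assumptions the lowest band is on the order of 100 Hz wide, whereas the highest band spans several kilohertz.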
A bit allocation is made to each frequency band in a predetermined or an adaptive manner to encode the spectral components in the frequency band. For example, the spectral components resulting from the MDCT processing are encoded by an adaptive bit allocation to each frequency band.
The following two adaptive bit allocation methods are known. In IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977, there is shown a technique of allocating bits based on the signal magnitude in each frequency band. With this system, the quantizing noise spectrum is made flat across the frequency bands so that the noise energy is minimized. However, this system has a drawback that, since the masking effect of the human auditory sense is not utilized, the noise spectrum is not optimized to minimize its perception by the listener.
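The behaviour of a magnitude-based allocation may be illustrated with a commonly used textbook rule, which is not asserted to be the exact rule of the cited reference: each band receives the average number of bits plus half the base-2 logarithm of the ratio of its energy to the geometric mean of all band energies. The sketch assumes real-valued (unrounded) bit counts, strictly positive band energies, and a noise model in which each additional bit reduces noise energy by a factor of four. The function names are hypothetical.

```python
import math

def allocate_bits(band_energies, avg_bits):
    """Allocate bits per band from band energies alone (no masking model).

    b_k = avg_bits + 0.5 * (log2(E_k) - mean over bands of log2(E_j)),
    so the allocations sum to avg_bits * number_of_bands.
    """
    log_e = [math.log2(e) for e in band_energies]
    mean_log = sum(log_e) / len(log_e)
    return [avg_bits + 0.5 * (le - mean_log) for le in log_e]

def quantizing_noise(band_energies, bits):
    """Assumed noise model: noise energy = E_k * 2^(-2 * b_k)."""
    return [e * 2.0 ** (-2.0 * b) for e, b in zip(band_energies, bits)]

if __name__ == "__main__":
    energies = [16.0, 4.0, 1.0, 0.25]
    bits = allocate_bits(energies, avg_bits=4.0)
    print(bits)                              # [5.5, 4.5, 3.5, 2.5]
    print(quantizing_noise(energies, bits))  # the same value in every band
```

The resulting noise energy is identical in every band, which is the flat noise spectrum referred to above; nothing in the rule accounts for what the listener can or cannot hear.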
To overcome this shortcoming, coefficients known as shaping factors may be utilized at the time of bit allocation decision to adapt the quantizing noise spectrum to the characteristics of the human auditory sense. However, when a sine wave having a frequency of, e.g., 1 kHz, is used for measuring the characteristics of this quantizing method, the allocated bits cannot be sufficiently concentrated at the frequency of the sine wave, and characteristic values as good as desired cannot be obtained.
The masking effect, which is among the characteristics of the human auditory sense, is the effect in which a tone is masked by another tone and is thus rendered inaudible. The masking effect may be classified into the time-domain masking effect and the concurrent, frequency-domain masking effect. As a result of the masking effect, noise, if present, that is masked is concealed and cannot be heard. For this reason, the noise present in an actual audio signal, if within the masking range of the audio signal, is termed an allowable noise.
The time-domain masking effect may also be classified into the forward masking effect and the backward masking effect. The forward masking effect is the effect of a temporally-earlier tone masking a temporally-later tone. Conversely, backward masking is the effect of a temporally-later tone masking a temporally-earlier tone. Backward masking is known to exhibit its masking effect for a markedly shorter time than forward masking.
In M. A. Krasner, The Critical Band Coder - Digital Encoding of the Perceptual Requirements of the Auditory System, ICASSP 1980, there is disclosed a method of deriving the signal-to-noise ratio required for each band by using auditory masking, and of determining a fixed allocation of bits accordingly. This method has the deficiency that, when the characteristics are measured using a sine wave, the measured values are not as good as desired because of the fixed bit allocation.
In this manner, if bits are allocated among the bands based on the signal magnitude in each band to minimize the quantizing noise energies, the noise level perceived by the listener is not minimized. On the other hand, if fixed noise shaping factors are introduced, or if a fixed bit allocation is made to each band in consideration of the masking effect, it is difficult to achieve satisfactory signal-to-noise characteristics when measured using a sine wave.