1. Field of the Invention
The present invention relates to an acoustic signal coding method and apparatus, acoustic signal decoding method and apparatus, and a recording medium having recorded therein programs for the coding and decoding.
2. Description of the Related Art
Various methods have been proposed for highly efficient coding of audio or speech signals. One is a non-blocked frequency band division method called “SBC (subband coding)”, in which an audio signal or the like on the time base is divided into a plurality of frequency bands and coded without being blocked. Another is a blocked frequency band division method called “transform coding”, in which a signal on the time base is transformed to a signal on the frequency base (spectrum transform), divided into a plurality of frequency bands, and coded in each of the frequency bands. A combination of subband coding and transform coding has also been proposed as a highly efficient coding method. In this case, a signal is first divided into frequency bands by subband coding, the signal in each band is then transformed to a signal on the frequency base by the spectrum transform, and each spectrum-transformed band is coded. One filter available for the frequency band division is the QMF (quadrature mirror filter), which is disclosed in “Digital Coding of Speech in Subbands”, R. E. Crochiere, Bell Syst. Tech. J. Vol. 55, No. 8, 1976. The PQF (polyphase quadrature filter) has also been proposed, in “Polyphase Quadrature Filters—A New Subband Coding Technique”, Joseph H. Rothweiler, ICASSP 83, Boston.
In the aforementioned spectrum transform, for example, an input audio signal is blocked into frames each of a predetermined unit time, and each block is subjected to DFT (discrete Fourier transform), DCT (discrete cosine transform), MDCT (modified discrete cosine transform) or the like to transform it from the time base to the frequency base. The MDCT is known from “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation”, J. P. Princen & A. B. Bradley, ICASSP 1987, Univ. of Surrey Royal Melbourne Inst. of Tech.
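The blocking and MDCT described above can be illustrated by the following minimal sketch. It is not taken from the cited reference; the sine window, the 50% overlap between blocks, the 2/N scaling of the inverse transform, and all function names are assumptions chosen for illustration. Under these assumptions, overlapping and adding the windowed inverse transforms cancels the time domain aliasing and restores the input.

```python
import math

def sine_window(length):
    # Sine window (assumed here); it satisfies w[n]^2 + w[n + N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    # Transform 2N windowed time samples into N frequency coefficients.
    half = len(block) // 2
    shift = 0.5 + half / 2.0
    return [sum(block[n] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    # Transform N coefficients back into 2N time samples that still
    # contain aliasing; the aliasing cancels in the overlap-add step.
    half = len(coeffs)
    shift = 0.5 + half / 2.0
    return [2.0 / half * sum(coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                             for k in range(half))
            for n in range(2 * half)]

def tdac_round_trip(signal, half):
    # len(signal) is assumed to be a multiple of `half`.
    # Pad with N zeros on each side so every sample lies in two blocks.
    padded = [0.0] * half + list(signal) + [0.0] * half
    window = sine_window(2 * half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = [padded[start + n] * window[n] for n in range(2 * half)]
        restored = imdct(mdct(block))
        for n in range(2 * half):
            out[start + n] += restored[n] * window[n]  # overlap-add
    return out[half:half + len(signal)]

signal = [math.sin(0.3 * n) for n in range(32)]
restored = tdac_round_trip(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
```

In this sketch each block of 2N samples yields only N coefficients, so the overlap does not increase the amount of data to be coded.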
By quantizing a signal that has been divided into bands by such a filter or spectrum transform, the band in which quantization noise occurs can be controlled, and the masking effect or the like can be utilized to attain a higher coding efficiency and a higher acoustic quality of the coded signal. Also, by normalizing the signal in each band, for example with the maximum absolute value of the signal components in that band, before quantizing it, the signal can be coded with a still higher efficiency.
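The normalization and quantization of one band may be sketched as follows. This is only an illustrative example: the uniform mid-tread quantizer, the function names `encode_band` and `decode_band`, and the sample values are assumptions and are not specified in the text above.

```python
def encode_band(coeffs, bits):
    # Normalize by the maximum absolute value in the band (the scale
    # factor), then quantize to signed integers using `bits` bits (bits >= 2).
    scale = max(abs(c) for c in coeffs)
    if scale == 0.0:
        return 0.0, [0] * len(coeffs)
    levels = (1 << (bits - 1)) - 1  # largest code magnitude
    codes = [round(c / scale * levels) for c in coeffs]
    return scale, codes

def decode_band(scale, codes, bits):
    # Undo the quantization and the normalization.
    levels = (1 << (bits - 1)) - 1
    return [scale * q / levels for q in codes]

band = [0.02, -0.5, 0.31, 0.07]          # hypothetical coefficients of one band
scale, codes = encode_band(band, 4)
decoded = decode_band(scale, codes, 4)
max_error = max(abs(a - b) for a, b in zip(band, decoded))
```

Because the codes always span the full range of the quantizer regardless of the signal level in the band, the quantization error is bounded by a fraction of the scale factor rather than by a fixed absolute value.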
For quantization of each frequency component resulting from the frequency band division, the division width is selected with the human auditory characteristics taken into consideration. That is, an audio signal is divided into a plurality of bands, for example, 32 bands, each having a bandwidth generally called a “critical band”, which becomes wider as the frequency becomes higher. The data in each band is coded with a predetermined bit assignment for each band or with a bit allocation adapted to each band. For example, when MDCT coefficient data are coded with adaptive bit allocation, the MDCT coefficient data in each band, obtained by the MDCT of each block, are coded with an adaptively allocated number of bits. The following two bit allocation methods are known.
One of them is known from the IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, No. 4, August 1977. In this method, bits are allocated based on the signal magnitude in each band, so that the quantization noise spectrum becomes flat and the noise energy is minimized. Since this method does not utilize the masking effect, however, the result is not optimal in terms of the noise actually perceived by the listener. The other method is disclosed in “The critical band coder—digital encoding of the perceptual requirements of the auditory system”, M. A. Krasner, ICASSP 1980, MIT. In this method, auditory masking is utilized to determine the signal-to-noise ratio necessary for each band, and a fixed bit allocation is made accordingly. Because the bit allocation is fixed, however, the measured signal-to-noise ratio is not very good even for a sine wave input. To overcome these problems, a highly efficient coding method has been proposed in which all bits usable for the bit allocation are allocated depending both on a fixed bit allocation pattern predetermined for each sub-block and on the signal magnitude in each block, with the dependence on the fixed bit allocation pattern being larger as the signal spectrum is smoother.
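The first of the above methods, in which bits are allocated according to the signal magnitude in each band, can be illustrated by the following sketch. It is not the procedure of the cited reference; the greedy one-bit-at-a-time loop, the assumption that each added bit lowers the noise in a band by about 6.02 dB, the function name `allocate_bits` and the example energies are all assumptions made for illustration.

```python
import math

def allocate_bits(band_energies, total_bits, max_bits_per_band=16):
    # Remaining noise level of each band in dB, starting at the band energy.
    level_db = [10.0 * math.log10(e) if e > 0 else -math.inf
                for e in band_energies]
    bits = [0] * len(band_energies)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_band]
        if not candidates:
            break
        # Give the next bit to the band whose noise level is currently highest.
        best = max(candidates, key=lambda i: level_db[i])
        if level_db[best] == -math.inf:
            break  # only silent bands remain
        bits[best] += 1
        level_db[best] -= 6.02  # assumed noise reduction per added bit
    return bits

energies = [1000.0, 10.0, 0.1]   # hypothetical band energies
bits = allocate_bits(energies, 10)
```

Bands with larger energy receive more bits, which tends to equalize the noise level across bands. Masking is not considered in this sketch, which corresponds to the limitation of the first method noted above.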
The above method remarkably improves the overall signal-to-noise ratio when energy is concentrated in a specific spectral component, as with a sine wave input, by allocating many bits to the block including that component. Since the human auditory sense is generally extremely sensitive to a signal having a steep spectrum component, improving the signal-to-noise ratio in this way improves not only the measured numerical value but also the sound quality as perceived by the listener.
In addition, many other bit allocation methods have been proposed and the auditory sense model has become more elaborate, so that a higher coding efficiency and a higher acoustic quality of the coded signal can be attained if the capability of the encoder used allows it.
If a signal is decomposed into frequency components and the frequency components are quantized for coding, the waveform signal obtained by decoding and combining the frequency components will contain quantization noise. If the frequency components of the original signal vary rapidly, the quantization noise in the waveform signal will be large even in a portion where the original signal waveform is small, and this noise, called “pre/post echo”, will not be masked by simultaneous masking. The quantization noise will thus be an acoustic disturbance. Especially when a signal is decomposed into many frequency components by the spectrum transform, the time resolution is poor and a large quantization noise occurs over a long period. Reducing the transform length shortens the period over which the quantization noise occurs, but it also worsens the frequency resolution, so that the efficiency of coding a quasi-stationary portion is lowered. To solve this problem, a method has been proposed in which the transform length is reduced at the expense of the frequency resolution of the signal. However, since reducing the transform length decreases the number of bits available per transformed block, sufficient quantization accuracy cannot be assured and good sound quality of the decoded signal cannot be provided.
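The spreading of quantization noise over a transform block, and the effect of the transform length, can be observed with the following sketch. A DCT-IV with coarse uniform quantization is used here as a simple stand-in for the spectrum transform and quantizer; the block lengths, the step size and the sample values are assumptions chosen for illustration only.

```python
import math

def dct4(x):
    # DCT-IV; applying it twice and scaling by 2/N restores the input.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                for i in range(n))
            for k in range(n)]

def code_block(block, step):
    # Transform, quantize every coefficient with the same step, inverse transform.
    n = len(block)
    quantized = [step * round(c / step) for c in dct4(block)]
    return [2.0 / n * v for v in dct4(quantized)]

silence = [0.0] * 24
attack = [1.0, -0.8, 0.9, -0.7, 0.6, -0.5, 0.4, -0.3]
signal = silence + attack        # a sharp attack at the end of the block
step = 0.5

# One long block of 32 samples: noise spreads into the silent portion.
long_decoded = code_block(signal, step)

# Four short blocks of 8 samples: noise is confined to the attack block.
short_decoded = []
for start in range(0, len(signal), 8):
    short_decoded += code_block(signal[start:start + 8], step)

pre_echo_long = max(abs(v) for v in long_decoded[:24])
pre_echo_short = max(abs(v) for v in short_decoded[:24])
```

With the long block, the decoded signal is no longer silent before the attack, which corresponds to the pre echo. With the short blocks, the silent blocks decode to silence, but each block then has only 8 coefficients and a correspondingly coarser frequency resolution.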
To cope with the above problem, it has been proposed that, even when the acoustic time domain signal changes greatly over time, the encoder keeps the transform block length fixed, processes the signal so that its amplitude is increased in a micro amplitude zone, then transforms the signal to a frequency spectrum and quantizes it, and records the amplitude controlling information in the code row.
In the decoder, the operations effected in the encoder are effected in reverse: the amplitude of the acoustic time domain signal restored from the frequency spectrum is corrected using the amplitude controlling information recorded in the code row.
By the above processing, it is possible to effectively suppress a pre and/or post echo developed in the micro amplitude zone when the acoustic time domain signal changes greatly within the block. Also, a subband filter can be used to divide the band of an acoustic time domain signal and the amplitude information can be processed in each band, to effectively suppress a pre and/or post echo.
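The amplitude processing in the encoder and its reversal in the decoder may be sketched as follows. This is a simplified illustration under stated assumptions: the gain is a single constant applied up to a known attack position, a DCT-IV with uniform quantization stands in for the transform codec, and the names `apply_gain`, `remove_gain` and `transform_codec` are hypothetical. A practical system would also have to detect the attack position and code the gain information, which is not shown.

```python
import math

def dct4(x):
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                for i in range(n))
            for k in range(n)]

def transform_codec(block, step):
    # Stand-in codec: DCT-IV, uniform quantization, inverse DCT-IV.
    n = len(block)
    quantized = [step * round(c / step) for c in dct4(block)]
    return [2.0 / n * v for v in dct4(quantized)]

def apply_gain(block, attack_pos, gain):
    # Encoder side: raise the amplitude of the zone before the attack.
    return [s * gain if i < attack_pos else s for i, s in enumerate(block)]

def remove_gain(block, attack_pos, gain):
    # Decoder side: reverse the encoder's amplitude processing.
    return [s / gain if i < attack_pos else s for i, s in enumerate(block)]

quiet = [0.01 * math.sin(0.9 * i) for i in range(24)]   # micro amplitude zone
attack = [1.0, -0.8, 0.9, -0.7, 0.6, -0.5, 0.4, -0.3]
block = quiet + attack
attack_pos, gain, step = 24, 8.0, 0.5   # (attack_pos, gain) would be recorded in the code row

boosted = apply_gain(block, attack_pos, gain)
coded = transform_codec(boosted, step)
decoded = remove_gain(coded, attack_pos, gain)

codec_noise = [c - b for c, b in zip(coded, boosted)]
final_error = [d - s for d, s in zip(decoded, block)]
```

In the micro amplitude zone, the error remaining after the decoder removes the gain equals the noise added by the codec divided by the gain, while the block length is left unchanged.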
In addition to the pre and/or post echo, however, there are other factors that disturb the auditory sensation. Among others, setting a somewhat larger frame length in transform coding can itself cause an acoustic disturbance. The larger the block length, the better the frequency resolution and thus the higher the coding efficiency. However, a specific frequency component that exists only for a limited time in the original acoustic time domain signal will be diffused over the block in the decoded acoustic time domain signal and become an acoustic disturbance. This phenomenon takes place even when the original acoustic time domain signal does not vary greatly within a block, and it cannot be solved by an apparatus adapted to suppress a pre and/or post echo.