1. Field of the Invention
This invention relates to a signal coding method and an apparatus therefor wherein an input signal such as digital data is coded by so-called high-efficient coding.
2. Description of the Related Art
Conventionally, a variety of methods for high-efficient coding of signals such as audio and acoustic sounds, and apparatuses therefor, are available. Well-known examples are the so-called conversion coding method, in which signals on the time axis are framed at given time intervals, each framed signal on the time axis is converted into signals on the frequency axis (spectral conversion), and the result is divided into a plurality of frequency bands and coded for each band; and the so-called sub-band coding (SBC) method, in which audio signals, etc. on the time axis are divided into a plurality of frequency bands without being framed and are then coded. Further, high-efficient coding methods and apparatuses combining the aforementioned band division coding and conversion coding have also been conceived. In this case, after the signal is divided into bands by the aforementioned band division coding method, for example, the signals in each band are spectrum-converted into signals on the frequency axis and the signals of each spectrum-converted band are coded.
As a band division filter to be used for the aforementioned band division coding method, for example, a Quadrature Mirror Filter (QMF) or the like is currently available. This is described in the reference "Digital coding of speech in subbands" (R. E. Crochiere, Bell Syst. Tech. J., Vol. 55, No. 8, 1976). The QMF divides a band into two bands of equal width. The feature of this filter is that no aliasing occurs when the bands divided as mentioned above are later synthesized.
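The two-band split and alias-free synthesis described above can be sketched with the shortest possible mirror-filter pair. This is only an illustration under stated assumptions: the 2-tap Haar pair is used because it is easy to verify by hand, whereas the filters in the cited reference are longer, and the function names `qmf_analysis` and `qmf_synthesis` are hypothetical.

```python
import math

# Assumption: 2-tap Haar pair h0 = [s, s], h1 = [s, -s], so that H1(z) = H0(-z).
S = 1.0 / math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into a low band and a high band,
    each decimated to half the input rate."""
    half = len(x) // 2
    low = [S * (x[2 * n] + x[2 * n + 1]) for n in range(half)]
    high = [S * (x[2 * n] - x[2 * n + 1]) for n in range(half)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; the aliasing introduced by decimation
    in each band cancels in the sum."""
    x = []
    for lo, hi in zip(low, high):
        x.append(S * (lo + hi))  # even-indexed sample
        x.append(S * (lo - hi))  # odd-indexed sample
    return x
```

Each band taken alone is aliased by the decimation, but synthesizing both bands returns the original samples to within floating-point rounding.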
Additionally, the reference "Polyphase Quadrature filters--A new subband coding technique" describes a band division method in which a signal is divided by a filter into bands of equal width. This polyphase quadrature filter is characterized in that a signal can be divided into a plurality of equal-width bands in a single operation.
As the spectral conversion method mentioned above, methods are known in which input audio signals are framed at given time intervals and each frame is subjected to discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT) or the like to convert the time axis to the frequency axis. The aforementioned MDCT is described in the reference "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," J. J. Princen and A. B. Bradley, Univ. of Surrey, Royal Melbourne Inst. of Tech., ICASSP 1987.
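As an illustration of the framed spectral conversion described above, the following is a minimal sketch of an MDCT with 50% overlapped frames. The sine window, the 2/M inverse scaling and all function names are assumptions chosen for the example and are not taken from the cited reference; the direct summation is written for readability, not speed.

```python
import math

def sine_window(M):
    """Window of length 2M satisfying w[n]^2 + w[n+M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block, window):
    """Convert 2M windowed time samples into M spectral coefficients."""
    M = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, window):
    """Convert M coefficients back into 2M windowed, time-aliased samples."""
    M = len(coeffs)
    return [window[n] * (2.0 / M)
            * sum(coeffs[k]
                  * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]

def overlap_add_roundtrip(x, M):
    """Transform frames taken every M samples and overlap-add the inverses."""
    window = sine_window(M)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * M + 1, M):
        y = imdct(mdct(x[start:start + 2 * M], window), window)
        for n in range(2 * M):
            out[start + n] += y[n]
    return out
```

Samples covered by two adjacent frames are reconstructed because the time-domain aliasing of one frame cancels against that of its neighbor. The first and last M samples are covered by only one frame and are therefore not reconstructed in this sketch.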
By quantizing the signals divided into respective bands by use of the filter or spectral conversion as stated above, it is possible to control the band in which quantizing noise occurs and to perform coding that is more efficient in terms of the auditory sense by using the so-called masking effect or the like. Further, if each band is normalized by the maximum of the absolute values of the signal components within that band before quantization, still more efficient coding can be done.
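The per-band normalization mentioned above can be sketched as follows. The uniform quantizer, the symmetric code range and the function names are assumptions made for this example only; the text does not specify a particular quantizer.

```python
def normalize_and_quantize(band, bits):
    """Normalize a band by its largest absolute value, then quantize
    each component uniformly to a signed code of `bits` bits (bits >= 2)."""
    scale = max(abs(v) for v in band)
    if scale == 0.0:
        return 0.0, [0] * len(band)
    levels = (1 << (bits - 1)) - 1        # e.g. 7 for 4 bits: codes -7..7
    codes = [round(v / scale * levels) for v in band]
    return scale, codes

def dequantize(scale, codes, bits):
    """Reconstruct approximate component values from scale and codes."""
    levels = (1 << (bits - 1)) - 1
    return [scale * c / levels for c in codes]
```

Because the scale factor follows the band's own peak, a quiet band uses the full code range just as a loud one does, and the quantizing error in each band stays within half a step of that band's scale.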
Here, as the frequency division width for quantizing each frequency component divided into frequency bands, for example, a band width that takes the auditory characteristics of the human being into account is often used. Namely, audio signals are sometimes divided into a plurality of bands (for example, 25 bands) whose widths are those of the so-called critical bands, in which the band width generally increases as the frequency becomes higher. In this case, when the data of each band is coded, the coding is carried out by distributing a given number of bits to each band or by allocating bits adaptively to each band (bit allocation). For example, when coefficient data obtained by the aforementioned MDCT processing is coded by the above-mentioned bit allocation, the coding is carried out by adaptively allocating bits to the MDCT coefficient data of each band obtained by the MDCT processing in each of the aforementioned frames.
As the bit allocation method stated above, the following two methods are well known.
For example, the reference "Adaptive Transform Coding of Speech Signals," R. Zelinski, P. Noll, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977, states that the bit allocation is carried out based on the size of the signals in each band. According to this method, although the quantizing noise spectrum is flattened so that the noise energy is minimized, the perceived noise in terms of the actual auditory sense is not optimum because the masking effect is not utilized.
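A bit allocation based on the size of the signals in each band can be sketched with the textbook log-energy rule, in which each band receives the average number of bits plus half the base-2 logarithm of its energy relative to the geometric mean. This rule is used here as an assumption about what such an allocation looks like, not as a reproduction of the cited reference; the result is left as real-valued bit counts, and rounding or clamping of negative values is omitted.

```python
import math

def energy_based_allocation(energies, total_bits):
    """Allocate bits per band from band energies (all energies > 0).

    b_k = average + 0.5 * (log2(e_k) - mean of log2(e)),
    which flattens the quantizing noise across the bands.
    """
    n = len(energies)
    log_e = [math.log2(e) for e in energies]
    mean_log = sum(log_e) / n
    average = total_bits / n
    return [average + 0.5 * (le - mean_log) for le in log_e]
```

Each factor of four in band energy is worth one additional bit, and the allocations sum to `total_bits`. No masking model enters the computation, which is the limitation noted above.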
Further, for example, the reference "The critical band coder-digital encoding of the perceptual requirements of the auditory system," M. A. Krasner, MIT, ICASSP 1980, describes a method in which the signal-to-noise ratio necessary for each band is obtained by using auditory masking and a fixed bit allocation is carried out accordingly. However, according to this method, because the bit allocation is fixed, the measured characteristic is not particularly good even for a sine wave input.
To solve these problems, a high-efficient coding apparatus has been proposed in which all bits which can be used for bit allocation are divided into a preliminarily fixed allocation pattern portion for each band or each block obtained by dividing each band and a portion for carrying out a bit allocation dependent on the size of signals in each block, and the division ratio is made to depend upon signals related to input signals so that the ratio of division to the aforementioned fixed bit allocation pattern portion increases as the spectral distribution of the signals becomes smoother.
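The division of the available bits into a fixed-pattern portion and a signal-dependent portion can be sketched as follows. Several choices here are assumptions made only for illustration and are not stated in the text: spectral flatness (geometric mean over arithmetic mean of the block energies) serves as the smoothness measure and directly as the division ratio, the signal-dependent portion is distributed in proportion to log2(1 + energy), and the function names are hypothetical.

```python
import math

def spectral_flatness(energies):
    """Return a value in (0, 1]; 1 means a perfectly smooth spectrum.
    All energies must be positive."""
    n = len(energies)
    geometric = math.exp(sum(math.log(e) for e in energies) / n)
    arithmetic = sum(energies) / n
    return geometric / arithmetic

def mixed_allocation(energies, fixed_pattern, total_bits):
    """Split total_bits between a fixed pattern and a signal-dependent
    allocation; the smoother the spectrum, the larger the fixed share."""
    fixed_share = spectral_flatness(energies)      # assumed division ratio
    fixed_bits = fixed_share * total_bits
    adaptive_bits = total_bits - fixed_bits
    pattern_sum = sum(fixed_pattern)
    weights = [math.log2(1.0 + e) for e in energies]  # assumed weighting
    weight_sum = sum(weights)
    bits = [fixed_bits * p / pattern_sum + adaptive_bits * w / weight_sum
            for p, w in zip(fixed_pattern, weights)]
    return bits, fixed_share
```

With a smooth spectrum nearly all bits follow the fixed pattern, whereas with a single dominant block the fixed share falls toward zero and most bits move to that block.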
According to this method, for example, when energy is concentrated on a specific spectral component such as in the case of sine wave input, by allocating more bits to a block containing that spectral component, it is possible to considerably improve the overall signal-to-noise characteristic. Generally, because the auditory sense of the human being is very keen to signals having a steep spectral distribution, improvement of the signal-to-noise characteristic by using such a method not only simply improves measurement values but also is effective for improvement of sound quality in terms of auditory sense.
Meanwhile, in addition to this method, many other methods of bit allocation have been proposed. If the model of the auditory sense is made more precise and the capacity of the coding apparatus is improved, still more efficient coding in terms of the auditory sense can be realized.
However, because in the above-described conventional methods the bands in which the frequency components are quantized are fixed, if, for example, the spectral components are concentrated in the vicinity of some specific frequencies, quantizing those spectral components at full precision requires many bits to be allocated to the many other spectral components which belong to the same band as those spectral components, thereby reducing the efficiency.
That is, generally, noise contained in tone property acoustic signals, in which energy is concentrated on specific spectral components, is more perceptible than, for example, noise in acoustic signals in which energy is distributed smoothly over a wide frequency range, so that it becomes a large obstacle in terms of the auditory sense. Further, unless the spectral components having a large energy, namely the tone property components, are quantized at full precision, a large distortion occurs between frames when those spectral components are returned to waveform signals on the time axis and synthesized with the preceding and following frames (that is, a large connection distortion occurs when they are synthesized with the waveform signals of an adjacent time frame), thereby also providing an obstacle to the auditory sense. Thus, according to the conventional methods, it has been difficult to improve the coding efficiency of tone property acoustic signals in particular without deteriorating the sound quality.
To solve this problem, an applicant of this invention proposed in the specification and drawings of U.S. patent application Ser. No. 08/374,518 (filed May 31, 1994), now issued as U.S. Pat. No. 5,717,821 on Feb. 10, 1998, a method in which input acoustic signals were separated into tone property components in which energy is concentrated to a specific frequency and components (noise property components or non-tone property components) in which energy is distributed smoothly over a wide band to code them respectively, thereby achieving a high coding efficiency.
That is, according to this previously proposed method, the aforementioned input acoustic signals are converted to the frequency domain and the frequency components (spectral components) obtained thereby are further divided, for example, by critical band. Then, the spectral components of each divided band are separated into the tone property components and the noise property components (non-tone property components), and many bits are allocated only to each of the separated tone property components (spectral components in a very narrow range on the frequency axis in which the tone property components in the band reside) in order to achieve high-efficient coding. As one example of the very narrow range on the frequency axis in which the aforementioned tone property components exist, a range containing a given number of spectral components centered on the maximum-energy spectral component of each tone property component may be taken.
According to the aforementioned previously proposed method, by carrying out the above-described processing, it is possible to realize more efficient coding than with the method of quantizing the spectral components residing within each of the aforementioned fixed bands. The spectral components coded as mentioned above are recorded on a recording medium, or transmitted to a transmission path, together with positional information of the tone property components on the frequency axis.
However, because the spectral components constituting the acoustic signals are complicated, the spreading on the frequency axis of the respective spectral components constituting the tone property components varies. That is, in the case of a sine wave, the energy of the spectral components decreases quickly with distance from the frequency of the sine wave, so that most of the energy is concentrated on a very small number of spectral components. On the other hand, although tone property components may be extracted in the case of an ordinary musical instrument, the respective tone property components in the spectral components of acoustic signals obtained by playing the musical instrument do not have so steep an energy distribution as in the case of a sine wave. Additionally, the spreading of the energy distribution of the spectral components constituting the tone property components varies largely depending upon the kind of musical instrument.
In the case of the aforementioned U.S. patent application Ser. No. 08/374,518, now issued as U.S. Pat. No. 5,717,821, when extracting the tone property components, a spectrum having a large peak is specified as a tone property spectrum and then the two spectrums adjacent to that spectrum are also specified as tone property spectrums. Thus, the tone property spectrums are always extracted in units of three.
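The extraction in units of three can be sketched as follows. The peak criterion used here, namely a local maximum of the absolute value at or above a fixed threshold, is an assumption made for illustration; the text states only that a spectrum having a large peak is selected. The function name is hypothetical.

```python
def extract_tone_triples(spectrum, threshold):
    """Return index triples (k - 1, k, k + 1) for every spectral
    component k that is a local peak at or above the threshold."""
    triples = []
    for k in range(1, len(spectrum) - 1):
        magnitude = abs(spectrum[k])
        is_peak = (magnitude > abs(spectrum[k - 1])
                   and magnitude >= abs(spectrum[k + 1]))
        if magnitude >= threshold and is_peak:
            triples.append((k - 1, k, k + 1))
    return triples
```

The width of each extracted group is fixed at three regardless of how steeply or gently the energy falls away from the peak.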
Here, when a given number of spectral components centered on the maximum-energy spectral component are normalized and quantized as a tone property component, if the number of spectral components is increased, bits must be spent on quantizing very small spectral components far from the center spectral component, which, for tone property components having a very steep spectral energy distribution, can be neglected in terms of the auditory sense, and the coding efficiency deteriorates.
On the other hand, if the number of spectral components is decreased, then for tone property components having a very smooth spectral energy distribution, spectral components which cannot be neglected in terms of the auditory sense must be coded separately from those tone property components, and the overall coding efficiency is reduced. Thus, it has been necessary to extract the tone property component spectrums effectively.