A variety of techniques exist for high-efficiency encoding of digital audio or speech signals. One example is sub-band coding (SBC), a non-blocking frequency band splitting system in which time-domain audio signals are split into plural frequency bands, without first being divided into blocks, and the signals are encoded band by band. Another example is transform coding, a blocking frequency band splitting system in which time-domain signals are converted by an orthogonal transform into frequency-domain signals, which are then encoded band by band. There is also a high-efficiency encoding technique that combines sub-band coding and transform coding. In this case, the time-domain signals are divided into plural frequency bands by sub-band coding, and the resulting band-based signals are orthogonally transformed into frequency-domain signals, which are then encoded band by band.
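As an illustration only, band splitting without blocking can be sketched with a two-band Haar filter pair. This is the simplest filter bank of its kind and is an assumption made for the example; it is not taken from the techniques cited in this description. The function names `split_two_bands` and `merge_two_bands` are hypothetical.

```python
import math

def split_two_bands(samples):
    """Split an even-length signal into a low band and a high band.

    Each band runs at half the input rate. The low band holds scaled
    sums of sample pairs, the high band holds scaled differences.
    """
    s = math.sqrt(0.5)
    low = [s * (samples[i] + samples[i + 1]) for i in range(0, len(samples) - 1, 2)]
    high = [s * (samples[i] - samples[i + 1]) for i in range(0, len(samples) - 1, 2)]
    return low, high

def merge_two_bands(low, high):
    """Rebuild the time-domain signal from the two band signals."""
    s = math.sqrt(0.5)
    out = []
    for lo, hi in zip(low, high):
        out.append(s * (lo + hi))  # even-indexed sample of the pair
        out.append(s * (lo - hi))  # odd-indexed sample of the pair
    return out
```

In a combined scheme of the kind described above, each band signal returned by `split_two_bands` would then be passed to an orthogonal transform before encoding.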
Known orthogonal transform techniques include dividing the digital input audio signals into blocks of a predetermined time duration (blocking) and processing each block with a discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. A discussion of the MDCT may be found in J. P. Princen and A. B. Bradley, "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," ICASSP, 1987, Univ. of Surrey, Royal Melbourne Inst. of Tech.
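The block-based MDCT mentioned above can be sketched as follows. This is a minimal, direct O(N^2) illustration under stated assumptions: a sine window, 50% overlap between blocks, and an inverse transform scaled by 2/N. The function names are hypothetical, and a practical encoder would use a fast algorithm instead.

```python
import math

def sine_window(n_coeffs):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Transform 2N time-domain samples into N frequency-domain coefficients."""
    n_coeffs = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]

def imdct(coeffs):
    """Map N coefficients back to 2N samples that still contain time-domain aliasing."""
    n_coeffs = len(coeffs)
    return [
        (2.0 / n_coeffs) * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]

def analyze_synthesize(signal, n_coeffs):
    """Window, transform, inverse-transform and overlap-add blocks with hop size N."""
    window = sine_window(n_coeffs)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coeffs + 1, n_coeffs):
        frame = [signal[start + n] * window[n] for n in range(2 * n_coeffs)]
        rebuilt = imdct(mdct(frame))
        for n in range(2 * n_coeffs):
            out[start + n] += rebuilt[n] * window[n]
    return out
```

Samples covered by two overlapping blocks are recovered because the aliasing terms of adjacent blocks cancel in the overlap-add step. The first and last N samples are covered by only one block and are therefore not reconstructed by this sketch.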
By quantizing signals that have been divided into bands by a filter or an orthogonal transform, it is possible to control which bands are subject to quantization noise and, by exploiting properties such as the masking effect, to achieve psychoacoustically more efficient encoding. If, prior to quantization, the signal components of each band are normalized by the maximum absolute value of the signal components in that band, the encoding efficiency may be improved further.
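The normalization step described above can be sketched as follows. The example assumes a uniform quantizer with symmetric signed integer codes and a word length of at least 2 bits; the function names and the choice of quantizer are illustrative and are not prescribed by this description.

```python
def quantize_band(components, bits):
    """Normalize one band by its maximum absolute value, then quantize uniformly.

    Returns the scale factor and one signed integer code per component.
    Assumes bits >= 2 so that at least the codes -1, 0 and +1 exist.
    """
    scale = max(abs(c) for c in components)
    if scale == 0.0:
        return 0.0, [0] * len(components)
    levels = (1 << (bits - 1)) - 1  # e.g. bits=4 gives codes in -7..7
    codes = [round(c / scale * levels) for c in components]
    return scale, codes

def dequantize_band(scale, codes, bits):
    """Rebuild approximate component values from the scale factor and codes."""
    levels = (1 << (bits - 1)) - 1
    return [scale * q / levels for q in codes]
```

Because every band is scaled to the same range before quantization, a quiet band makes use of all available quantization levels just as a loud band does, and the reconstruction error in this sketch is bounded by the band's scale factor divided by twice the number of levels.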
In quantizing the frequency components resulting from division of the frequency spectrum, it is known to choose band widths that take the characteristics of the human auditory system into account. That is, audio signals are divided into plural bands, such as 32 bands, whose band widths increase with increasing frequency. In encoding the band-based data, bits are allocated to each band either fixedly or adaptively. When adaptive bit allocation is applied to coefficient data resulting from MDCT, the MDCT coefficient data of each frequency band obtained from the block-based MDCT are encoded with an adaptively allocated number of bits.
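One possible form of adaptive bit allocation is sketched below. It is a simplified greedy rule assumed for illustration, not the allocation method of any particular encoder: bits are handed out one at a time to the band whose level remains highest, using the rule of thumb that each bit of a uniform quantizer lowers quantization noise by about 6.02 dB. The function name, the per-band cap and the use of band energy as the only input are assumptions; a practical allocator would also take masking into account.

```python
import math

def allocate_bits(band_energies, total_bits, max_bits_per_band=16):
    """Greedily distribute total_bits (bits per component) over the bands."""
    levels_db = [10.0 * math.log10(e) if e > 0.0 else float("-inf")
                 for e in band_energies]
    bits = [0] * len(band_energies)
    for _ in range(total_bits):
        # Bands with no energy, or already at the cap, receive nothing.
        candidates = [i for i in range(len(bits))
                      if levels_db[i] > float("-inf") and bits[i] < max_bits_per_band]
        if not candidates:
            break
        # Pick the band whose level, reduced by 6.02 dB per bit already
        # assigned, is still the highest.
        best = max(candidates, key=lambda i: levels_db[i] - 6.02 * bits[i])
        bits[best] += 1
    return bits
```

Under this rule, bands with more energy receive more bits, and silent bands receive none.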
It should be noted that, in orthogonal transform encoding and decoding of time-domain acoustic signals, noise contained in tonal acoustic signals, whose energy is concentrated at a specific frequency, is extremely harsh to the ear and hence may prove psychoacoustically highly objectionable. For this reason, a sufficient number of bits needs to be used for encoding the tonal components. However, if the quantization step is determined fixedly for each band, as described above, the encoding efficiency is lowered because bits are allocated uniformly to all of the spectral components in an encoding unit that contains the tonal components.
To cope with this deficiency, a technique has been proposed, for example in International Patent Publication WO94/28633 or Japanese Laid-Open Patent Publication 7-168593, in which the spectral components are divided into tonal and non-tonal components and finer quantization steps are used only for the tonal components.
In this technique, spectral components with a locally high energy level, that is, tonal components T, are removed from the spectrum on the frequency axis, as shown in FIG. 1A. The spectrum of the noisy components that remain after removal of the tonal components is shown in FIG. 1B. The tonal and noisy components are each quantized using sufficient and optimum quantization steps.
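The separation described above can be sketched as follows. The detection rule used here, a local maximum whose magnitude exceeds a multiple of the mean magnitude of the spectrum, is an assumption made for the example and is not the criterion of the cited publications. The function name `split_tonal` and the parameters `ratio` and `neighbors` are hypothetical.

```python
def split_tonal(spectrum, ratio=4.0, neighbors=1):
    """Split a spectrum into a tonal part and a noisy part of equal length.

    A coefficient is treated as a tonal peak when it is a local maximum in
    magnitude and exceeds `ratio` times the mean magnitude. The peak and
    `neighbors` coefficients on each side are moved to the tonal part.
    """
    mags = [abs(c) for c in spectrum]
    mean_mag = sum(mags) / len(mags)
    tonal = [0.0] * len(spectrum)
    noisy = list(spectrum)
    for k in range(1, len(spectrum) - 1):
        is_peak = mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
        if is_peak and mags[k] > ratio * mean_mag:
            lo = max(0, k - neighbors)
            hi = min(len(spectrum), k + neighbors + 1)
            for j in range(lo, hi):
                tonal[j] = spectrum[j]
                noisy[j] = 0.0
    return tonal, noisy
```

The two parts sum back to the original spectrum, so each can be quantized with its own step size and the results added together again at the decoder.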
However, orthogonal transform techniques such as MDCT presuppose that the waveform in the domain being analyzed is repeated periodically outside that domain. Consequently, frequency components that do not actually exist are observed. For example, if a sine wave of a certain frequency is input and orthogonally transformed by MDCT, the resulting spectrum covers not only the inherent frequency but also neighboring frequencies, as shown in FIG. 1A. Thus, if the sine wave is to be represented with high accuracy, not only the single inherent frequency but also plural spectral components neighboring it on the frequency axis need to be quantized with sufficient quantization steps, even though the above technique attempts to quantize only the tonal components with high accuracy, as shown in FIG. 1A. As a result, more bits are needed, which lowers the encoding efficiency.
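The spreading of a single sine wave over several spectral components can be observed with a short experiment. The sketch below assumes a sine window, a block of 64 samples, a tone placed at the center frequency of coefficient 10, and an arbitrary phase of 0.3 rad; all of these values and the function names are chosen only for illustration.

```python
import math

def windowed_mdct(samples):
    """Apply a sine window to 2N samples and return N MDCT coefficients."""
    n_coeffs = len(samples) // 2
    length = 2 * n_coeffs
    windowed = [samples[n] * math.sin(math.pi * (n + 0.5) / length)
                for n in range(length)]
    return [
        sum(windowed[n] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(length))
        for k in range(n_coeffs)
    ]

def significant_coefficients(coeffs, fraction=0.01):
    """Indices of coefficients whose magnitude is at least `fraction` of the peak."""
    peak = max(abs(c) for c in coeffs)
    return [k for k, c in enumerate(coeffs) if abs(c) >= fraction * peak]

n_coeffs = 32
tone_bin = 10
omega = math.pi * (tone_bin + 0.5) / n_coeffs  # center frequency of coefficient 10
tone = [math.sin(omega * n + 0.3) for n in range(2 * n_coeffs)]
spread = significant_coefficients(windowed_mdct(tone))
```

Although the input is a single sine wave, `spread` contains more than one index, which is why neighboring spectral components must also be quantized finely if the tone is to be reproduced accurately.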