1. Field of the Invention
This invention relates to a signal encoding method for encoding input digital data by so-called high-efficiency encoding.
2. Description of the Related Art
A variety of high-efficiency encoding techniques exist for encoding audio or speech signals. Examples of these techniques include so-called transform coding, which is a blocking frequency spectrum splitting system based on orthogonal transform, and so-called sub-band coding (SBC), which is a non-blocking frequency spectrum splitting system. In transform coding, audio signals on the time axis are blocked at pre-set time intervals, the blocked time-domain signals are transformed into signals on the frequency axis, and the resulting frequency-domain signals are split into plural frequency bands and encoded from band to band. In the sub-band coding system, the audio signals on the time axis are split into plural frequency subbands and encoded without blocking. In a combination of the sub-band coding system and the transform coding system, the audio signals on the time axis are split into plural frequency subbands by the sub-band coding system, and the resulting band-based signals are transformed into frequency-domain signals by orthogonal transform for encoding.
One band-splitting filter used in the sub-band coding system is the so-called quadrature mirror filter (QMF) discussed in R. E. Crochiere, "Digital Coding of Speech in Sub-bands", Bell Syst. Tech. J., Vol.55, No.8, 1976. This QMF filter divides the frequency spectrum into two subbands of equal bandwidth. With the QMF filter, so-called aliasing is not produced on subsequent synthesis of the band-split signals.
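The two-band split and the alias-free synthesis can be illustrated with a minimal sketch. It assumes the two-tap Haar filter pair, which is the shortest pair satisfying the mirror relation between the low-pass and high-pass filters; the cited reference and practical coders use longer filters. The function names `qmf_analysis` and `qmf_synthesis` are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # keeps the two-tap filter pair orthonormal


def qmf_analysis(x):
    """Split an even-length signal into a low and a high subband.

    Each subband is decimated by two, so the two subbands together
    hold as many samples as the input.
    """
    pairs = range(len(x) // 2)
    low = [SCALE * (x[2 * i] + x[2 * i + 1]) for i in pairs]
    high = [SCALE * (x[2 * i] - x[2 * i + 1]) for i in pairs]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the subbands; the aliasing caused by decimation cancels."""
    x = []
    for lo, hi in zip(low, high):
        x.append(SCALE * (lo + hi))
        x.append(SCALE * (lo - hi))
    return x


signal = [1.0, 4.0, -2.0, 0.5, 3.0, 3.0]
low_band, high_band = qmf_analysis(signal)
restored = qmf_synthesis(low_band, high_band)
```

In this sketch each subband carries half the samples of the input, and `restored` matches `signal` up to floating-point rounding.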
A technique of splitting the frequency spectrum into subbands of equal bandwidth is discussed in Joseph H. Rothweiler, "Polyphase Quadrature Filters-A New Subband Coding Technique", ICASSP 83 BOSTON. With the polyphase quadrature filter, the signal can be split in a single operation into plural frequency subbands of equal bandwidths.
Among the techniques for orthogonal transform, there is a technique in which the input audio signal is split into frames of a predetermined time duration and the resulting frames are processed by discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. The MDCT is discussed in J. P. Princen and A. B. Bradley, "Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation", ICASSP 1987.
If DFT or DCT is used as the method for orthogonal transform of the waveform signal, and a transform is performed on time blocks each consisting of, for example, M sample data, M independent real-number data are obtained. Since M1 sample data are overlapped between neighboring time blocks for reducing connection distortion between time blocks, M real-number data are obtained on average for every (M-M1) sample data with DFT or DCT. These M real-number data are subsequently quantized and encoded.
If the above-described MDCT is used as the orthogonal transform method, M independent real-number data are obtained from 2M samples resulting from overlapping M sample data with each of the two neighboring time blocks. That is, if MDCT is used, M real-number data are obtained from M sample data on average. These M real-number data are subsequently quantized and encoded. In the decoding apparatus, the waveform elements obtained by inverse transform in each block from the codes obtained using MDCT are summed together, interfering with one another, to reconstruct the waveform signals.
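The coefficient count and the overlap-add reconstruction can be checked with a minimal sketch. It assumes a sine window, which satisfies the Princen-Bradley condition, a direct O(M^2) evaluation of the transform rather than a fast algorithm, and an input length that is a multiple of M. The function names and the zero-padding at both ends are illustrative assumptions.

```python
import math


def sine_window(M):
    """Window of length 2M with w[n]^2 + w[n + M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def mdct(block, M):
    """Transform 2M windowed time samples into M coefficients."""
    w = sine_window(M)
    return [sum(w[n] * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]


def imdct(coeffs, M):
    """Transform M coefficients into 2M windowed, aliased time samples."""
    w = sine_window(M)
    return [2.0 / M * w[n]
            * sum(coeffs[k]
                  * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]


def analyze_synthesize(x, M):
    """Run blocks that overlap by M samples and overlap-add the outputs.

    Assumes len(x) is a multiple of M. Returns the reconstructed signal
    and the total number of coefficients produced.
    """
    padded = [0.0] * M + list(x) + [0.0] * M
    out = [0.0] * len(padded)
    n_coeffs = 0
    for start in range(0, len(padded) - 2 * M + 1, M):
        coeffs = mdct(padded[start:start + 2 * M], M)
        n_coeffs += len(coeffs)
        for n, y in enumerate(imdct(coeffs, M)):
            out[start + n] += y  # aliasing terms of neighbors cancel here
    return out[M:M + len(x)], n_coeffs
```

Each block advances by M samples and yields M coefficients, so apart from the one extra block caused by the padding, the number of coefficients equals the number of input samples. A single inverse-transformed block is not the original waveform; only the sum of neighboring blocks is.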
In general, if the time block for orthogonal transform is lengthened, frequency resolution is increased, such that the signal energy is concentrated in specified spectral signal components. Therefore, by employing MDCT in which a long time block length obtained by overlapping one-half of the sample data between neighboring time blocks is used for orthogonal transform, and in which the number of resulting spectral signal components is not increased as compared to the number of the original time-domain sample data, a higher encoding efficiency may be realized than if the DFT or DCT is used. If a sufficiently long overlap between neighboring time blocks is used, the connection distortion between time blocks of waveform signals can be reduced.
By quantizing signal components split from subband to subband by a filter or orthogonal transform, it becomes possible to control the subband subjected to quantization noise, thus enabling encoding with perceptually higher encoding efficiency by exploiting masking effects. By normalizing respective sample data with the maximum value of the absolute values of the signal components in each subband prior to quantization, a still higher encoding efficiency may be achieved.
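The normalization and quantization of one subband can be sketched as follows. The sketch assumes a uniform, symmetric quantizer and takes the maximum absolute value of the subband directly as the normalization coefficient; an actual codec would typically select the coefficient from a predetermined table. The function names are hypothetical.

```python
def quantize_band(samples, bits):
    """Normalize a subband by its peak magnitude and quantize uniformly.

    Returns the normalization coefficient and the integer codes.
    With fewer than 2 bits no nonzero level exists, so all codes are 0.
    """
    scale = max(abs(s) for s in samples)
    levels = (1 << (bits - 1)) - 1 if bits >= 2 else 0
    if scale == 0.0 or levels == 0:
        return scale, [0] * len(samples)
    return scale, [round(s / scale * levels) for s in samples]


def dequantize_band(scale, codes, bits):
    """Invert quantize_band up to the quantization error."""
    levels = (1 << (bits - 1)) - 1 if bits >= 2 else 0
    if levels == 0:
        return [0.0] * len(codes)
    return [c * scale / levels for c in codes]


band = [0.02, -0.05, 0.01, 0.04]
scale, codes = quantize_band(band, bits=4)
decoded = dequantize_band(scale, codes, bits=4)
```

Because the samples are normalized before quantization, a quiet subband uses the full range of quantizer levels, and the quantization error is bounded by half a step of size `scale / levels` within that subband.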
It is preferable that the psychoacoustic characteristics of human beings are taken into account in determining the subband splitting width for quantizing the signal components resulting from splitting the frequency spectrum of the audio signals. That is, the frequency spectrum of the audio signals is divided into a plurality of, for example, 25, critical subbands. The width of the critical subbands increases with increasing frequency. In encoding the subband-based data in such case, bits are fixedly or adaptively allocated among the various critical subbands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from the MDCT, the spectral coefficient data generated by the MDCT within each of the critical subbands is quantized using an adaptively allocated number of bits. The following two bit allocation techniques are known.
In R. Zelinsky and P. Noll, "Adaptive Transform Coding of Speech Signals", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-25, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical subband. This technique produces a flat quantization spectrum and minimizes noise energy, but the noise level perceived by the listener is not optimum because the technique does not exploit the psychoacoustic masking effect.
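An amplitude-based allocation of this kind can be sketched with the classical log-energy rule, in which each subband receives the mean number of bits plus half the base-2 logarithm of its energy relative to the geometric mean of all subband energies. Whether this matches the cited paper in every detail is an assumption, and the function name and the noise model (noise power proportional to energy times 2 to the power of minus twice the bits) are illustrative.

```python
import math


def reference_bit_allocation(band_energies, mean_bits):
    """Real-valued bits per sample for each subband.

    Assumes every energy is positive. A very weak subband can receive a
    negative value, which a practical encoder would clip to zero.
    """
    log_e = [math.log2(e) for e in band_energies]
    mean_log = sum(log_e) / len(log_e)
    return [mean_bits + 0.5 * (le - mean_log) for le in log_e]


def noise_power(energy, bits):
    """Modeled quantization noise: about 6 dB lower per added bit."""
    return energy * 2.0 ** (-2.0 * bits)


energies = [16.0, 4.0, 1.0, 0.25]
bits = reference_bit_allocation(energies, mean_bits=4.0)
noise = [noise_power(e, b) for e, b in zip(energies, bits)]
```

Under this model the entries of `noise` are all equal, which corresponds to the flat quantization noise spectrum described above; the rule takes no account of masking.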
In M. A. Krassener, "The Critical Band Coder-Digital Encoding of the Perceptual Requirements of the Auditory System", there is described a technique in which the psychoacoustic masking effect is used to determine a fixed bit allocation that produces the necessary bit allocation for each critical subband. However, with this technique, since the bit allocation is fixed, non-optimum results are obtained even for a strongly tonal signal such as a sine wave.
For overcoming this problem, it has been proposed to divide the bits that may be used for bit allocation into a fixed pattern allocation fixed for each small block and a bit allocation portion dependent on the amplitude of the signal in each block. The division ratio is set depending on a signal related to the input signal such that the division ratio for the fixed allocation pattern portion becomes higher the smoother the pattern of the signal spectrum.
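The division of the bit budget can be sketched as follows. The passage does not specify the signal that controls the division ratio, so the sketch assumes the spectral flatness measure (geometric mean of the subband energies divided by their arithmetic mean) as the smoothness indicator, and a log-energy weighting for the signal-dependent portion. Both choices, and all names, are hypothetical.

```python
import math


def spectral_flatness(energies):
    """About 1.0 for a smooth spectrum, near 0.0 for a tonal one.

    Assumes every energy is positive.
    """
    log_mean = sum(math.log(e) for e in energies) / len(energies)
    arith_mean = sum(energies) / len(energies)
    return min(1.0, math.exp(log_mean) / arith_mean)


def split_allocation(energies, total_bits, fixed_pattern):
    """Divide total_bits into a fixed-pattern part and an adaptive part.

    The smoother the spectrum, the larger the fixed-pattern share.
    Returns real-valued bits per subband.
    """
    fixed_bits = spectral_flatness(energies) * total_bits
    adaptive_bits = total_bits - fixed_bits
    floor_energy = min(energies)
    weights = [math.log2(e / floor_energy) for e in energies]
    if sum(weights) == 0.0:
        weights = [1.0] * len(energies)  # flat spectrum: spread evenly
    p_sum, w_sum = sum(fixed_pattern), sum(weights)
    return [fixed_bits * p / p_sum + adaptive_bits * w / w_sum
            for p, w in zip(fixed_pattern, weights)]


pattern = [4.0, 3.0, 2.0, 1.0]
tonal = split_allocation([1000.0, 1.0, 1.0, 1.0], 40.0, pattern)
smooth = split_allocation([2.0, 2.0, 2.0, 2.0], 40.0, pattern)
```

For the tonal input nearly the whole budget goes to the subband holding the dominant component, while for the smooth input the result follows the fixed pattern.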
With this method, if the audio signal has high energy concentration in a specified spectral signal component, as in the case of a sine wave, abundant bits are allocated to a block containing the signal spectral component for significantly improving the signal-to-noise ratio as a whole. In general, the hearing sense of the human being is highly sensitive to a signal having sharp spectral signal components, so that, if the signal-to-noise ratio is improved by using this method, not only the numerical values as measured can be improved, but also the audio signal as heard may be improved in quality.
Various other bit allocation methods have been proposed and the perceptual models have become refined, such that, if the encoder is of high ability, a perceptually higher encoding efficiency may be realized.
In these methods, it has been customary to find a real-number reference value of bit allocation whereby the signal to noise ratio as found by calculations will be realized as faithfully as possible and to use an integer approximate to this reference value as the allocated number of bits.
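The step from the real-number reference value to the integer number of bits can be sketched as follows. Plain rounding to the nearest integer would also fit the description; the sketch instead assumes a largest-remainder rule so that the integers add up to the available budget. The function name is hypothetical, and the reference values are assumed to sum to approximately `total_bits`.

```python
import math


def to_integer_bits(reference_bits, total_bits):
    """Approximate real-valued allocations by non-negative integers.

    Each value is rounded down, then the bits still unassigned are
    handed out one at a time to the subbands with the largest
    fractional parts.
    """
    clipped = [max(0.0, b) for b in reference_bits]
    bits = [math.floor(b) for b in clipped]
    remaining = total_bits - sum(bits)
    by_fraction = sorted(range(len(bits)),
                         key=lambda k: clipped[k] - bits[k],
                         reverse=True)
    for k in by_fraction:
        if remaining <= 0:
            break
        bits[k] += 1
        remaining -= 1
    return bits


allocated = to_integer_bits([5.5, 4.5, 3.5, 2.5], total_bits=16)
```

Each integer differs from its reference value by less than one bit, so the calculated signal-to-noise ratio is approximated rather than met exactly.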
In the U.S. application Ser. No. 08/374,518, U.S. Pat. No. 5,717,821, as filed by the present Assignee, there is disclosed an encoding method in which a perceptually critical tonal component, that is a spectral signal component exhibiting signal energy concentration in the vicinity of a specified frequency, is separated from the spectral signal components, and encoded separately from other spectral components. This method enables audio signals to be encoded very efficiently without substantially producing perceptual deterioration of audio signals.
In constructing an actual codestring, it suffices to encode the quantization precision information and the normalization coefficient information with a predetermined number of bits for each subband designed for normalization and quantization and to encode the normalized and quantized spectral signal components.
In MPEG-1 audio, there is disclosed a high-efficiency encoding system in which the number of bits representing the quantization precision information will be different from subband to subband. Specifically, the number of bits representing the quantization precision information is set so as to be smaller with increasing frequency.
There is also known a method in which the quantization precision information is determined by a decoder from, for example, the normalization coefficient information, without directly encoding the quantization precision information. However, since the relation between the normalization coefficient information and the quantization precision information is fixed at the time the standard is formulated, it becomes impossible to introduce quantization precision control based on a more advanced perceptual model in the future. In addition, if a range of compression ratios is to be supported, it becomes necessary to set the relation between the normalization coefficient information and the quantization precision information separately for each compression ratio.
In D. A. Huffman, "A Method for Construction of Minimum Redundancy Codes", Proc. I.R.E., 40, p.1098 (1952), quantized spectral signal components are encoded more efficiently by encoding using variable length codes.
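The construction of such a variable length code can be sketched as follows, applied to a short run of quantized values. The sample values and the function name `huffman_code` are illustrative assumptions; an actual codec would normally use predetermined code tables rather than build a code per frame.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code; frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: partial codeword}).
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, t0 = heapq.heappop(heap)  # the two least frequent subtrees
        c1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t0.items()}
        merged.update({s: "1" + code for s, code in t1.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


quantized = [0] * 8 + [1] * 3 + [-1] * 3 + [2, -2]
table = huffman_code(quantized)
total_bits = sum(len(table[q]) for q in quantized)
```

The 16 values take five distinct levels, so a fixed-length code would need 3 bits each, or 48 bits in total, whereas the code built here needs 31 bits because the frequent value 0 receives a one-bit codeword.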
In the U.S. application Ser. No. 08/491,948, U.S. Pat. No. 5,778,339, filed by the present Assignee, it is proposed to adjust the normalization coefficients in case of using the variable length codes for more efficient encoding of the quantized spectral signal components with a smaller number of bits. With this method, there is no risk of significant signal dropout in a specified area in case of raising the compression ratio. In particular, there is no risk of dropout or appearance of specified subband signal components on the frame basis, thus avoiding the problem of generation of perceptually objectionable harsh noise.
However, if the conventional method is used for encoding with the aid of the above-described various encoding techniques, the number of processing steps is increased, such that it becomes difficult to encode the acoustic signals on a real-time basis in a small apparatus.