1. Field of the Invention
This invention relates to a signal encoding method for encoding input digital data by high-efficiency encoding.
2. Description of the Related Art
A variety of high-efficiency encoding techniques exist for encoding audio or speech signals. Examples of these techniques include transform coding, which is a blocking frequency spectrum splitting system (orthogonal transform), and sub-band coding (SBC), which is a non-blocking frequency spectrum splitting system. In transform coding, audio signals on the time axis are blocked at every pre-set time interval, the blocked time-domain signals are transformed into signals on the frequency axis, and the resulting frequency-domain signals are split into plural frequency bands and encoded from band to band. In the sub-band coding system, the audio signals on the time axis are split into plural frequency bands and encoded without blocking. In a combination of the sub-band coding system and the transform coding system, the audio signals on the time axis are split into plural frequency bands by the sub-band coding system, and the resulting band-based signals are transformed into frequency-domain signals by orthogonal transform for encoding.
One band-splitting filter used in the sub-band coding system is the quadrature mirror filter (QMF) discussed in R. E. Crochiere, "Digital Coding of Speech in Subbands", Bell Syst. Tech. J., Vol.55, No.8, 1976. The QMF divides the frequency spectrum into two bands of equal bandwidth. With the QMF, aliasing is not produced on subsequent synthesis of the band-split signals.
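The two-band split and alias-free synthesis can be illustrated with a minimal sketch. It assumes the 2-tap Haar pair, which is the shortest filter pair satisfying the quadrature mirror relation; it is chosen here only for brevity and is not the filter of the cited paper. The function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter tap value (illustrative assumption)

def qmf_analysis(x):
    """Split an even-length signal into low and high bands, each decimated by 2."""
    low = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; the aliasing of each band cancels in the sum."""
    x = []
    for l, h in zip(low, high):
        x.append(S * (l + h))
        x.append(S * (l - h))
    return x
```

Each band carries half as many samples as the input, so the total number of samples is unchanged by the split. Practical coders would use longer filters with sharper band separation.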
A technique of splitting the frequency spectrum into bands of equal width is discussed in Joseph H. Rothweiler, "Polyphase Quadrature Filters-A New Subband Coding Technique", ICASSP 83 BOSTON. With a polyphase quadrature filter, the signal can be split in one step into plural frequency bands of equal bandwidth.
Among the techniques for orthogonal transform, there is known a technique in which an input audio signal is split into frames of a predetermined time duration and the resulting frames are processed by discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. A discussion of the MDCT may be found in J. P. Princen and A. B. Bradley, "Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation", ICASSP 1987.
If DFT or DCT is used as the method for orthogonal transform of the waveform signal, and a transformation is performed with time blocks each consisting of, for example, M sample data, M independent real-number data are obtained. Since M1 sample data are overlapped between neighboring time blocks for reducing connection distortion of time blocks, M real-number data are obtained on an average for (M-M1) sample data with DFT or DCT, so that these M real-number data are subsequently quantized and encoded.
If the above-described MDCT is used as the orthogonal transform method, M independent real-number data are obtained from 2M samples resulting from overlapping M sample data with each of the two neighboring time blocks. That is, if MDCT is used, M real-number data are obtained from M sample data on an average. These M real-number data are subsequently quantized and encoded. In the decoding apparatus, the waveform elements obtained by inverse transform of each block from the codes obtained using MDCT are added together, interfering with one another, to reconstruct the waveform signals.
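The overlap-and-add reconstruction described above can be sketched as follows. This is a minimal illustration that assumes a sine window and direct (slow) evaluation of the transform sums; the function names, the window choice and the zero padding at the signal ends are assumptions made for the example, not details taken from the cited paper.

```python
import math

def sine_window(M):
    # Satisfies w[n]^2 + w[n+M]^2 = 1, the condition needed for aliasing to cancel.
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block):
    """2M time samples -> M real coefficients."""
    M = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M)) for k in range(M)]

def imdct(coeffs):
    """M coefficients -> 2M time samples that still contain time-domain aliasing."""
    M = len(coeffs)
    return [(2.0 / M) * sum(coeffs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M)) for n in range(2 * M)]

def tdac_roundtrip(x, M):
    """Transform blocks of 2M samples with a hop of M, then overlap-add the inverses.

    len(x) is assumed to be a multiple of M.
    """
    w = sine_window(M)
    padded = [0.0] * M + list(x) + [0.0] * M  # so every sample lies in two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        block = [padded[start + n] * w[n] for n in range(2 * M)]
        y = imdct(mdct(block))               # M coefficients per M new samples
        for n in range(2 * M):
            out[start + n] += y[n] * w[n]    # aliasing of adjacent blocks cancels here
    return out[M:M + len(x)]
```

A single inverse-transformed block does not match its input; only the sum of the two blocks covering a sample does, which is why M coefficients per M-sample hop suffice.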
In general, if the time block for orthogonal transform is lengthened, frequency resolution is increased, such that the signal energy is concentrated in specified spectral signal components. Therefore, by employing MDCT in which a long time block length obtained by overlapping one-half sample data between neighboring time blocks is used for orthogonal transform and in which the number of resulting spectral signal components is not increased as compared to the number of original time-domain sample data, a higher encoding efficiency may be realized than if the DFT or DCT is used. If a sufficiently long overlap between neighboring time blocks is used, connection distortion between time blocks of waveform signals can be reduced.
By quantizing signal components split from band to band by a filter or orthogonal transform, it becomes possible to control the band subjected to quantization noise, thus enabling encoding with perceptually higher encoding efficiency by exploiting masking effects. By normalizing the respective sample data with the maximum of the absolute values of the signal components in each band prior to quantization, the encoding efficiency may be improved further.
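The per-band normalization followed by quantization can be sketched as below. This is an illustrative uniform quantizer under assumed conventions (signed integer codes, `bits` of at least 2, the band peak itself used as the normalization coefficient); the function names are hypothetical.

```python
def quantize_band(samples, bits):
    """Normalize a band by its peak magnitude, then quantize uniformly."""
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on a silent band
    levels = (1 << (bits - 1)) - 1               # largest code magnitude, e.g. 7 for 4 bits
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes

def dequantize_band(scale, codes, bits):
    """Invert quantize_band up to the quantization error."""
    levels = (1 << (bits - 1)) - 1
    return [c / levels * scale for c in codes]
```

Because the peak of the band always maps to the largest code, the full range of the quantizer is used regardless of the band's absolute level, and the reconstruction error of each sample stays within half a quantization step.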
As the band splitting width used for quantizing the signal components resulting from splitting of the frequency spectrum of the audio signals, a band width that takes the psychoacoustic characteristics of the human hearing sense into account is preferably used. That is, the frequency spectrum of the audio signals is preferably split into a plurality of, for example, 25, critical bands. The width of the critical bands increases with increasing frequency. In encoding the band-based data in such a case, bits are fixedly or adaptively allocated among the various critical bands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from an MDCT, the spectral coefficient data generated by the MDCT within each of the critical bands are quantized using an adaptively allocated number of bits. The following two techniques are known bit allocation techniques.
In R. Zelinski and P. Noll, "Adaptive Transform Coding of Speech Signals", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes noise energy, but the noise level perceived by the listener is not optimum because the technique does not exploit the psychoacoustic masking effect.
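As a rough illustration of amplitude-driven allocation, the sketch below uses a common textbook rule in which each band receives the average number of bits plus half the base-2 logarithm of its energy relative to the geometric mean energy. This rule is an assumption made for the example and is not claimed to be the exact procedure of the cited paper; the function name is hypothetical and band energies are assumed to be positive.

```python
import math

def log_energy_allocation(energies, avg_bits):
    """Real-valued bits per band; the results sum to avg_bits * len(energies)."""
    half_logs = [0.5 * math.log2(e) for e in energies]
    mean_half_log = sum(half_logs) / len(half_logs)
    # Bands above the geometric-mean energy get more than avg_bits, others get less.
    return [avg_bits + h - mean_half_log for h in half_logs]
```

Under this rule every 6 dB of extra band energy earns about one extra bit, which is what equalizes the quantization noise across bands. The values may be fractional or even negative, so a practical coder would still have to clamp and round them.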
In M. A. Krasner, "The Critical Band Coder-Digital Encoding of the Perceptual Requirements of the Auditory System", there is described a technique in which the psychoacoustic masking effect is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, with this technique, since the bit allocation is fixed, non-optimum results are obtained when the characteristics are measured with a strongly tonal signal such as a sine wave.
For overcoming this problem, it has been proposed to divide the bits that may be used for bit allocation into a fixed allocation pattern portion, pre-set for each small block, and a portion allocated depending on the amplitude of the signal in each block. The division ratio is set depending on a signal related to the input signal, such that the share of the fixed allocation pattern portion becomes higher the smoother the pattern of the signal spectrum.
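The division of the bit budget can be sketched as follows. The smoothness measure used here (spectral flatness, the ratio of geometric to arithmetic mean of the band energies) and the logarithmic weighting of the adaptive portion are assumptions chosen for illustration; the text does not specify either. All names are hypothetical and band energies are assumed to be positive.

```python
import math

def spectral_flatness(energies):
    """Close to 1 for a smooth spectrum, close to 0 for a strongly tonal one."""
    geo = math.exp(sum(math.log(e) for e in energies) / len(energies))
    arith = sum(energies) / len(energies)
    return geo / arith

def split_allocation(energies, fixed_pattern, total_bits):
    """Blend a fixed pattern with an energy-dependent allocation."""
    ratio = spectral_flatness(energies)     # share of bits given to the fixed pattern
    fixed_total = ratio * total_bits
    adaptive_total = total_bits - fixed_total
    pattern_sum = sum(fixed_pattern)
    weights = [math.log2(1.0 + e) for e in energies]
    weight_sum = sum(weights)
    return [fixed_total * p / pattern_sum + adaptive_total * w / weight_sum
            for p, w in zip(fixed_pattern, weights)]
```

For a smooth spectrum the result approaches the fixed pattern, while for a sine-like input nearly the whole budget follows the signal energy and concentrates in the band holding the peak.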
With this method, if the audio signal has high energy concentration in a specified spectral signal component, as in the case of a sine wave, abundant bits are allocated to a block containing the signal spectral component for significantly improving the signal-to-noise ratio as a whole. In general, the hearing sense of a human being is highly sensitive to a signal having sharp spectral signal components, so that, if the signal-to-noise ratio is improved by using this method, not only the numerical values as measured can be improved, but also the audio signal as heard may be improved in quality.
Various other bit allocation methods have been proposed and the perceptual models have become refined, such that, if the encoder is of high ability, a perceptually higher encoding efficiency may be realized.
In these methods, it has been customary to find a real-number reference value of bit allocation whereby the signal to noise ratio as found by calculations will be realized as faithfully as possible and to use an integer approximate to this reference value as the allocated number of bits.
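Turning the real-number reference values into integer bit counts can be sketched as below. This is one simple scheme, assumed for illustration: round every band down, then hand the remaining bits to the bands whose reference value was cut the most. The `budget` and `max_bits` parameters and the function name are hypothetical.

```python
import math

def integer_allocation(reference, budget, max_bits=16):
    """Approximate real-valued reference bits by integers summing to at most budget."""
    # Round down and clamp to the representable range [0, max_bits].
    bits = [min(max_bits, max(0, math.floor(r))) for r in reference]
    remaining = budget - sum(bits)
    # Bands that lost the largest fraction in the rounding are served first.
    order = sorted(range(len(reference)),
                   key=lambda i: reference[i] - bits[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        if bits[i] < max_bits:
            bits[i] += 1
            remaining -= 1
    return bits
```

The sketch does not take bits back if clamping negative reference values to zero already exceeds the budget; a complete allocator would need an additional pass for that case.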
In U.S. Pat. No. 5,778,339, filed by the present Assignee, there is disclosed an encoding method in which a perceptually critical tonal component, that is, a spectral signal component exhibiting a signal energy concentration in the vicinity of a specified frequency, is separated from the other spectral signal components and encoded separately from them. This method enables audio signals to be encoded with high efficiency without substantially producing perceptual deterioration of the audio signals.
In constructing an actual codestring, it suffices to encode the quantization precision information and the normalization coefficient information with a predetermined number of bits for each band designed for normalization and quantization and to encode the normalized and quantized spectral signal components.
In MPEG-1 audio, there is disclosed a high-efficiency encoding system in which the number of bits representing the quantization precision information differs from band to band. Specifically, the number of bits representing the quantization precision information is set to be smaller with increasing frequency.
There is also known a method in which the quantization precision information is determined by a decoder from, for example, the normalization coefficient information, without directly encoding the quantization precision information. With this method, however, since the relation between the normalization coefficient information and the quantization precision information is fixed at the time of standard formulation, it becomes impossible to introduce quantization precision control based on a more advanced perceptual model in the future. In addition, if a range of compression ratios is to be realized, it becomes necessary to set the relation between the normalization coefficient information and the quantization precision information separately for each compression ratio.
There is also known a method of encoding the quantized spectral signal components more efficiently by using variable length codes, as described in D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proc. I.R.E., 40, p.1098 (1952).
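The construction of such a variable length code can be sketched with the standard Huffman procedure of repeatedly merging the two least frequent entries. The function name and the use of bit strings as code words are choices made for this example.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table of the subtree).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)       # two least frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]
```

Quantized spectral components are typically clustered around zero, so the frequent small values receive short code words and the total code length falls below that of a fixed-length representation.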
In U.S. application Ser. No. 08/491,948, now U.S. Pat. No. 5,778,339, filed by the present Assignee, it is proposed to adjust the normalization coefficients when variable length codes are used, so that the quantized spectral signal components are encoded more efficiently with a smaller number of bits. With this method, there is no risk of significant signal dropout in a specified area when the compression ratio is raised. In particular, there is no risk of signal components of a specified band dropping out or appearing from frame to frame, so that the problem of generation of perceptually objectionable harsh noise is avoided. In addition, since it suffices to modify the normalization coefficient, encoding may be realized using small-sized hardware.
However, if the normalization coefficient is modified from its optimum value, the quantization noise is increased for the band. Thus, if the normalization coefficient is modified more than is necessary, needless sound quality degradation tends to be produced.
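The trade-off can be made concrete with a small sketch. It assumes a uniform quantizer and assumes that the coefficient is modified by enlarging it, which shrinks the quantized values (favoring short variable length codes) but coarsens the quantization step. The function name and the sample values in the usage note are hypothetical.

```python
def band_noise(samples, scale, bits):
    """Quantization noise energy of one band for a given normalization coefficient."""
    levels = (1 << (bits - 1)) - 1    # largest code magnitude
    step = scale / levels             # a larger coefficient means a coarser step
    noise = 0.0
    for s in samples:
        code = max(-levels, min(levels, round(s / step)))  # clip to the code range
        noise += (s - code * step) ** 2
    return noise
```

For the band `[0.9, -0.5, 0.3, 0.1]` quantized with 3 bits, calling `band_noise` with the peak value 0.9 as the coefficient gives a markedly lower noise energy than calling it with the doubled coefficient 1.8, which shows why the coefficient should not be moved further from its optimum than the bit saving requires.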