This invention relates to a method and apparatus for encoding input digital data by high-efficiency encoding, a method for transmitting the encoded information and a method and apparatus for reproducing and decoding the transmitted information.
There exist a variety of high efficiency encoding techniques of encoding audio or speech signals. Examples of these techniques include transform coding in which a frame of digital signals representing the audio signal on the time axis is converted by an orthogonal transform into a block of spectral coefficients representing the audio signal on the frequency axis, and a sub-band coding in which the frequency band of the audio signal is divided by a filter bank into a plurality of sub-bands without forming the signal into frames along the time axis prior to coding. There is also known a combination of sub-band coding and transform coding, in which digital signals representing the audio signal are divided into a plurality of frequency ranges by sub-band coding, and transform coding is applied to each of the frequency ranges.
Filters for dividing a frequency spectrum into a plurality of equal-width frequency ranges include the quadrature mirror filter (QMF), as discussed in R. E. Crochiere, Digital Coding of Speech in Sub-bands, 55 Bell Syst. Tech J. No. 8 (1976). With such a QMF, the frequency spectrum of the signal is divided into two equal-width bands. With the QMF, aliasing is not produced when the frequency bands resulting from the division are subsequently combined together.
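The two-band division and alias-free recombination described above can be sketched as follows. This is a minimal illustration only: it assumes the 2-tap Haar filter pair, which is the shortest filter pair having the mirror property, whereas practical coders use much longer filters. The function names are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length signal into a low band and a high band.

    Each band holds half as many samples as the input (decimation by 2).
    Assumption: 2-tap Haar filters h0 = [s, s] and h1 = [s, -s].
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [SCALE * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [SCALE * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; the aliasing introduced by decimation cancels."""
    out = []
    for lo, hi in zip(low, high):
        out.append(SCALE * (lo + hi))
        out.append(SCALE * (lo - hi))
    return out

signal = [1.0, 2.0, 3.0, 4.0, -1.0, 0.5]
low_band, high_band = qmf_analysis(signal)
restored = qmf_synthesis(low_band, high_band)
```

Here `restored` matches `signal` to within floating-point rounding, which illustrates the statement that no aliasing remains once the two bands are combined.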
In "Polyphase Quadrature Filters--A New Subband Coding Technique", Joseph H. Rothweiler, ICASSP 83, Boston, there is shown a technique of dividing the frequency spectrum of the signal into equal-width frequency bands. With this polyphase quadrature filter, the frequency spectrum of the signal can be divided in a single step into plural equal-width frequency bands.
There is also known a technique of orthogonal transform including dividing the digital input audio signal into frames of a predetermined time duration, and processing the resulting frames using a discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) for converting the signal from the time axis to the frequency axis. Discussions on MDCT may be found in J. P. Princen and A. B. Bradley, "Subband Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation", ICASSP 1987.
By quantizing the signals divided on the band basis by the filter or orthogonal transform, it becomes possible to control which bands are subjected to quantization noise, and psychoacoustically more efficient coding may be achieved by utilizing the so-called masking effect. If the signal components are normalized from band to band with the maximum value of the absolute values of the signal components in each band, it becomes possible to effect more efficient coding.
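The band-wise normalization mentioned above may be sketched as follows. The sketch assumes a uniform mid-tread quantizer with a sign-symmetric code range, and the function names and word lengths are hypothetical; the text itself does not specify a particular quantizer.

```python
def quantize_band(coeffs, bits):
    """Normalize one band by its peak magnitude, then quantize uniformly.

    Returns (scale, codes). Assumption: codes lie in [-levels, +levels]
    with levels = 2**(bits - 1) - 1.
    """
    scale = max(abs(c) for c in coeffs)
    levels = (1 << (bits - 1)) - 1 if bits >= 1 else 0
    if scale == 0.0 or levels == 0:
        return scale, [0] * len(coeffs)
    codes = [round(c / scale * levels) for c in coeffs]
    return scale, codes

def dequantize_band(scale, codes, bits):
    """Invert quantize_band using the transmitted scale and word length."""
    levels = (1 << (bits - 1)) - 1 if bits >= 1 else 0
    if levels == 0:
        return [0.0] * len(codes)
    return [scale * q / levels for q in codes]

band = [0.5, -2.0, 1.0, 0.25]
scale, codes = quantize_band(band, bits=4)
decoded = dequantize_band(scale, codes, bits=4)
```

Because every band is scaled to its own peak, a quiet band uses the full code range just as a loud band does, and the reconstruction error in a band is bounded by half a quantizer step of that band.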
In a technique of quantizing the spectral coefficients resulting from an orthogonal transform, it is known to use sub-bands that take advantage of the psychoacoustic characteristics of the human auditory system. That is, spectral coefficients representing an audio signal on the frequency axis may be divided into a plurality of critical frequency bands. The width of the critical bands increases with increasing frequency. Normally, about 25 critical bands are used to cover the audio frequency spectrum of 0 Hz to 20 kHz. In such a quantizing system, bits are adaptively allocated among the various critical bands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from MDCT, the spectral coefficient data generated by the MDCT within each of the critical bands is quantized using an adaptively allocated number of bits. The following two bit allocation techniques are presently known.
For example, in IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-25, No. 4, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes the noise energy, but the noise level perceived by the listener is not optimum because the technique does not effectively exploit the psychoacoustic masking effect.
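An amplitude-based allocation of this kind may be sketched as follows. The exact rule of the cited paper is not reproduced in this text; the sketch assumes the classical log-energy rule, under which a band receives one additional bit for every factor of four in energy relative to the geometric mean of all bands. The function name and inputs are hypothetical.

```python
import math

def energy_based_allocation(band_energies, avg_bits):
    """Return a real-valued bit count per band.

    Assumption: b_k = avg_bits + 0.5 * log2(E_k / geometric_mean(E)).
    All energies must be positive. Rounding to integers and clamping
    negative results to zero are omitted for brevity.
    """
    half_logs = [0.5 * math.log2(e) for e in band_energies]
    mean_half_log = sum(half_logs) / len(half_logs)
    return [avg_bits + h - mean_half_log for h in half_logs]

bits = energy_based_allocation([16.0, 4.0, 1.0, 1.0], avg_bits=4)
```

The allocations always sum to `avg_bits` times the number of bands, so the total bit budget is preserved while louder bands receive more bits. Nothing in this rule accounts for masking, which is the shortcoming noted above.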
In the bit allocation technique described in M. A. Krassner, The Critical Band Encoder--Digital Encoding of the Perceptual Requirements of the Auditory System, ICASSP 1980, the psychoacoustic masking mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if the signal-to-noise ratio of such a system is measured using a strongly tonal signal, for example, a 1 kHz sine wave, non-optimum results are obtained because of the fixed allocation of bits among the critical bands.
For overcoming these inconveniences, a high efficiency encoding apparatus has been proposed in which the total number of bits available for bit allocation is divided between a fixed bit allocation pattern pre-set for each small block and a block-based signal magnitude dependent bit allocation, and the division ratio is set in dependence upon a signal which is relevant to the input signal such that the smoother the signal spectrum, the higher becomes the division ratio for the fixed bit allocation pattern.
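The division of the bit budget described above may be sketched as follows. The text does not state how spectral smoothness is measured; this sketch assumes the spectral flatness measure (geometric mean of the band energies divided by their arithmetic mean), which equals 1 for a perfectly flat spectrum and approaches 0 for a strongly tonal one. The function names are hypothetical.

```python
import math

def spectral_flatness(band_energies):
    """Geometric mean over arithmetic mean of positive band energies."""
    n = len(band_energies)
    geometric = math.exp(sum(math.log(e) for e in band_energies) / n)
    arithmetic = sum(band_energies) / n
    return geometric / arithmetic

def split_bit_budget(band_energies, total_bits):
    """Divide total_bits between the fixed pattern and the signal-dependent part.

    Assumption: the share given to the fixed pattern is simply the
    flatness value, so a smoother spectrum gives a larger fixed share.
    """
    ratio = spectral_flatness(band_energies)
    fixed_bits = round(total_bits * ratio)
    return fixed_bits, total_bits - fixed_bits

flat_split = split_bit_budget([1.0, 1.0, 1.0, 1.0], 100)
tonal_split = split_bit_budget([1000.0, 0.001, 0.001, 0.001], 100)
```

For the flat spectrum the whole budget goes to the fixed pattern, while for the tonal spectrum nearly all bits become available to the signal-dependent allocation, so that they can be concentrated on the block containing the dominant component.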
With this technique, if the energy is concentrated in a particular spectral component, as in the case of a sine wave input, a larger number of bits are allocated to the block containing the spectral component, thereby significantly improving the overall signal-to-noise characteristics. Since the human auditory system is highly sensitive to a signal having acute spectral components, such a technique improves not only the measured signal-to-noise ratio but also the quality of the sound as perceived by the ear.
In addition to the above techniques, a variety of other techniques have been proposed, and the model simulating the human auditory system has been refined, such that, if the capability of the encoding device is improved, encoding may be performed with higher efficiency in light of the characteristics of the human auditory system.
In an international application PCT/JP94/00880, filed on May 31, 1994 in the name of the present Assignee, there is disclosed a method whereby tonal components, which are most crucial to the hearing sense, are separated from spectral signals, and encoded in distinction from the remaining spectral components. This enables efficient encoding at a high compression ratio without substantially producing acoustic deterioration of audio signals.
If the above-mentioned DFT or DCT is utilized for transforming waveform signals into spectral signals, transform with time blocks consisting of M samples gives M independent real-number data. For diminishing connection distortion between neighboring time blocks, a given time block is usually overlapped by M1 samples with both neighboring time blocks. Thus, with DFT or DCT, M real-number data on an average are quantized and encoded for (M-M1) samples.
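The coefficient count stated above can be checked with a short calculation. The block lengths used are illustrative assumptions, not values taken from the text, and the function name is hypothetical.

```python
def coefficients_per_new_sample(block_len, overlap):
    """Average number of real coefficients coded per new input sample.

    block_len corresponds to M and overlap to M1 in the text: each block
    of M samples yields M coefficients but advances by only M - M1 samples.
    """
    if not 0 <= overlap < block_len:
        raise ValueError("overlap must satisfy 0 <= overlap < block_len")
    return block_len / (block_len - overlap)

no_overlap = coefficients_per_new_sample(512, 0)
small_overlap = coefficients_per_new_sample(512, 32)
half_overlap = coefficients_per_new_sample(512, 256)
```

With no overlap the ratio is exactly one coefficient per sample, and it grows as the overlap grows, reaching two coefficients per sample at one-half overlap. Lengthening the overlap with DFT or DCT therefore directly increases the amount of data to be encoded.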
If the above-mentioned MDCT is employed for transform into spectral signals, since M samples are overlapped with both neighboring time blocks, M independent real-number data are obtained from 2M samples. Thus, on an average, M real-number data are quantized and encoded with MDCT for M samples. The decoder adds waveform signals, obtained on inverse transform in the respective blocks from the codes resulting from MDCT, with overlap between neighboring waveform elements, for reconstructing waveform signals.
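The block-wise MDCT and the overlap-add reconstruction in the decoder may be sketched as follows. This is a direct, unoptimized evaluation of the transform sums, assuming a sine window for both the forward and the inverse transform and a 2/M scaling on the inverse; quantization is omitted, and the function names are hypothetical.

```python
import math

def sine_window(M):
    """Sine window of length 2M (an assumed, commonly used choice)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block, window):
    """Forward MDCT: 2M windowed time samples -> M spectral coefficients."""
    M = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, window):
    """Inverse MDCT: M coefficients -> 2M windowed, time-aliased samples."""
    M = len(coeffs)
    return [(2.0 / M) * window[n]
            * sum(coeffs[k]
                  * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]

def transform_and_reconstruct(signal, M):
    """Run MDCT and overlap-add over a signal whose length is a multiple of M.

    M zeros are padded at each end so every input sample is covered by
    two blocks. Returns (reconstruction, number of coefficients produced).
    """
    window = sine_window(M)
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    n_coeffs = 0
    for start in range(0, len(padded) - M, M):   # hop of M samples
        spectrum = mdct(padded[start:start + 2 * M], window)
        n_coeffs += len(spectrum)
        for n, value in enumerate(imdct(spectrum, window)):
            out[start + n] += value              # overlap-add
    return out[M:-M], n_coeffs

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(12)]
restored, n_coeffs = transform_and_reconstruct(signal, M=4)
```

Each hop of M samples produces M coefficients, apart from one extra block caused by the padding at the ends, and `restored` agrees with `signal` to within rounding even though every individual block is time-aliased before the overlap-add.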
In general, if the time block for transform is elongated, the frequency resolution of the spectrum is increased, such that the energy becomes concentrated in specified spectral components. With MDCT, the transform is executed with a longer block length having a one-half overlap with each neighboring block, yet the number of resulting spectral components does not exceed the number of original time samples. Therefore, if MDCT is used, encoding may be achieved with an efficiency higher than if DFT or DCT is employed. In addition, by providing a sufficiently long overlap with neighboring blocks, it becomes possible to reduce the block-to-block distortion of the waveform signals.
However, in the case of the transform in which waveform signals are constructed with overlap with both neighboring waveform elements at the time of inverse transform, such as MDCT, certain conditions need to be met by orthogonal transform and inverse orthogonal transform. If these conditions are not met, correct time-domain signals cannot be produced on inverse transform of spectral signals.
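The kind of conditions referred to above can be illustrated with a numerical check on a pair of window functions. The conditions are not spelled out in this passage; the two relations tested below are an assumption, namely the ones that follow for the common MDCT convention with a phase offset of (M+1)/2 and one-half overlap. For identical symmetric windows they reduce to the well-known Princen-Bradley condition. The function name is hypothetical.

```python
import math

def satisfies_overlap_conditions(forward, inverse, tol=1e-9):
    """Check two windows of length 2M for overlap-add reconstruction.

    Assumed conditions, for n = 0 .. M-1:
      (1) forward[n]*inverse[n] + forward[n+M]*inverse[n+M] == 1
          (the overlapping halves sum to unit gain)
      (2) inverse[n]*forward[M-1-n] == inverse[n+M]*forward[2M-1-n]
          (the time-domain aliasing terms cancel)
    """
    M = len(forward) // 2
    if len(forward) != 2 * M or len(inverse) != 2 * M:
        return False
    for n in range(M):
        gain = forward[n] * inverse[n] + forward[n + M] * inverse[n + M]
        alias = (inverse[n] * forward[M - 1 - n]
                 - inverse[n + M] * forward[2 * M - 1 - n])
        if abs(gain - 1.0) > tol or abs(alias) > tol:
            return False
    return True

M = 8
sine = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]
rectangular = [1.0] * (2 * M)

sine_ok = satisfies_overlap_conditions(sine, sine)
rect_ok = satisfies_overlap_conditions(rectangular, rectangular)
```

The sine window passes both relations when used for the forward and the inverse transform, whereas a rectangular window of unit height fails the gain relation. The check takes the forward and inverse windows as separate arguments, so it does not require the two windows to have the same shape.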
In addition to the above-mentioned constraint, the forward transform window function and the inverse transform window function have hitherto been designed so as to have the same shape. Consequently, a window function that is not sufficiently smooth in shape has been used for the forward orthogonal transform, so that the spectral signals obtained on orthogonal transform are lower in concentration in energy distribution. The result is that a large number of spectral components need to be encoded with high precision, and it is difficult to achieve efficient encoding. In particular, if tonal components are separated for encoding, it is desirable that the number of the spectral components which should be separated for constituting tonal components be as small as possible for achieving efficient encoding. However, since a sufficient frequency separation cannot be achieved with the conventional window function for forward transform, the number of spectral components that make up the respective tonal components is increased and hence efficient encoding cannot be achieved.