1. Field of the Invention
This invention relates to an information encoding method and apparatus, suitable for expanding the format of the encoded signals, an information decoding method and apparatus, as counterparts of the information encoding method and apparatus, and an information recording medium having the encoded information recorded thereon.
2. Description of the Related Art
There has so far been proposed an information recording medium capable of recording signals such as encoded acoustic information or music information (referred to hereinafter as audio signals), such as a magneto-optical disc. Among methods for high-efficiency encoding of the audio signals, there are a so-called transform coding which is a blocking frequency spectrum splitting method of transforming a time-domain signal into frequency domain signals by orthogonal transform and encoding the spectral components from one frequency band to another, and a sub-band encoding (SBC) method, which is a non-blocking frequency spectrum splitting method of splitting the time-domain audio signals into plural frequency bands without blocking and encoding the resulting signals of the frequency bands. There is also known a high-efficiency encoding technique which is a combination of the sub-band coding and transform coding, in which case the time domain signals are split into plural frequency bands by SBC and the resulting band signals are orthogonally transformed into spectral components which are encoded from band to band.
Among filters used for the above-mentioned frequency spectrum splitting is a so-called QMF filter, as discussed in R. E. Crochiere, Digital Coding of Speech in Subbands, Bell Syst. Tech. J., Vol. 55, No. 8, 1976. This QMF filter splits the frequency spectrum into two bands of equal bandwidths and is characterized in that so-called aliasing is not produced on subsequently synthesizing the split bands. A technique of dividing the frequency spectrum into bands of equal bandwidth is discussed in Joseph H. Rothweiler, Polyphase Quadrature Filters--A New Subband Coding Technique, ICASSP 83 BOSTON. This polyphase quadrature filter is characterized in that the signal can be split into plural bands of equal bandwidth.
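As a minimal sketch of a two-band split whose aliasing cancels on synthesis, the following uses the two-tap Haar pair, which is the shortest filter pair satisfying the QMF mirror relation. The function names are illustrative only, and practical coders would use considerably longer filters than assumed here.

```python
import math

def haar_split(signal):
    """Split an even-length signal into a low band and a high band,
    each at half the input sample rate (assumed two-tap Haar filters)."""
    s = math.sqrt(0.5)
    low = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high

def haar_merge(low, high):
    """Synthesize the full-rate signal from the two half-rate bands."""
    s = math.sqrt(0.5)
    out = []
    for lo, hi in zip(low, high):
        out.append(s * (lo + hi))  # even-indexed sample
        out.append(s * (lo - hi))  # odd-indexed sample
    return out
```

Each band carries half as many samples as the input, so the total number of samples is unchanged by the split, and merging the two bands returns the input up to floating-point rounding.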
Among the above-mentioned techniques for orthogonal transform is such a technique in which an input audio signal is blocked every pre-set unit time, such as every frame, and discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) is applied to each block for converting the signals from the time axis to the frequency axis. Discussions of the MDCT are found in J. P. Princen and A. B. Bradley, Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation, ICASSP 1987.
If the above-mentioned DFT or DCT is used as a method for transforming waveform signals into spectral signals, and a transform is applied based on a time block composed of M samples, M independent real-number data are obtained. It is noted that, for reducing junction distortions between time blocks, a given time block is usually overlapped by M1 samples with each of the two neighboring blocks, so that, in DFT or DCT, M real-number data on an average are obtained for (M-M1) samples. It is these M real-number data that are subsequently quantized and encoded.
On the other hand, if the above-mentioned MDCT is used as a method for orthogonal transform, M independent real-number data are obtained from 2M samples overlapped by M samples with each of the two neighboring time blocks. Thus, in MDCT, M real-number data on an average are obtained for M samples and subsequently quantized and encoded. A decoding device adds together the waveform elements obtained on inverse transform in each block from the codes obtained by MDCT, allowing them to interfere with one another in the overlapped portions, for re-constructing the waveform signals.
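The relation between block length, coefficient count and overlap-add reconstruction can be sketched as follows. This is a direct, unoptimized MDCT over 2M-sample blocks advanced M samples at a time. The sine window, the 2/M scaling of the inverse transform and all function names are assumptions made for illustration and are not taken from the cited reference.

```python
import math

def sine_window(m):
    """Window of length 2M whose squares at n and n+M sum to 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]

def mdct(block):
    """Map 2M windowed time samples to M real coefficients."""
    m = len(block) // 2
    n0 = 0.5 + m / 2.0
    return [sum(block[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]

def imdct(coeffs):
    """Map M coefficients back to 2M time-aliased samples."""
    m = len(coeffs)
    n0 = 0.5 + m / 2.0
    return [(2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
                            for k in range(m))
            for n in range(2 * m)]

def analyze_synthesize(signal, m):
    """Transform overlapping blocks, invert them and overlap-add the results.
    Returns the rebuilt signal and the total number of coefficients used."""
    w = sine_window(m)
    out = [0.0] * len(signal)
    coeff_count = 0
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = [signal[start + n] * w[n] for n in range(2 * m)]
        coeffs = mdct(block)
        coeff_count += len(coeffs)
        rebuilt = imdct(coeffs)
        for n in range(2 * m):
            out[start + n] += rebuilt[n] * w[n]  # aliasing cancels in the overlap
    return out, coeff_count
```

Each block of 2M samples yields only M coefficients, and samples covered by two consecutive blocks are recovered once the overlapped halves are added. The first and last M samples of the signal are covered by a single block and are therefore not fully reconstructed in this sketch.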
In general, if a time block for a transform is lengthened, the spectrum frequency resolution is improved such that the signal energy is concentrated in specified frequency components. Therefore, by using MDCT in which, by overlapping with one half of each of both neighboring blocks, transform is carried out with long block lengths, and in which the number of the resulting spectral signals is not increased beyond the number of the original time samples, encoding can be carried out with higher efficiency than if the DFT or DCT is used. Moreover, since the neighboring blocks have a sufficiently long overlap with each other, the inter-block distortion of the waveform signals can be reduced. However, if the transform block length for a transform is lengthened, more work area is required for the transform, thus making a reduction in size of a reproducing means more difficult.
By quantizing signals split into plural frequency bands by a filter or orthogonal transform, the frequency band in which the quantization noise occurs can be controlled so that encoding can be achieved with higher psychoacoustic efficiency by exploiting acoustic characteristics such as masking effects. If the signal components are normalized with the maximum values of the absolute values of the signal components in the respective bands, encoding can be achieved with still higher efficiency.
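Normalization by the maximum absolute value in a band, followed by quantization, might look like the sketch below. The uniform symmetric quantizer, the requirement of at least 2 bits and the function names are assumptions for illustration.

```python
def encode_band(components, bits):
    """Normalize a band by its peak magnitude and quantize to `bits` bits.
    Assumes bits >= 2 so that at least one non-zero level exists."""
    scale = max(abs(c) for c in components) or 1.0  # avoid dividing by zero
    levels = (1 << (bits - 1)) - 1                  # codes lie in -levels..levels
    codes = [round(c / scale * levels) for c in components]
    return scale, codes

def decode_band(scale, codes, bits):
    """Undo the normalization using the transmitted scale factor."""
    levels = (1 << (bits - 1)) - 1
    return [q / levels * scale for q in codes]
```

Because the largest component always maps to the outermost code, the full quantizer range is used in every band regardless of the band's absolute level, and the reconstruction error stays within half a quantization step of the normalized range.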
As for the frequency bandwidths used in quantizing the frequency components obtained on splitting the frequency spectrum, it is known to split the frequency spectrum so as to take account of the psychoacoustic characteristics of the human auditory system. Specifically, the audio signals are divided into a plurality of bands, such as 25 bands, using bandwidths increasing with increasing frequency. These bands are known as critical bands. In encoding the band-based data, encoding is carried out by fixed or adaptive bit allocation on the band basis. In encoding coefficient data obtained by MDCT processing by bit allocation as described above, the band-based MDCT coefficients obtained by block-based MDCT processing are encoded with an adaptively allocated number of bits. Among the prior art bit allocation techniques, the following two techniques are known.
For example, in R. Zelinsky and P. Noll, Adaptive Transform Coding of Speech Signals, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977, bit allocation is performed on the basis of the magnitude of the band-based signals. With this system, the quantization noise spectrum becomes flat, such that the quantization noise is minimized. However, the actual noise feeling is not psychoacoustically optimum because the psychoacoustic masking effect is not exploited.
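Magnitude-based allocation that yields a flat noise spectrum can be sketched with the textbook log-energy rule, in which each band receives the average bit count plus half the base-2 logarithm of its energy relative to the geometric mean. Whether the cited paper uses exactly this form is not stated here, so the rule and the function name should be read as assumptions.

```python
import math

def flat_noise_allocation(band_energies, average_bits):
    """Real-valued bits per band so that energy * 2**(-2*bits) is equal
    in every band. Assumes all band energies are strictly positive."""
    half_logs = [0.5 * math.log2(e) for e in band_energies]
    mean_half_log = sum(half_logs) / len(half_logs)
    return [average_bits + h - mean_half_log for h in half_logs]
```

For band energies 16, 4, 1 and 0.25 with an average of 3 bits, the rule gives 4.5, 3.5, 2.5 and 1.5 bits, and the resulting noise power is the same in all four bands. Bands with very low energy can receive negative values, which a practical encoder would have to clip.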
In a publication "ICASSP 1980, The Critical Band Coder-Digital Encoding of the Perceptual Requirements of the Auditory System", M. A. Krasner, MIT, the psychoacoustic masking mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if this technique is used to measure characteristics of a sine wave input, non-optimum results are obtained because of the fixed allocation of bits among the critical bands.
For overcoming these problems, there is proposed a high-efficiency encoding device in which a portion of the total number of bits usable for bit allocation is used for a fixed bit allocation pattern pre-fixed from one small block to another and the remaining portion is used for bit allocation dependent on the signal amplitudes of the respective blocks, and in which the bit number division ratio between the fixed bit allocation and the bit allocation dependent on the signal amplitudes is made dependent on a signal related to an input signal, such that the bit number division ratio to the fixed bit allocation becomes larger if the signal spectrum is smoother.
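The division of the bit budget between a fixed pattern and a signal-dependent part can be illustrated as below. The text does not specify how spectral smoothness is measured or how the signal-dependent part is weighted, so the spectral flatness measure (geometric mean over arithmetic mean), the logarithmic weights and all names are hypothetical choices made only for this sketch.

```python
import math

def spectral_flatness(band_energies):
    """1.0 for a perfectly flat spectrum, approaching 0 for a peaky one.
    Assumes all band energies are strictly positive."""
    n = len(band_energies)
    geometric = math.exp(sum(math.log(e) for e in band_energies) / n)
    arithmetic = sum(band_energies) / n
    return geometric / arithmetic

def hybrid_allocation(band_energies, fixed_pattern, total_bits):
    """Blend a fixed pattern with an energy-dependent allocation.
    The smoother the spectrum, the larger the share given to the pattern."""
    fixed_bits = spectral_flatness(band_energies) * total_bits
    adaptive_bits = total_bits - fixed_bits
    pattern_sum = sum(fixed_pattern)
    weights = [math.log2(1.0 + e) for e in band_energies]  # assumed weighting
    weight_sum = sum(weights)
    return [fixed_bits * p / pattern_sum + adaptive_bits * w / weight_sum
            for p, w in zip(fixed_pattern, weights)]
```

With a flat spectrum the result reduces to the fixed pattern scaled to the budget, while a spectrum dominated by one band shifts most of the budget toward that band.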
This technique significantly improves the signal-to-noise ratio on the whole by allocating more bits to a block including a particular signal spectrum exhibiting concentrated signal energy. With the above technique, not only are the measured signal-to-noise values increased, but the sound as perceived by the listener is also improved in quality, because the human auditory system is sensitive to signals having acute spectral components.
A variety of different bit allocation techniques have been proposed, and a model simulating the human auditory mechanism has also become more elaborate, such that perceptually higher encoding efficiency can be achieved supposing that the encoding device capability is correspondingly improved.
In these techniques, the customary practice is to find real-number reference values for bit allocation, realizing the signal-to-noise characteristics as found by calculations as faithfully as possible, and to use integer values approximating the reference values as allocated bit numbers.
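One simple way of turning real-number reference values into integer bit counts under a fixed budget is sketched below. The floor-then-largest-remainder rule and the function name are assumptions, and the sketch does not handle the case where the floored values already exceed the budget.

```python
import math

def to_integer_bits(reference_bits, total_bits):
    """Approximate real-valued reference allocations by integers that
    sum to total_bits (when the floored values leave room to do so)."""
    clipped = [max(0.0, r) for r in reference_bits]  # no negative bit counts
    bits = [math.floor(r) for r in clipped]
    remaining = total_bits - sum(bits)
    # Give leftover bits to the bands with the largest fractional parts.
    order = sorted(range(len(bits)),
                   key=lambda i: clipped[i] - bits[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        bits[i] += 1
        remaining -= 1
    return bits
```

For reference values 4.7, 3.2, 2.6 and 1.5 with a budget of 12 bits, the two spare bits go to the first and third bands, giving 5, 3, 3 and 1.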
For constructing a real code string, it suffices if the quantization fineness information and the normalization coefficient information are encoded with pre-set numbers of bits, from one normalization/quantization band to another, and the normalized and quantized spectral signal components are encoded. In the ISO standard (ISO/IEC 11172-3:1993 (E), 1993), there is described a high-efficiency encoding system in which the numbers of bits representing the quantization fineness information are set so as to be different from one band to another. Specifically, the number of bits representing quantization fineness information is set so as to be decreased with the increased frequency.
There is also known a method of determining the quantization fineness information in the decoding device from, for example, the normalization coefficient information. However, since the relation between the normalization coefficient information and the quantization fineness information is fixed at the time of setting the standard, it becomes impossible to introduce quantization fineness control based on a more advanced psychoacoustic model in the future. In addition, if a range of compression ratios is to be realized, it becomes necessary to set the relation between the normalization coefficient information and the quantization fineness information for each compression ratio.
There is also known a method of using variable length codes for encoding for realization of more efficient encoding of quantized spectral signal components, as described in D. A. Huffman, A Method for Construction of Minimum Redundancy Codes, in Proc. I.R.E., 40, p. 1098 (1952).
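A compact sketch of building a variable length code table for quantized spectral values by Huffman's procedure is given below. The function name and the representation of code words as strings of `0` and `1` are illustrative choices.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}  # single symbol still needs one bit
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

Frequently occurring quantized values, typically those near zero, receive short code words, so the total number of bits falls below that of a fixed length code whenever the value distribution is uneven.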
In Japanese Patent Application No. 7-500482 of the present Assignee, there is disclosed a method of separating perceptually critical tonal components, that is, signal components having the signal energy concentrated in the vicinity of a specified frequency, from the spectral signals, and encoding these signal components separately from the remaining spectral components. This enables audio signals to be efficiently encoded with a high compression ratio without substantially deteriorating the psychoacoustic sound quality.
The above-described encoding techniques can be applied to respective channels of acoustic signals constructed by plural channels. For example, the encoding techniques can be applied to each of the left channel associated with a left-side speaker and the right channel associated with a right-side speaker. The encoding techniques can also be applied to the (L+R)/2 signal obtained on summing the L-channel and R-channel signals together. The above-mentioned techniques may also be applied to (L+R)/2 and (L-R)/2 signals for realizing efficient encoding. Meanwhile, for encoding one-channel signals, an amount of data equal to one-half the data volume required for independently encoding two-channel signals suffices. Thus, such a method of recording signals on a recording medium is frequently used in which a mode for recording as one-channel monaural signals and a mode for recording as two-channel stereo signals are readied, so that recording can be made as monaural signals if long-time recording is required.
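The (L+R)/2 and (L-R)/2 signals and their inversion can be written out directly. The function names below are illustrative.

```python
def to_sum_difference(left, right):
    """Form the (L+R)/2 and (L-R)/2 signals from two channels."""
    half_sum = [(l + r) / 2 for l, r in zip(left, right)]
    half_diff = [(l - r) / 2 for l, r in zip(left, right)]
    return half_sum, half_diff

def from_sum_difference(half_sum, half_diff):
    """Recover L and R: L = sum + difference, R = sum - difference."""
    left = [a + b for a, b in zip(half_sum, half_diff)]
    right = [a - b for a, b in zip(half_sum, half_diff)]
    return left, right
```

When the two channels are strongly correlated, the (L-R)/2 signal is small and can be encoded with few bits. The (L+R)/2 signal alone serves as a monaural version of the programme.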
Meanwhile, the techniques of improving the encoding efficiency are currently developed and introduced one after another, such that, if a standard including a newly developed proper encoding technique is used, it becomes possible to make a longer recording or to effect recording of audio signals of higher sound quality for the same recording time.
In setting the above-described standard, an allowance is left for recording the flag information concerning the standard on the information recording medium in consideration that the standard may be modified or expanded in the future. For example, `0` or `1` is recorded as 1-bit flag information when initially setting or modifying the standard, respectively. The reproducing device complying with the as-modified standard checks if the flag information is `0` or `1` and, if this flag information is `1`, the signal is read out and reproduced from the information recording medium in accordance with the as-modified standard. If the flag information is `0`, and the reproducing device is also compatible with the initially set standard, the signal is read out and reproduced from the information recording medium on the basis of the initially set standard. If the reproducing device is not compatible with the initially set standard, the signal is not reproduced.
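The decision made by a reproducing device on reading the 1-bit flag can be summarized in the sketch below. The constant names, the function name and the string return values are hypothetical and serve only to restate the rule in the preceding paragraph.

```python
INITIAL_STANDARD = 0   # flag value recorded under the initially set standard
MODIFIED_STANDARD = 1  # flag value recorded under the as-modified standard

def select_decoding(flag, supports_initial, supports_modified):
    """Return which standard to decode with, or None if the signal
    is not to be reproduced by this device."""
    if flag == MODIFIED_STANDARD and supports_modified:
        return "modified"
    if flag == INITIAL_STANDARD and supports_initial:
        return "initial"
    return None
```

For instance, `select_decoding(0, True, True)` selects the initially set standard, while `select_decoding(0, False, True)` returns `None` because that device cannot decode the initially set standard.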
However, if a reproducing device capable of reproducing only the signals recorded by the standard once set (`old standard` or `first encoding method`) is in widespread use, such a reproducing device, designed to be compatible with the old standard, cannot reproduce an information recording medium recorded using an upper standard (`new standard` or `second encoding method`) which exploits an encoding system of higher efficiency, thus inconveniencing the user of the device. The reproducing device capable of reproducing only the signals recorded by the standard once set is hereinafter termed a reproducing device designed to be compatible with the old standard.
In particular, some of the reproducing devices produced at the time the old standard was set (reproducing devices designed to be compatible with the old standard) disregard the flag information recorded on the information recording medium and reproduce the signals on the assumption that the signals recorded on the recording medium are all encoded in accordance with the old standard. That is, if the information recording medium has been recorded in accordance with the new standard, not all reproducing devices designed to be compatible with the old standard can recognize that the information recording medium has been recorded in this manner. Thus, if a reproducing device designed to be compatible with the old standard reproduces an information recording medium having recorded thereon signals compatible with the new standard, on the assumption that the recording medium has recorded thereon signals compatible with the old standard, there is a fear that the device cannot operate normally or that an objectionable noise may be produced.
On the other hand, if signals of different standards, for example, signals of the old standard and those of the new standard, are recorded simultaneously on the same recording medium, the recording areas allocated to these two signals are correspondingly decreased, making it difficult to maintain the quality of the recorded or reproduced signals.