1. Field of the Invention
This invention relates to an encoding method and apparatus, suitable for encoding input signals by high efficiency encoding and for reproducing playback signals on transmission, recording, reproduction and decoding, and a recording medium.
2. Description of the Related Art
There has so far been proposed an information recording medium capable of recording signals such as the encoded acoustic information or the music information (referred to hereinafter as audio signals), such as a magneto-optical disc. Among methods for high-efficiency encoding of the audio signals, there are a so-called transform coding which is a blocking frequency spectrum splitting method of transforming a time-domain signal into frequency domain signals by orthogonal transform and encoding the spectral components from one frequency band to another, and a sub-band encoding (SBC) method, which is a non-blocking frequency spectrum splitting method of splitting the time-domain audio signals into plural frequency bands without blocking and encoding the resulting signals of the frequency bands. There is also known a high-efficiency encoding technique which is a combination of the sub-band coding and transform coding, in which case the time domain signals are split into plural frequency bands by SBC and the resulting band signals are orthogonal transformed into spectral components which are encoded from band to band.
Among the filters used for the above-mentioned frequency spectrum splitting is a so-called QMF (Quadrature Mirror Filter) as discussed in R.E. Crochiere, Digital Coding of Speech in Subbands, Bell Syst. Tech. J. Vol. 55, No. 8, 1976. This QMF filter splits the frequency spectrum into two bands of equal bandwidths and is characterized in that so-called aliasing is not produced on subsequently synthesizing the split bands. The technique of dividing the frequency spectrum into bands of equal bandwidth is discussed in Joseph H. Rothweiler, Polyphase Quadrature Filters - A New Subband Coding Technique, ICASSP 83 BOSTON. This polyphase quadrature filter is characterized in that the signal can be split at a time into plural bands of equal bandwidth.
Among the above-mentioned techniques for orthogonal transform is such a technique in which an input audio signal is blocked every pre-set unit time, such as every frame, and discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) is applied to each block for converting the signals from the time axis to the frequency axis. Discussions of the MDCT are found in J. P. Princen and A. B. Bradley, Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation, ICASSP 1987.
If the above-mentioned DFT or DCT is used as a method for transforming waveform signals into spectral signals, and transform is applied based on a time block composed of M samples, M independent real-number data are obtained. It is noted that, for reducing junction distortions between time blocks, a given time block is usually overlapped by M1 samples with each of both neighboring blocks, so that, in DFT or DCT, M real-number data on an average are obtained for (M−M1) samples. It is these M real-number data that are subsequently quantized and encoded.
On the other hand, if the above-mentioned MDCT is used as a method for orthogonal transform, M independent real-number data are obtained from 2M samples overlapped by M samples with each of both neighboring time blocks. Thus, in MDCT, M real-number data on an average are obtained for M samples and subsequently quantized and encoded. A decoding device re-constructs the waveform signals by adding together, with mutual interference, the waveform elements obtained on inverse transform in each block from the codes obtained by MDCT.
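The MDCT property described above can be illustrated with a minimal Python sketch. It shows that each block of 2M samples yields M coefficients and that adding the overlapping inverse-transformed blocks restores the waveform. The sine window, the 2/M scaling of the inverse transform, the zero padding at both ends and all function names are assumptions made for illustration; they are not taken from the text.

```python
import math

def sine_window(length):
    """Sine window (an assumed choice); w[n]^2 + w[n + M]^2 = 1 for length 2M."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Transform one block of 2M time samples into M real coefficients."""
    m = len(block) // 2
    w = sine_window(2 * m)
    return [
        sum(w[n] * block[n]
            * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]

def imdct(coeffs):
    """Inverse transform: M coefficients back to 2M windowed, aliased samples."""
    m = len(coeffs)
    w = sine_window(2 * m)
    return [
        (2.0 / m) * w[n]
        * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
              for k in range(m))
        for n in range(2 * m)
    ]

def mdct_roundtrip(signal, m):
    """Encode with hop size M and rebuild by overlap-add.

    len(signal) must be a multiple of M; M zeros are padded at both ends
    so that every input sample is covered by two overlapping blocks.
    """
    padded = [0.0] * m + list(signal) + [0.0] * m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * m + 1, m):
        coeffs = mdct(padded[start:start + 2 * m])  # M values per M new samples
        for i, y in enumerate(imdct(coeffs)):
            out[start + i] += y  # aliasing terms of neighboring blocks cancel
    return out[m:m + len(signal)]
```

In this sketch no quantization is applied, so the reconstruction is exact up to rounding; in an actual encoder the M coefficients per block would be quantized and encoded before the inverse transform.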
In general, if a time block for transform is lengthened, the spectrum frequency resolution is improved such that the signal energy is concentrated in specified frequency components. Therefore, by using MDCT in which, by overlapping with one half of each of both neighboring blocks, transform is carried out with long block lengths, and in which the number of the resulting spectral signals is not increased beyond the number of the original time samples, encoding can be carried out with higher efficiency than if DFT or DCT is used. Moreover, since the neighboring blocks have sufficiently long overlap with each other, the inter-block distortion of the waveform signals can be reduced. However, if the transform block length for transform is lengthened, more work area is required for transform, thus obstructing reduction in size of reproducing means. In particular, use of a long transform block at a time point when it is difficult to raise the integration degree of a semiconductor should be avoided since this increases the manufacturing cost.
By quantizing signals split into plural frequency bands by a filter or orthogonal transform, the frequency band in which the quantization noise occurs can be controlled, so that encoding can be achieved with psychoacoustically higher efficiency by using acoustic characteristics such as masking effects. If the signal components are normalized with the maximum values of the absolute values of the signal components in the respective bands, encoding can be achieved with still higher efficiency.
As frequency band widths in case of quantizing the frequency components, obtained on splitting the frequency spectrum, it is known to split the frequency spectrum such as to take account of the psychoacoustic characteristics of the human auditory system. Specifically, the audio signals are divided into a plurality of, such as 25, bands using bandwidths increasing with increasing frequency. These bands are known as critical bands. In encoding the band-based data, encoding is carried out by fixed or adaptive bit allocation on the band basis. In encoding coefficient data obtained by MDCT processing by bit allocation as described above, the band-based MDCT coefficients obtained by block-based MDCT processing are encoded with an adaptively allocated number of bits. As these bit allocation techniques, there are known the following two techniques.
For example, in R. Zelinsky and P. Noll, ‘Adaptive Transform Coding of Speech Signals’, IEEE Transactions of Acoustics, Speech and Signal Processing, vol. ASSP-25, No. 4, August 1977, bit allocation is performed on the basis of the magnitude of the band-based signals. With this system, the quantization noise spectrum becomes flat, such that the quantization noise energy is minimized. However, the actual noise feeling is not psychoacoustically optimum because the psychoacoustic masking effect is not exploited.
In a publication ‘ICASSP 1980, The critical band coder - digital encoding of the perceptual requirements of the auditory system, M. A. Krasner, MIT’, the psychoacoustic masking mechanism is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, if this technique is used to measure characteristics of a sine wave input, non-optimum results are obtained because of the fixed allocation of bits among the critical bands.
For overcoming these problems, there is proposed a high-efficiency encoding device in which a portion of the total number of bits usable for bit allocation is used for a fixed bit allocation pattern pre-fixed from one small block to another and the remaining portion is used for bit allocation dependent on the signal amplitudes of the respective blocks, and in which the bit number division ratio between the fixed bit allocation and the bit allocation dependent on the signal amplitudes is made dependent on a signal related to an input signal, such that the bit number division ratio to the fixed bit allocation becomes larger the smoother the signal spectrum.
This technique significantly improves the signal-to-noise ratio on the whole by allocating more bits to a block including a particular signal spectrum exhibiting concentrated signal energy. Because the human auditory system is sensitive to signals having acute spectral components, improving the signal-to-noise ratio characteristics in this way not only increases the measured values but also improves the sound quality as perceived by the listener.
A variety of different bit allocation techniques have been proposed, and a model simulating the human auditory mechanism has also become more elaborate, such that perceptually higher encoding efficiency can be achieved supposing that the encoding device capability is correspondingly improved.
In these techniques, the customary practice is to find real-number reference values for bit allocation, realizing the signal-to-noise characteristics as found by calculations as faithfully as possible, and to use integer values approximating the reference values as allocated bit numbers.
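The combined fixed and signal-dependent bit allocation described above can be sketched as follows. This is only an illustration under stated assumptions: the spectral flatness measure used as the "smoothness" of the spectrum, the logarithmic weighting of band energies, and the names `spectral_flatness` and `allocate_bits` are all hypothetical and are not taken from the text. The function returns real-number reference values, which an encoder would then approximate by integers.

```python
import math

def spectral_flatness(energies):
    """Geometric mean / arithmetic mean of positive band energies.

    Equals 1.0 for a flat (smooth) spectrum and approaches 0 for a
    spectrum whose energy is concentrated in a few bands.
    """
    geometric = math.exp(sum(math.log(e) for e in energies) / len(energies))
    arithmetic = sum(energies) / len(energies)
    return geometric / arithmetic

def allocate_bits(energies, fixed_pattern, total_bits):
    """Split total_bits between a fixed pattern and an energy-dependent part.

    Assumption: the share given to the fixed pattern equals the spectral
    flatness, so a smoother spectrum relies more on the fixed pattern.
    """
    fixed_share = spectral_flatness(energies)
    weights = [math.log2(1.0 + e) for e in energies]  # assumed energy weighting
    pattern_sum = sum(fixed_pattern)
    weight_sum = sum(weights)
    return [
        total_bits * (fixed_share * p / pattern_sum
                      + (1.0 - fixed_share) * w / weight_sum)
        for p, w in zip(fixed_pattern, weights)
    ]
```

With a flat spectrum the result reduces to the scaled fixed pattern, while a band with concentrated energy receives more bits than the fixed pattern alone would give it.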
For constructing a real codestring, it suffices if the quantization fineness information and the normalization coefficient information are encoded with pre-set numbers of bits, from one normalization/quantization band to another, and the normalized and quantized spectral signal components are encoded. In the ISO standard (ISO/IEC 11172-3:1993 (E), 1993), there is described a high-efficiency encoding system in which the numbers of bits representing the quantization fineness information are set so as to be different from one band to another. Specifically, the number of bits representing the quantization fineness information is set so as to be decreased with the increased frequency.
There is also known a method of determining the quantization fineness information in the decoding device from, for example, the normalization coefficient information. However, since the relation between the normalization coefficient information and the quantization fineness information is fixed at the time of setting the standard, it becomes impossible to introduce quantization fineness control based on a more advanced psychoacoustic model in future. In addition, if a range of compression ratios is to be realized, it becomes necessary to set the relation between the normalization coefficient information and the quantization fineness information from one compression ratio to another.
The above-described encoding techniques can be applied to respective channels of acoustic signals constructed by plural channels. For example, the encoding techniques can be applied to each of the left channel associated with a left-side speaker and the right channel associated with a right-side speaker. The encoding techniques can also be applied to the (L+R)/2 signal obtained on summing the L-channel and R-channel signals together. The above-mentioned techniques may also be applied to (L+R)/2 and (L−R)/2 signals for realizing efficient encoding. Meanwhile, encoding one-channel signals requires only one-half the amount of data required for independently encoding two-channel signals. Thus, such a method of recording signals on a recording medium is frequently used in which a mode for recording as one-channel monaural signals and a mode for recording as two-channel stereo signals are provided, and recording can be made as monaural signals if long-time recording is required.
There is also known a method of using variable length codes for encoding for realization of more efficient encoding of quantized spectral signal components, as described in D. A. Huffman, “A Method for Construction of Minimum Redundancy Codes”, in Proc. I.R.E., 40, p. 1098 (1952).
In International Publication WO94/28633 of the present Assignee, there is disclosed a method of separating perceptually critical tonal components, that is signal components having the signal energy concentrated in the vicinity of a specified frequency, from the spectral signals, and encoding the signal components separately from the remaining spectral components. This enables audio signals to be efficiently encoded with a high compression ratio without substantially deteriorating the psychoacoustic sound quality.
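A variable length code of the kind referred to above can be sketched in Python as follows. The sketch builds a Huffman code table from the observed frequencies of quantized values, so that frequent values receive short codewords. The function names and the representation of codewords as strings of '0' and '1' are assumptions for illustration only.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # degenerate case: a single symbol
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)  # the two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the codewords of all symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Decode a bit string; valid because no codeword is a prefix of another."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

For example, the eight quantized values 0, 0, 0, 0, 1, 1, -1, 2 need 16 bits with a fixed 2-bit code, whereas the table built by this sketch encodes them in 14 bits.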
Meanwhile, the techniques of improving the encoding efficiency are currently developed and introduced one after another, such that, if a standard including a newly developed proper encoding technique is used, it becomes possible to make longer recording or to effect recording of audio signals of higher sound quality for the same recording time.
In setting the above-described standard, an allowance is left for recording flag information concerning the standard on the information recording medium, in consideration that the standard may be modified or expanded in future. For example, ‘0’ or ‘1’ is recorded as 1-bit flag information when initially setting or modifying the standard, respectively. The reproducing device complying with the as-modified standard checks whether the flag information is ‘0’ or ‘1’ and, if this flag information is ‘1’, the signal is read out and reproduced from the information recording medium in accordance with the as-modified standard. If the flag information is ‘0’, and the reproducing device also complies with the initially set standard, the signal is read out and reproduced from the information recording medium on the basis of that standard. If the reproducing device does not comply with the initially set standard, the signal is not reproduced.
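The flag check described above can be sketched as a small decision function. The function name, its parameters and the returned labels are hypothetical. The behavior for a flag of 1 on a device that does not support the as-modified standard is an assumption, since the text does not state it.

```python
def select_decoding_standard(flag, supports_initial, supports_modified):
    """Return which standard to decode with, or None if not reproducible.

    flag: 1-bit flag read from the medium (0 = initial, 1 = as-modified).
    supports_initial / supports_modified: capabilities of the device.
    """
    if flag == 1:
        # Assumption: a device lacking the as-modified standard cannot reproduce.
        return "modified" if supports_modified else None
    if flag == 0:
        return "initial" if supports_initial else None
    raise ValueError("flag must be 0 or 1")
```

A device complying with both standards thus reproduces either kind of recording, selecting the decoding method from the flag alone.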
The present Assignee has proposed in Japanese Patent Application No. H-9-42514 an encoding method for encoding multi-channel signals in terms of a frame the size of which cannot be controlled by an encoder. In this technique, signals of a channel to be encoded in accordance with a standard once set (referred to hereinafter as an “old standard”) are encoded with a number of bits smaller than the maximum number of bits that can be allocated for a given frame, and encoded signals of other channels are arranged in a vacant area in the frame so generated. This enables signals of a smaller number of channels to be reproduced by a reproducing device associated with the old standard (referred to hereinafter as an old standard accommodating reproducing device), while signals of a larger number of channels can be reproduced by a reproducing device (referred to hereinafter as a new standard accommodating reproducing device) associated with the new standard (referred to hereinafter as a new standard).
By this method, the encoding method for signals of channels not reproduced by the old standard accommodating reproducing device is made higher in encoding efficiency than the old standard encoding method to reduce deterioration in sound quality otherwise caused by encoding multi-channel signals. By recording the A=(L+R)/2 signal in an area reproducible by the old standard accommodating reproducing device and the B=(L−R)/2 signal in an area not reproducible by the old standard accommodating reproducing device, in accordance with this method, the old standard accommodating reproducing device can reproduce monaural signals, while the new standard accommodating reproducing device can reproduce stereo signals L and R from channels A and B.
The method for encoding (L+R)/2 and (L−R)/2 signals and reproducing the encoded stereo signals is described in, for example, James D. Johnston, “Perceptual Transform Coding of Wideband Stereo Signals”, ICASSP89, pp. 1993-1995.
The present Assignee has also proposed in Japanese Patent Application No. H-9-92448 a technique in which signals of an area not reproduced by the old standard accommodating reproducing device are selected from (L−R)/2, L and R for reducing the effect of the quantization error which presents itself when encoding the signals having a significant level difference between left and right channels.
Meanwhile, if a standard is expanded so that a larger number of channels can be reproduced under the expanded standard while a smaller number of channels remain reproducible by the old standard accommodating reproducing device, and this expansion is used for reproducing stereo signals, there are occasions wherein the quantization noise produced on encoding presents problems, depending on the sort of stereo signals.
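The channel conversion into A=(L+R)/2 and B=(L−R)/2 and its inverse can be sketched in Python as follows. The function names are hypothetical, and the high-efficiency encoding of A and B themselves is omitted.

```python
def to_sum_difference(left, right):
    """Convert L and R into A = (L + R) / 2 and B = (L - R) / 2."""
    a = [(l + r) / 2 for l, r in zip(left, right)]
    b = [(l - r) / 2 for l, r in zip(left, right)]
    return a, b

def from_sum_difference(a, b):
    """Recover L = A + B and R = A - B."""
    left = [x + y for x, y in zip(a, b)]
    right = [x - y for x, y in zip(a, b)]
    return left, right
```

A reproducing device that reads only channel A obtains the monaural signal, while a device that reads both A and B recovers L and R; for strongly correlated channels the values of B are small.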
Referring to FIGS. 1 and 2, the manner of generation of the quantization noise is explained.
FIGS. 1A and 1B show frequency spectral components of left channel (L) and right channel (R) components of typical stereo signals.
FIGS. 1C and 1D illustrate frequency spectrum waveforms of signals obtained on converting the L and R signals into signals corresponding to A=(L+R)/2 and B=(L−R)/2 by channel conversion. Since in general the respective channels of stereo signals exhibit strong correlation, the channel of B=(L−R)/2 is significantly smaller in signal component level than the L or R channel.
FIGS. 1E and 1F show the state of the quantization noise generated on encoding and subsequently decoding signals of the A and B channels by the high efficiency encoding method. N1 and N2 denote the frequency components of the quantization noise generated on encoding the A and B channels, respectively. The signal obtained on encoding and decoding the channel A and that obtained on encoding and decoding the channel B are termed (A+N1) and (B+N2), respectively. In the high efficiency encoding method, it is a frequent occurrence that the quantization noise level depends on the level of the original signal component. In such case, the level of the quantization noise N2 is significantly lower than that of the quantization noise N1.
FIGS. 1G and 1H denote the manner in which the respective channels of the stereo signals have been separated from the (A+N1) and (B+N2) signal components. By adding (A+N1) to (B+N2), the R component is canceled, while only the L-component can be retrieved. Similarly, by subtracting (B+N2) from (A+N1), the L-component is canceled, while only the R component can be retrieved.
The quantization noises N1 and N2 are left as (N1+N2) or (N1−N2). Since N2 is significantly low in level as compared to N1, neither (N1+N2) nor (N1−N2) raises psychoacoustic problems.
FIG. 2 shows the state of the quantization noise produced on encoding, decoding and reproducing stereo signals having no correlation between left and right channels.
FIGS. 2A and 2B show the frequency spectral waveforms of left channel (L) components and right channel (R) components having no correlation between left and right channels.
FIGS. 2C and 2D show the spectral signal waveforms of signals obtained on channel-converting the L and R signals into signals equivalent to (L+R)/2 and (L−R)/2 signals. As in the example of FIG. 1, (L+R)/2 and (L−R)/2 channels are termed A and B channels, respectively. Since L and R exhibit no correlation, the signal B=(L−R)/2 is not lowered in signal level.
FIGS. 2E and 2F show the state of the quantization noise produced on encoding the signals of the channels A and B by the above-described high efficiency encoding method and decoding the encoded signals. N1 and N2 denote time-axis waveforms of the quantization noise components produced in encoding the signals of the A and B channels, respectively. As in the example of FIG. 1, signals obtained on encoding and decoding the A and B channels are termed (A+N1) and (B+N2), respectively.
FIGS. 2G and 2H show the state in which respective channels of the stereo signals are separated from the signal waveforms (A+N1) and (B+N2). Addition of (A+N1) and (B+N2) cancels out the R component to make it possible to retrieve only the L component. Similarly, subtraction of (B+N2) from (A+N1) cancels out the L component to make it possible to retrieve only the R component. The quantization noises are left as (N1+N2) and (N1−N2), respectively.
However, since the high-range side components of (N1+N2) and low-range side components of (N1−N2) are not masked by the original signals, these quantization noises give negative psychoacoustic effects.
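The difference between the correlated case of FIG. 1 and the uncorrelated case of FIG. 2 can be illustrated numerically with a simple per-band model. The sketch below assumes that the quantization noise level in each band is a fixed fraction of the signal level of the encoded channel in that band, and it bounds the noise left in the decoded L and R channels by |N1|+|N2|. The two-band layout, the noise ratio of 0.1 and the function name are assumptions for illustration only.

```python
def decoded_noise_bound(left, right, noise_ratio=0.1):
    """Upper bound, per band, of the noise in decoded L or R.

    left, right: per-band signal levels of the L and R channels.
    The noise of A and B is modeled as noise_ratio times their band level;
    decoded L carries N1 + N2 and decoded R carries N1 - N2, so both are
    bounded by |N1| + |N2| in every band.
    """
    a = [(l + r) / 2 for l, r in zip(left, right)]
    b = [(l - r) / 2 for l, r in zip(left, right)]
    n1 = [noise_ratio * abs(x) for x in a]
    n2 = [noise_ratio * abs(x) for x in b]
    return [x + y for x, y in zip(n1, n2)]

# Bands are [low range, high range].
correlated = decoded_noise_bound([1.0, 0.0], [1.0, 0.0])    # L equals R
uncorrelated = decoded_noise_bound([1.0, 0.0], [0.0, 1.0])  # L low only, R high only
```

In the correlated case the noise appears only in the band where L and R have signal energy and is masked. In the uncorrelated case noise appears in both bands of both decoded channels, so the decoded L channel carries high-range noise although the original L signal has no high-range component to mask it.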
In the stereo signals, since the signal levels or energies of both channels are substantially unchanged, it is similarly difficult to select a channel for encoding such as to minimize the quantization noise depending on the signal level or energy.
It is an object of the present invention to provide an encoding method and apparatus which makes it possible to reduce the effect of the quantization noise otherwise produced after decoding on the occasion of encoding and decoding which realizes multiple channels by new standard expansion while enabling reproduction by the old standard accommodating reproducing device.
In one aspect, the present invention provides an encoding method including computing mixing coefficients of a plurality of channel signals, mixing the channel signals based on the mixing coefficients, generating plural processing signals corresponding to the channel signals from the mixed channel signals and encoding the processing signals.
In another aspect, the present invention provides an encoding method including computing mixing coefficients of a plurality of channel signals, generating plural processing signals corresponding to the channel signals from the channel signals, multiplying the processing signals with coefficients derived from the mixing coefficients and encoding the processing signals multiplied with the coefficients.
In a further aspect, the present invention provides an encoding apparatus including means for computing mixing coefficients of a plurality of channel signals, means for mixing the channel signals based on the mixing coefficients, means for generating plural processing signals corresponding to the channel signals from the mixed channel signals and means for encoding the processing signals.
In a further aspect, the present invention provides an encoding apparatus including means for computing mixing coefficients of a plurality of channel signals, means for generating plural processing signals corresponding to the channel signals from the channel signals, means for multiplying the processing signals with coefficients derived from the mixing coefficients and means for encoding the processing signals multiplied with the coefficients.
In a further aspect, the present invention provides a recording medium having recorded thereon encoded signals, wherein the recorded signals include codestrings generated on computing mixing coefficients of a plurality of channel signals, mixing the channel signals based on the mixing coefficients, generating plural processing signals corresponding to the channel signals from the mixed channel signals and on encoding the processing signals.
In yet another aspect, the present invention provides a recording medium having recorded thereon encoded signals, wherein the recorded signals include codestrings generated on computing mixing coefficients of a plurality of channel signals, generating plural processing signals corresponding to the channel signals from the channel signals, multiplying the processing signals with coefficients derived from the mixing coefficients and encoding the processing signals multiplied with the coefficients.
Thus, the present invention provides an information recording apparatus which, while enabling reproduction by an old standard accommodating reproducing device, reduces the effect of the quantization error produced on decoding at the time of encoding and decoding which realizes multiple channels by new standard expansion, by mixing input signals constituted by plural channels in the mixing ratio as set depending on the inter-channel correlation.
That is, in the method for enabling multi-channel reproduction for prolonged time with a new standard accommodating recording device, while enabling reproduction by an old standard accommodating reproducing device, the present invention enables signal reproduction in a manner such that the lowering of the sound quality caused by using multiple channels can be suppressed to a minimum.
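The two encoding methods set forth above, namely mixing the channel signals before generating the processing signals, and multiplying the processing signals with coefficients derived from the mixing coefficients, can be sketched in Python as follows. The text does not give the formula for the mixing coefficients, so everything specific here is an assumption: the use of a normalized correlation value, the mapping from correlation to a mixing coefficient c between 0 and c_max, the symmetric mixing rule, and all function names. The subsequent high-efficiency encoding of the processing signals is omitted.

```python
import math

def correlation(left, right):
    """Normalized inter-channel correlation in [-1, 1] (0.0 if a channel is constant)."""
    n = len(left)
    mean_l, mean_r = sum(left) / n, sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0
    return cov / math.sqrt(var_l * var_r)

def mixing_coefficient(left, right, c_max=0.25):
    """Assumed mapping: weaker correlation gives stronger mixing, up to c_max."""
    return c_max * (1.0 - max(correlation(left, right), 0.0))

def encode_by_mixing(left, right, c):
    """Mix the channels first, then form A and B from the mixed channels."""
    mixed_l = [(1 - c) * l + c * r for l, r in zip(left, right)]
    mixed_r = [c * l + (1 - c) * r for l, r in zip(left, right)]
    a = [(l + r) / 2 for l, r in zip(mixed_l, mixed_r)]
    b = [(l - r) / 2 for l, r in zip(mixed_l, mixed_r)]
    return a, b

def encode_by_scaling(left, right, c):
    """Form A and B from the original channels, then scale B by (1 - 2c)."""
    a = [(l + r) / 2 for l, r in zip(left, right)]
    b = [(1 - 2 * c) * (l - r) / 2 for l, r in zip(left, right)]
    return a, b
```

Under the symmetric mixing rule assumed here, both routes give the same processing signals: A is unchanged, so the monaural signal reproduced by an old standard accommodating reproducing device is unaffected, while the level of B, and with it the level of its quantization noise, is lowered for weakly correlated channels.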