This invention relates to a coding method and device suitable for expanding the format of coded signals, to a corresponding decoding method and device, and to a recording medium on which coded signals are recorded.
Conventionally, signal recording media such as magneto-optical discs have been proposed as media on which signals such as coded acoustic or audio information (hereinafter referred to as audio signals) can be recorded. There are various methods for high-efficiency coding of audio signals. One example is so-called transform coding. This is a blocking frequency band division system: audio signals on the time base are divided into blocks of a predetermined time unit, and the time-base signal of each block is transformed (spectrum transform) into a signal on the frequency base so as to divide it into a plurality of frequency bands. The signal of each band is then coded. Another example is so-called subband coding (SBC), a non-blocking frequency band division system in which audio signals on the time base are divided into a plurality of frequency bands without being blocked, and the signal of each band is then coded. A high-efficiency coding method combining subband coding and transform coding has also been considered. In this case, for example, after band division by subband coding, the signal of each band is spectrum-transformed into a signal on the frequency base, and the spectrum-transformed signal of each band is coded.
A so-called QMF (quadrature mirror filter) is employed as the band division filter in the above-described subband coding. The QMF is described in R. E. Crochiere, "Digital coding of speech in subbands," Bell Syst. Tech. J., Vol. 55, No. 8, 1976. A QMF splits a band into two halves of equal width. Its key property is that so-called aliasing does not occur when the divided bands are synthesized. In addition, Joseph H. Rothweiler, "Polyphase Quadrature filters: A new subband coding technique," ICASSP 83, Boston, describes a method of dividing a signal into bands of equal width. This polyphase quadrature filter can divide a signal into a plurality of bands of equal width in a single operation.
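The aliasing-free band split of a QMF can be illustrated with the shortest possible filter pair. The following is a minimal sketch, assuming the 2-tap Haar pair h0 = [s, s], h1 = [s, -s] with s = 1/sqrt(2), where h1[n] = (-1)^n h0[n] is the mirror-image high-pass. The function names `qmf_analysis` and `qmf_synthesis` are hypothetical. Practical QMF banks such as those in the cited papers use longer FIR filters. The Haar pair is used here only because its exact reconstruction can be checked by hand.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar QMF tap value (assumed filter pair)

def qmf_analysis(x):
    """Split x (even length) into low and high bands, each decimated by 2."""
    half = len(x) // 2
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(half)]   # h0 = [S, S]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(half)]  # h1 = [S, -S]
    return low, high

def qmf_synthesis(low, high):
    """Upsample and recombine the two bands; the aliasing terms cancel exactly."""
    y = []
    for lo, hi in zip(low, high):
        y.append(S * (lo + hi))  # = x[2k]
        y.append(S * (lo - hi))  # = x[2k+1]
    return y
```

For example, `qmf_synthesis(*qmf_analysis(x))` returns `x` up to rounding error for any even-length list `x`.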
For the above-described spectrum transform, input audio signals are, for example, divided into blocks of a predetermined unit time (frame). A discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT) or the like is then carried out on each block, thereby transforming the time base to the frequency base. The MDCT is described in J. P. Princen, A. B. Bradley, "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," Univ. of Surrey Royal Melbourne Inst. of Tech., ICASSP 1987.
When the DFT or DCT is used for the spectrum transform of a waveform signal, transforming a time block of M samples (hereinafter referred to as a transform block) yields M independent real-valued data. To reduce connection distortion between transform blocks, adjacent transform blocks normally overlap each other by M1 samples. Therefore, with the DFT or DCT, M real-valued data are obtained on average for every (M-M1) samples, and these M real-valued data are subsequently quantized and coded.
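The cost of overlapping with the DFT or DCT can be expressed as the number of coefficients produced per new input sample. The values M = 256 and M1 = 32 below are illustrative only and are not taken from this document:

```latex
\rho_{\mathrm{DFT/DCT}} = \frac{M}{M - M_1},
\qquad \text{e.g. } M = 256,\ M_1 = 32:\quad \rho = \frac{256}{224} \approx 1.14
```

In this example, overlapping increases the amount of data to be quantized and coded by about 14 percent. Any M1 > 0 gives ρ > 1.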
On the other hand, when the MDCT is used for the spectrum transform, M real-valued data are obtained from 2M samples, with adjacent transform blocks overlapping each other by M samples. That is, with the MDCT, M real-valued data are obtained on average for every M samples, and these M real-valued data are subsequently quantized and coded. In a decoding device, each block of the MDCT code is inverse-transformed into a waveform element. These waveform elements are overlapped and added so that their time-domain aliasing components cancel each other, thereby reconstructing the waveform signal.
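The MDCT analysis and overlap-add reconstruction can be sketched as follows. This is a minimal sketch, assuming a sine window w[n] = sin(pi (n + 0.5) / (2M)), which satisfies the Princen-Bradley condition w[n]^2 + w[n+M]^2 = 1. It also assumes a direct O(M^2) evaluation rather than a fast algorithm. The names `mdct`, `imdct` and `mdct_roundtrip` are hypothetical. Each 2M-sample block yields only M coefficients, and overlap-adding the inverse transforms of 50%-overlapped blocks cancels the time-domain aliasing.

```python
import math

def sine_window(M):
    # Symmetric window of length 2M with w[n]^2 + w[n+M]^2 == 1 (Princen-Bradley)
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def _basis(M, n, k):
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct(block, M):
    """2M samples -> M real coefficients (window applied before transforming)."""
    w = sine_window(M)
    return [sum(w[n] * block[n] * _basis(M, n, k) for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, M):
    """M coefficients -> 2M windowed samples that still contain time-domain aliasing."""
    w = sine_window(M)
    return [w[n] * (2.0 / M) * sum(coeffs[k] * _basis(M, n, k) for k in range(M))
            for n in range(2 * M)]

def mdct_roundtrip(x, M):
    """Analyse x with blocks of 2M samples and hop M, then overlap-add the IMDCT outputs."""
    padded = [0.0] * M + list(x) + [0.0] * (M + (-len(x)) % M)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - M, M):
        coeffs = mdct(padded[start:start + 2 * M], M)   # M coefficients per M new samples
        for n, v in enumerate(imdct(coeffs, M)):
            out[start + n] += v                          # aliasing cancels between neighbours
    return out[M:M + len(x)]
```

With this normalization (2/M in the inverse transform), `mdct_roundtrip(x, M)` reproduces `x` up to floating-point error.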
In general, lengthening the transform block used for the spectrum transform improves frequency resolution, so that energy concentrates in specific spectral components. Therefore, coding can be carried out more efficiently than with the DFT or DCT by combining two measures. The first is a long transform block in which adjacent blocks overlap each other by half. The second is the MDCT, which keeps the number of obtained spectral components equal to the number of original time-base samples. Also, a sufficiently long overlap between adjacent transform blocks reduces connection distortion of the waveform signal between transform blocks. However, a longer transform block requires a larger work area for the transform, which hinders miniaturization of reproducing means and the like. In particular, adopting a long transform block at a time when it is difficult to increase the integration density of semiconductors leads to higher cost and therefore needs careful consideration.
As described above, the band in which quantization noise is generated can be controlled by quantizing the signal components after they have been divided into bands by a filter or spectrum transform. Therefore, coding that is auditorily more efficient can be carried out by utilizing the so-called masking effect. In addition, coding can be made still more efficient by normalizing the samples of each band by the maximum absolute value of the signal components in that band before quantization.
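Normalization by the band's peak magnitude followed by quantization can be sketched as below. This is a minimal sketch, assuming a symmetric uniform quantizer with `bits` bits (fewer than 2 bits means the band is not coded). The function names are hypothetical. Real coders usually also quantize the normalization coefficient itself to a table of allowed values, which this sketch omits.

```python
def normalize_and_quantize(band, bits):
    """Normalize a band by its peak magnitude, then quantize to integers in -levels..+levels."""
    scale = max((abs(v) for v in band), default=0.0)   # normalization coefficient
    levels = (1 << (bits - 1)) - 1 if bits >= 2 else 0
    if scale == 0.0 or levels == 0:
        return scale, [0] * len(band)
    return scale, [round(v / scale * levels) for v in band]

def dequantize(scale, indices, bits):
    """Inverse of normalize_and_quantize (up to quantization error)."""
    levels = (1 << (bits - 1)) - 1 if bits >= 2 else 0
    if levels == 0:
        return [0.0] * len(indices)
    return [scale * q / levels for q in indices]
```

Because every band is first scaled to a peak of 1, the same quantizer serves loud and quiet bands. The band level is carried entirely by `scale`.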
When the signals obtained by frequency band division of audio signals are quantized, it is preferable to use frequency division widths that take human auditory characteristics into account. Specifically, the audio signals are preferably divided into a plurality of bands (for example, 25 bands) whose widths, referred to as critical bands, generally become greater in higher frequency bands. The data of each band are then coded using either a predetermined bit distribution for each band or adaptive bit allocation for each band. For example, when MDCT coefficient data are coded with such bit allocation, the coefficient data of each band, obtained by MDCT processing of each transform block, are coded with an adaptively determined number of bits. The following two bit allocation methods are known.
For example, in R. Zelinski and P. Noll, "Adaptive Transform Coding of Speech Signals," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, August 1977, bits are allocated on the basis of the magnitude of the signal in each band. With this method, the quantization noise spectrum becomes flat and the noise energy is minimized. However, since the masking effect is not utilized, the perceived noise is not auditorily optimal.
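Magnitude-based allocation can be sketched with the classical high-rate rule b_k = b_avg + log2(rms_k / geometric_mean(rms)). We assume this rule is representative of the approach described above; the cited paper's exact procedure may differ. The function name is hypothetical. The sketch also omits the clipping of negative allocations that a real coder needs.

```python
import math

def magnitude_based_allocation(band_rms, avg_bits):
    """Real-valued bits per band; bands louder than the geometric mean get more bits."""
    logs = [math.log2(r) for r in band_rms]
    mean_log = sum(logs) / len(logs)          # log2 of the geometric mean
    return [avg_bits + lg - mean_log for lg in logs]
```

With this rule, the noise power left in band k is proportional to rms_k^2 * 2^(-2 b_k), which comes out the same for every band. This is the flat noise spectrum the paragraph describes, regardless of masking.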
In addition, M. A. Krasner, "The critical band coder: digital encoding of the perceptual requirements of the auditory system," MIT, ICASSP 1980, describes a fixed bit allocation method that uses auditory masking to obtain the signal-to-noise ratio needed for each band. With this method, however, the measured characteristics are not very good even for a sine-wave input, because the bit allocation is fixed.
To solve these problems, a high-efficiency coding method has been proposed in which all the bits available for allocation are divided between two uses. One is a fixed bit allocation pattern predetermined for each small block. The other is a bit allocation that depends on the magnitude of the signal in each block. The division ratio depends on a signal related to the input signal: the smoother the signal spectrum, the larger the share given to the fixed bit allocation pattern.
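The division of the bit budget can be sketched as follows. This is a minimal sketch under several hypothetical choices that the text does not specify. Smoothness is measured by spectral flatness (geometric mean over arithmetic mean of band energies). The share given to the fixed pattern equals that flatness. The signal-dependent share is split in proportion to band amplitude. Energies must be positive, and all names are illustrative.

```python
import math

def spectral_flatness(energies):
    """1.0 for a perfectly flat spectrum, close to 0 for a single sharp peak."""
    geo = math.exp(sum(math.log(e) for e in energies) / len(energies))
    return geo / (sum(energies) / len(energies))

def hybrid_allocation(energies, fixed_pattern, total_bits):
    """Real-valued bits per band: fixed-pattern share plus signal-dependent share."""
    ratio = spectral_flatness(energies)        # smoother spectrum -> larger fixed share
    fixed_bits = ratio * total_bits
    signal_bits = total_bits - fixed_bits
    pattern_sum = sum(fixed_pattern)
    amps = [math.sqrt(e) for e in energies]
    amp_sum = sum(amps)
    return [fixed_bits * p / pattern_sum + signal_bits * a / amp_sum
            for p, a in zip(fixed_pattern, amps)]
```

For a flat spectrum the result is just the fixed pattern scaled to the budget. For a peaky, sine-like spectrum most bits move to the band holding the peak, as the following paragraph describes.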
With this method, when energy is concentrated in a specific spectral component, as with a sine-wave input, many bits are allocated to the block containing that component. This significantly improves the overall signal-to-noise characteristic. In general, the human auditory sense is extremely sensitive to signals with sharp spectral components. Therefore, the improvement in signal-to-noise characteristic obtained with this method not only improves measured values but also effectively improves the perceived sound quality.
Various other bit allocation methods have also been proposed. In general, coding can be made more efficient from an auditory standpoint by making the auditory model more precise and by improving the capability of the coding device.
In these methods, it is typical to compute a real-valued bit allocation reference value that exactly realizes the calculated signal-to-noise characteristic, and then to use an integer approximating it as the number of allocated bits.
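Turning real-valued reference values into integer bit counts can be done in many ways. The following is a minimal sketch of one of them, the largest-remainder rule. It assumes the reference values are non-negative and already sum to the bit budget. The rule itself and the function name are illustrative choices, not the method of any cited document.

```python
import math

def integer_allocation(reference_bits, total_bits):
    """Floor each reference value, then give leftover bits to the largest fractional parts."""
    alloc = [math.floor(b) for b in reference_bits]
    leftover = total_bits - sum(alloc)
    by_remainder = sorted(range(len(reference_bits)),
                          key=lambda i: reference_bits[i] - alloc[i], reverse=True)
    for i in by_remainder[:max(leftover, 0)]:
        alloc[i] += 1
    return alloc
```

For example, references of 2.7, 1.2 and 3.1 bits with a 7-bit budget become 3, 1 and 3 bits.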
To construct an actual code string, it suffices to proceed in two steps. First, for each band to be normalized and quantized, quantization precision information and normalization coefficient information are encoded with a predetermined number of bits. Then the normalized and quantized spectral components are encoded. The ISO standard ISO/IEC 11172-3:1993(E) (1993) describes a high-efficiency coding system in which the number of bits expressing the quantization precision information varies from band to band. Specifically, the standard specifies that fewer bits are used for the quantization precision information in higher frequency bands.
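The code-string layout described above can be sketched as a simple bit packer. This is a minimal sketch, assuming a fixed 4-bit precision field and a 6-bit normalization index for every band. Unlike the ISO standard mentioned above, the field widths here do not vary by band. Quantized values are stored as two's complement numbers in `precision` bits, and the decoder is assumed to know the band sizes in advance. All names and widths are hypothetical.

```python
def pack_code_string(bands, prec_bits=4, scale_bits=6):
    """bands: list of (precision, scale_index, quantized_values) -> list of bits."""
    bits = []

    def put(value, width):
        bits.extend((value >> (width - 1 - i)) & 1 for i in range(width))

    for precision, _, _ in bands:              # 1) quantization precision information
        put(precision, prec_bits)
    for _, scale_index, _ in bands:            # 2) normalization coefficient information
        put(scale_index, scale_bits)
    for precision, _, values in bands:         # 3) normalized, quantized spectra
        for v in values:
            put(v & ((1 << precision) - 1), precision)
    return bits

def unpack_code_string(bits, band_sizes, prec_bits=4, scale_bits=6):
    """Inverse of pack_code_string for known band sizes."""
    pos = 0

    def get(width):
        nonlocal pos
        value = 0
        for b in bits[pos:pos + width]:
            value = (value << 1) | b
        pos += width
        return value

    precisions = [get(prec_bits) for _ in band_sizes]
    scales = [get(scale_bits) for _ in band_sizes]
    bands = []
    for precision, scale, size in zip(precisions, scales, band_sizes):
        values = []
        for _ in range(size):
            v = get(precision)
            if precision and v >= 1 << (precision - 1):   # sign-extend two's complement
                v -= 1 << precision
            values.append(v)
        bands.append((precision, scale, values))
    return bands
```

Placing all side information first lets a decoder read every band's precision before it starts parsing the variable-width spectral fields.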
A method is also known in which the decoding device derives the quantization precision information from the normalization coefficient information, instead of the quantization precision information being coded directly. With this method, the relation between the normalization coefficient information and the quantization precision information is fixed when the standard is set. As a result, quantization precision control based on a more advanced auditory model cannot be introduced later. Also, when the compression rate to be realized varies within a certain range, this relation must be determined separately for each compression rate.
In addition, a method for efficiently coding quantized spectral components by variable length coding is known, as described in D. A. Huffman, "A Method for Construction of Minimum Redundancy Codes," Proc. I.R.E., 40, p. 1098 (1952).
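Huffman's construction for a table of quantized spectral values can be sketched with Python's `heapq`. This is a minimal sketch: the code table is built from observed symbol frequencies, and the function name is hypothetical. A real coder would normally use a predefined table fixed by the standard rather than one built per frame.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bitstring} from observed frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code}); the tie breaker
    # keeps heapq from ever comparing the dictionaries.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]
```

For quantized values such as `[0, 0, 0, 0, 1, -1, 0, 2, 0, 1]`, the frequent value 0 receives a 1-bit code. The whole sequence then takes 16 bits instead of the 20 needed with a fixed 2-bit code.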
Moreover, the specification and drawings of PCT International Application Publication WO94/28633 by the present Assignee propose a method of separating auditorily important tone components from the other spectral components and coding them separately. This makes it possible to encode audio signals efficiently at a high compression rate without substantial audible deterioration.
Each of the above-described coding methods can be applied to each channel of an acoustic signal consisting of a plurality of channels. For example, it may be applied to an L channel corresponding to a left speaker and an R channel corresponding to a right speaker. It can also be applied to the signal (L+R)/2, obtained by adding the L-channel and R-channel signals and halving the sum. Alternatively, efficient coding can be carried out by applying each of the above-described methods to the signal (L+R)/2 and the signal (L-R)/2. Coding the signal of one channel requires half the data needed to code the signals of two channels independently. Therefore, a common technique for recording signals onto a recording medium is to provide two modes: a mode for recording a monaural signal of one channel and a mode for recording a stereo signal of two channels. The standard is then set so that monaural signals can be recorded for a longer time.
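The conversion between L/R and the (L+R)/2, (L-R)/2 pair is an exactly invertible matrix operation, as the following minimal sketch shows. The function names are hypothetical.

```python
def to_sum_difference(left, right):
    """(L, R) -> (A, B) with A = (L+R)/2 and B = (L-R)/2, sample by sample."""
    a = [(l + r) / 2 for l, r in zip(left, right)]
    b = [(l - r) / 2 for l, r in zip(left, right)]
    return a, b

def from_sum_difference(a, b):
    """Inverse conversion: L = A + B, R = A - B."""
    return [x + y for x, y in zip(a, b)], [x - y for x, y in zip(a, b)]
```

When the two channels are identical, B is all zeros. This is the case in which coding A alone (a monaural signal) loses nothing.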
As described above, methods for improving coding efficiency have been developed one after another. Therefore, by adopting a standard that includes a newly developed coding method, recording can be carried out for a longer time, or audio signals of higher sound quality can be recorded in the same recording time.
When such a standard is determined, a common technique is to reserve space on the signal recording medium for flag information or the like related to the standard, in anticipation of future changes or extensions of the standard. For example, "0" is recorded as one-bit flag information under the initial standard, and "1" is recorded in its place when the standard is changed. A reproducing device conforming to the changed standard checks whether the flag information is "0" or "1". If the flag information is "1", it reads and reproduces the signals from the signal recording medium on the basis of the changed standard. If the flag information is "0" and the reproducing device also conforms to the initially determined standard, it reads and reproduces the signals on the basis of that standard. If the reproducing device does not conform to the initial standard, the signals are not reproduced.
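The decision rule of a flag-checking reproducing device can be written as a small function. This is a minimal sketch; the function name and the string return values are hypothetical.

```python
def select_standard(flag_bit, supports_former, supports_new):
    """Return the standard to decode with ("former" or "new"), or None to refuse playback."""
    if flag_bit == 1:
        return "new" if supports_new else None
    if flag_bit == 0:
        return "former" if supports_former else None
    raise ValueError("flag must be 0 or 1")
```

A device that ignores the flag altogether behaves as if this function always returned "former". That is the situation addressed in the following paragraphs.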
However, some reproducing devices can reproduce only signals recorded in conformity with a predetermined standard (hereinafter referred to as the "former standard" or "first coding method"); such a device is hereinafter referred to as a former standard-adaptable reproducing device. Once these devices become widespread, their users will be confused, because the devices cannot reproduce signals recorded on a recording medium in conformity with a higher-level standard that uses a more efficient coding system (hereinafter referred to as the "new standard" or "second coding method").
In particular, some reproducing devices manufactured when the former standard was determined (former standard-adaptable reproducing devices) ignore the flag information recorded on the recording medium. They reproduce all signals recorded on the medium as though those signals were coded in conformity with the former standard. That is, even when signals are recorded on the recording medium in conformity with the new standard, such devices cannot recognize this. Therefore, if such a device plays a recording medium carrying new-standard signals as though they were former-standard signals, it may not operate normally or may generate annoying noise.
To solve this problem, the present Assignee has proposed, in the specification and drawings of Japanese Publication of Unexamined Patent Application No. Hei 10-22935, a method of preventing both user confusion and noise. When recording is carried out in conformity with an additional standard, that is, the new standard, a signal indicating that "part of the recorded signals cannot be reproduced by reproducing means conforming only to this standard" is recorded on the basis of the former standard. A former standard-adaptable reproducing device is then prevented from reproducing any signals other than those recorded on the basis of the former standard. The specification and drawings of Japanese Publication of Unexamined Patent Application No. Hei 10-22935 also propose a method that lets an inexpensive recording device conforming to the new standard record easily. In that method, a message signal based on the former standard is pre-recorded on the recording medium. When recording is carried out in conformity with the new standard, the contents of the reproduction management information are manipulated so that the message signal is reproduced when the medium is played by a former standard-adaptable reproducing device. The same publication further proposes a method of informing the user of a former standard-adaptable reproducing device which tunes are actually recorded in conformity with the former standard. In that method, a message signal is reproduced for each portion recorded in conformity with the new standard.
With these methods, however, the recorded sounds themselves cannot be reproduced by the former standard-adaptable reproducing device. Therefore, the present Assignee has proposed, in the specification and drawings of Japanese Publication of Unexamined Patent Application No. Hei 9-42514, a method of coding signals of multiple channels in frames whose size cannot be controlled by the encoder. The signal of the channel to be reproduced by the former standard-adaptable reproducing device is coded with fewer bits than the maximum number of bits that can be allocated in the frame. The signal of another channel is then coded in the free area thus created in the frame. This enables a former standard-adaptable reproducing device to reproduce signals of a small number of channels, while a new standard-adaptable reproducing device can reproduce signals of a greater number of channels. In this method, the channels not reproduced by the former standard-adaptable reproducing device are coded by a method with higher coding efficiency than the coding method of the former standard. This reduces the deterioration in sound quality caused by coding multi-channel signals. Suppose that the area reproduced by the former standard-adaptable reproducing device is area 1 and the area not reproduced by it is area 2. If the signal A=(L+R)/2 is recorded in area 1 and the signal B=(L-R)/2 is recorded in area 2, the former standard-adaptable reproducing device can reproduce the monaural signal A. The new standard-adaptable reproducing device can reproduce the stereo signals L and R from channels A and B.
The method of coding and recording the signals (L+R)/2 and (L-R)/2 and reproducing stereo signals from them is disclosed in, for example, James D. Johnston, "Perceptual Transform Coding of Wide-band Stereo Signals," ICASSP 89, pp. 1993-1995.
However, when stereo signals are reproduced by these methods, the quantization noise generated by coding may become a problem, depending on the type of stereo signal.
FIGS. 1A to 1H show the quantization noise generated when typical stereo signals are coded, decoded, and reproduced by these methods.
FIGS. 1A and 1B show the time base waveforms of the left channel component (L) and the right channel component (R) of a stereo signal, respectively. FIGS. 1C and 1D show the time base waveforms of the signals (L+R)/2 and (L-R)/2 obtained by converting the L and R channel components, respectively. In FIGS. 1C and 1D, (L+R)/2 is expressed as A and (L-R)/2 is expressed as B. In general, since the channels of a stereo signal are strongly correlated, the signal level of B=(L-R)/2 is significantly lower than that of the original signal L or R.
FIGS. 1E and 1F show the quantization noise generated when the signals (L+R)/2=A and (L-R)/2=B, respectively, are coded by the high-efficiency coding method and then decoded. In FIGS. 1E and 1F, N1 and N2 express the time base waveforms of the quantization noise components generated in coding (L+R)/2=A and (L-R)/2=B, respectively. The signal obtained by coding and decoding (L+R)/2=A can be expressed as A+N1, and the signal obtained by coding and decoding (L-R)/2=B can be expressed as B+N2. In the high-efficiency coding method, the level of the quantization noise often depends on the level of the original signal. In this case, the level of N2 is significantly lower than that of N1.
FIGS. 1G and 1H show the states where the respective channels of the stereo signal are separated from the signal waveforms of (A+N1) and (B+N2). By adding the signals of (A+N1) and (B+N2), the R channel component is eliminated and only the L component can be taken out. On the other hand, by subtracting the signal of (B+N2) from (A+N1), the L channel component is eliminated and only the R channel component can be taken out.
The quantization noise components N1 and N2 remain in the form (N1+N2) or (N1-N2). However, since the level of N2 is significantly lower than that of N1, this causes no particular audible problem.
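Written out with A=(L+R)/2 and B=(L-R)/2, the channel separation of FIGS. 1G and 1H is:

```latex
\begin{aligned}
(A + N_1) + (B + N_2) &= \tfrac{L+R}{2} + \tfrac{L-R}{2} + N_1 + N_2 = L + (N_1 + N_2),\\
(A + N_1) - (B + N_2) &= \tfrac{L+R}{2} - \tfrac{L-R}{2} + N_1 - N_2 = R + (N_1 - N_2).
\end{aligned}
```

Each output channel therefore carries the sum or difference of both noise components. When N2 is small, both channels carry essentially N1, whose level follows the level of A.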
Meanwhile, FIGS. 2A to 2H similarly show the quantization noise for a stereo signal in which the signal level of the right channel (R) is much lower than that of the left channel (L). FIGS. 2A and 2B show the time base waveforms of the left channel component (L) and the right channel component (R) of the stereo signal, respectively. FIGS. 2C and 2D show the time base waveforms of the signals (L+R)/2 and (L-R)/2 obtained by converting the L and R channel components, respectively. In FIGS. 2C and 2D, as in FIGS. 1C and 1D, (L+R)/2 is expressed as A and (L-R)/2 is expressed as B. In this example, the signal level of the R channel component is low and there is no correlation between the channels. Therefore, the signal level of B=(L-R)/2 is not low; rather, it is close to that of A=(L+R)/2.
FIGS. 2E and 2F, like FIGS. 1E and 1F, show the quantization noise generated when the signals (L+R)/2=A and (L-R)/2=B, respectively, are coded by the high-efficiency coding method and then decoded. In FIGS. 2E and 2F, N1 and N2 express the time base waveforms of the quantization noise components generated in coding (L+R)/2=A and (L-R)/2=B, respectively. As in FIGS. 1E and 1F, the signal obtained by coding and decoding (L+R)/2=A can be expressed as A+N1, and the signal obtained by coding and decoding (L-R)/2=B can be expressed as B+N2.
FIGS. 2G and 2H, similar to FIGS. 1G and 1H, show the states where the respective channels of the stereo signal are separated from the signal waveforms of (A+N1) and (B+N2). By adding the signals (A+N1) and (B+N2), the R channel component is eliminated and only the L component can be taken out. On the other hand, by subtracting the signal of (B+N2) from (A+N1), the L channel component is eliminated and only the R channel component can be taken out.
In the example of FIG. 2 as well, the quantization noise components N1 and N2 remain in the form (N1+N2) or (N1-N2). In this example, however, N2 is not small relative to N1, and the signal level of the R channel component is very low. The quantization noise component (N1-N2) therefore cannot be masked by the R channel component, and the quantization noise in the R channel may be audible.
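The contrast between FIG. 1 and FIG. 2 can be reproduced numerically. This is a minimal sketch assuming a crude stand-in for the codec: coding adds white Gaussian noise whose RMS is a fixed fraction (5%) of the coded signal's own RMS, following the statement that noise level tends to follow signal level. The test signals, the noise fraction and all function names are hypothetical.

```python
import math
import random

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def fake_codec(signal, rel_noise, rng):
    # Stand-in for coding + decoding: add white noise whose RMS is a fixed
    # fraction of the signal's own RMS (noise level follows signal level).
    sigma = rel_noise * rms(signal)
    return [v + rng.gauss(0.0, sigma) for v in signal]

def right_channel_snr_db(left, right, rel_noise=0.05, seed=1):
    """SNR of the R channel recovered as (A + N1) - (B + N2)."""
    rng = random.Random(seed)
    a = [(l + r) / 2 for l, r in zip(left, right)]     # A = (L+R)/2
    b = [(l - r) / 2 for l, r in zip(left, right)]     # B = (L-R)/2
    a_hat = fake_codec(a, rel_noise, rng)              # A + N1
    b_hat = fake_codec(b, rel_noise, rng)              # B + N2
    r_hat = [x - y for x, y in zip(a_hat, b_hat)]      # R + (N1 - N2)
    err = [h - r for h, r in zip(r_hat, right)]
    return 20 * math.log10(rms(right) / rms(err))

n = range(2000)
left = [math.sin(2 * math.pi * 0.01 * i) for i in n]
right_fig1 = [0.9 * v for v in left]                                # strongly correlated
right_fig2 = [0.01 * math.sin(2 * math.pi * 0.037 * i) for i in n]  # weak, uncorrelated
snr_fig1 = right_channel_snr_db(left, right_fig1)
snr_fig2 = right_channel_snr_db(left, right_fig2)
```

Under this noise model, the expected R-channel SNR works out analytically to roughly +25 dB in the correlated (FIG. 1) case. In the weak-R (FIG. 2) case it is roughly -11 dB: the residual noise (N1-N2) is louder than the R signal itself.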
In view of the foregoing, it is an object of the present invention to provide a coding method and device, a decoding method and device, and a recording medium that realize multi-channel capability through a new standard extension. They must still allow reproduction by a former standard-adaptable reproducing device, while minimizing the quantization noise generated by coding so as to reduce deterioration in sound quality.
Such a coding/decoding method, consistent with the present invention, realizes multi-channel capability through a new standard extension while enabling reproduction by a former standard-adaptable reproducing device. In this method, the channel signal of the extension part is selected optimally in accordance with the input signal. The quantization noise generated by coding is thereby minimized and deterioration in sound quality is reduced.
In accordance with methods consistent with the present invention, a coding method is provided. The coding method includes the step of generating a first signal from signals of a plurality of input channels. The signal levels of some of the plurality of input channels and of the other channels are determined. On the basis of these signal levels, one of two second signals is selected: a second signal consisting only of the signals of those channels, or a second signal generated from the signals of the plurality of input channels. The first signal and the selected second signal are then coded.
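For the two-channel case, the selection step can be sketched as follows. This is a minimal sketch under hypothetical choices that the summary does not specify: levels are measured as RMS over a block, the threshold is -20 dB, and a flag of 1 (called `constituent_info` here) marks "second signal is R alone". When R is much weaker than L, coding R directly keeps its quantization noise proportional to R's own low level. L can then be rebuilt as 2A - R, where the noise is masked by the strong L signal.

```python
import math

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0

def select_second_signal(left, right, threshold_db=-20.0):
    """Return (first_signal, second_signal, constituent_info).

    The first signal is always A = (L+R)/2, the part a former-standard player decodes.
    If R is more than |threshold_db| dB below L, the second signal is R alone
    (constituent_info = 1); otherwise it is B = (L-R)/2 (constituent_info = 0).
    The RMS level measure, the threshold and the 0/1 coding are assumptions.
    """
    first = [(l + r) / 2 for l, r in zip(left, right)]
    level_l, level_r = rms(left), rms(right)
    r_is_weak = level_l > 0 and (
        level_r == 0 or 20 * math.log10(level_r / level_l) < threshold_db)
    if r_is_weak:
        return first, list(right), 1
    return first, [(l - r) / 2 for l, r in zip(left, right)], 0
```

A per-block decision like this would need the chosen mode to travel with the code string, which is the role of the constituent information in the decoding method below.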
In accordance with devices consistent with the present invention, a coding device is provided. The coding device includes first signal generating means for generating a first signal from signals of a plurality of input channels. It further includes second signal generating means, which selects one of two second signals on the basis of the signal levels of some of the plurality of input channels and of the other channels: a second signal consisting only of the signals of those channels, or a second signal generated from the signals of the plurality of input channels. Finally, it includes coding means for coding the first signal and the selected second signal.
In accordance with methods consistent with the present invention, a decoding method is provided. The decoding method includes separating from a code string a first coded signal, a second coded signal, and constituent information indicating the constituent state of the channel signals constituting the second coded signal. The separated first and second coded signals are decoded to generate first and second signals, respectively. Restoration processing for generating a plurality of channel signals from the first and second signals is selected on the basis of the constituent information.
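For two channels, the selection of restoration processing can be sketched as follows. This is a minimal sketch, assuming that the first signal is A = (L+R)/2 and that the constituent information is 1 when the second signal is R alone and 0 when it is B = (L-R)/2. The 0/1 coding and the function name are hypothetical.

```python
def restore_channels(a, second, constituent_info):
    """Rebuild (L, R) from decoded A = (L+R)/2 and the decoded second signal."""
    if constituent_info == 1:                        # second signal is R alone
        return [2 * x - y for x, y in zip(a, second)], list(second)
    if constituent_info == 0:                        # second signal is B = (L-R)/2
        return ([x + y for x, y in zip(a, second)],
                [x - y for x, y in zip(a, second)])
    raise ValueError("unknown constituent information")
```

Both branches recover the same L and R from noiseless inputs. They differ only in how the quantization noise of A and of the second signal is distributed between the output channels.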
In accordance with methods consistent with the present invention, another decoding method is provided. The decoding method includes separating first and second coded signals from a code string containing them. The first coded signal is generated from signals of a plurality of channels and coded. The second coded signal is chosen on the basis of the signal levels of some of the plurality of channels and of the other channels, and is then coded. It is chosen from a second signal consisting only of the signals of those channels and a second signal generated from the signals of the plurality of channels. The separated first and second coded signals are each decoded, and the signals of the plurality of channels are restored from the decoded first and second signals.
In accordance with devices consistent with the present invention, a decoding device is provided. The decoding device includes separating means for separating from a code string a first coded signal, a second coded signal, and constituent information indicating the constituent state of the channel signals constituting the second coded signal. It also includes decoding means for decoding the separated first and second coded signals to generate first and second signals, respectively. Finally, it includes control means for selecting, on the basis of the constituent information, restoration processing for generating a plurality of channel signals from the first and second signals.
In accordance with devices consistent with the present invention, another decoding device is provided. The decoding device includes separating means for separating first and second coded signals from a code string containing them. The first coded signal is generated from signals of a plurality of channels and coded. The second coded signal is chosen on the basis of the signal levels of some of the plurality of channels and of the other channels, and is then coded. It is chosen from a second signal consisting only of the signals of those channels and a second signal generated from the signals of the plurality of channels. The decoding device further includes decoding means for decoding the separated first and second coded signals, and restoring means for restoring the signals of the plurality of channels from the decoded first and second signals.