The present application claims priority to Japanese Application(s) No(s). P2000-380642 filed Dec. 14, 2000, which application(s) is/are incorporated herein by reference to the extent permitted by law.
1. Field of the Invention
The present invention relates to a coding device and method, a decoding device and method, and a recording medium therefor. More particularly, the present invention relates to a coding device and method and a decoding device and method, which are capable of coding or decoding an audio signal at a low bit rate, and a recording medium therefor.
2. Description of the Related Art
In recent years, a so-called “perception audio coder (decoder)” has been developed. With such a coder, high-quality audio signals can be transmitted and stored at a bit rate approximately one twelfth of the bit rate in common use on a conventional CD-ROM (Compact Disk-Read Only Memory).
Such a coder codes an audio signal by exploiting the portion of the waveform contained in the audio signal that cannot be heard owing to the limitations of the human auditory system. With regard to a stereo audio signal, for example, a coder using MS stereo coding (middle/side stereo coding) and a coder using IS stereo coding (intensity stereo coding) are known.
FIG. 1 is a block diagram showing an example of the construction of a conventional audio signal transmission system using MS stereo coding.
A left signal L and a right signal R, which form a stereo audio signal, are input to a computation section 1. These signals are added by an adder 1-1, and the resulting signal is output to a multiplier 1-2. Meanwhile, a difference signal of those signals is generated in a subtracter 1-3, and the resulting signal is output to a multiplier 1-4. In the multipliers 1-2 and 1-4, the outputs of the adder 1-1 and the subtracter 1-3 are multiplied by a coefficient x, and a sum signal M and a difference signal S are generated. These signals are coded by a coding section 2, and are output to a recording medium or to a transmission line 3 formed of a network or the like.
A decoding section 4 performs a decoding process on an input code sequence in order to generate a sum signal M′ and a difference signal S′. The sum signal M′ and the difference signal S′ are added by an adder 5-1, the result is multiplied by a coefficient y in a multiplier 5-2, and the resulting signal is output as a left signal L′. Also, the difference signal S′ is subtracted from the sum signal M′ by a subtracter 5-3, and the resulting signal is multiplied by the coefficient y in a multiplier 5-4 and is output as a right signal R′. For example, the coefficient x is set to 0.5, and the coefficient y is set to 1.0.
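The sum/difference computation of FIG. 1 can be illustrated with a minimal Python sketch. It assumes sample-wise processing of short lists and omits the coding section 2 and the decoding section 4 (no quantization is applied), so that M′ equals M and S′ equals S. The function names ms_encode and ms_decode are hypothetical and are used only for illustration.

```python
def ms_encode(left, right, x=0.5):
    """Form the sum signal M and the difference signal S (computation section 1)."""
    m = [x * (l + r) for l, r in zip(left, right)]  # adder 1-1, multiplier 1-2
    s = [x * (l - r) for l, r in zip(left, right)]  # subtracter 1-3, multiplier 1-4
    return m, s


def ms_decode(m, s, y=1.0):
    """Regenerate the left and right signals from M' and S'."""
    left = [y * (mi + si) for mi, si in zip(m, s)]   # adder 5-1, multiplier 5-2
    right = [y * (mi - si) for mi, si in zip(m, s)]  # subtracter 5-3, multiplier 5-4
    return left, right


# Illustrative sample values only (not taken from the specification).
left = [0.5, -0.25, 1.0]
right = [0.25, 0.75, -1.0]
m, s = ms_encode(left, right)
left_out, right_out = ms_decode(m, s)
```

With x = 0.5 and y = 1.0, M′ + S′ reduces to L and M′ − S′ reduces to R, so the left and right signals are recovered exactly when no quantization error is introduced.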
A sum signal exerts more influence on the sense of hearing of a human being than a difference signal. In the manner described above, by generating a sum signal M and a difference signal S and by assigning a larger amount of data (the number of bits) to the sum signal M, coding can be performed with higher efficiency than when the left and right signals are coded individually (dual coding). MS stereo coding is effective for signals of lower frequency bands.
FIG. 2 is a block diagram showing an example of the construction of a conventional audio signal transmission system using IS stereo coding.
The left signal L and the right signal R which are input to a computation section 11 are added by an adder 11-1, and an intensity signal I determined by a correlation of those signals is generated. Also, a left power signal Pl (a scaling signal in which the energy content is described) indicating the power of the left signal L and a right power signal Pr (a scaling signal in which the energy content is described) indicating the power of the right signal R are generated in the computation section 11. The intensity signal I, the left power signal Pl, and the right power signal Pr are input to a coding section 12, where the signals are coded, and thereafter, the signals are output to a transmission line 13.
A decoding section 14 decodes the input signals, and outputs the obtained intensity signal I′, left power signal Pl′, and right power signal Pr′ to a computation section 15. In the computation section 15, a multiplier 15-1 regenerates a left signal L′ in accordance with the intensity signal I′ and the left power signal Pl′ and outputs it externally, and a multiplier 15-2 regenerates a right signal R′ in accordance with the intensity signal I′ and the right power signal Pr′ and outputs it externally.
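The following Python sketch illustrates one possible reading of the arrangement of FIG. 2. The text above does not state how the multipliers 15-1 and 15-2 scale the intensity signal, so the sketch assumes that each channel is regenerated by scaling I′ so that its energy matches the transmitted power signal. This gain rule, the function names is_encode and is_decode, and the sample values are assumptions made for illustration only; the coding section 12 and the decoding section 14 are omitted.

```python
import math


def is_encode(left, right):
    """Form the intensity signal I and the power signals Pl and Pr."""
    intensity = [l + r for l, r in zip(left, right)]  # adder 11-1
    p_left = sum(l * l for l in left)                 # energy of the left signal
    p_right = sum(r * r for r in right)               # energy of the right signal
    return intensity, p_left, p_right


def is_decode(intensity, p_left, p_right):
    """Regenerate L' and R' by scaling I' (assumed energy-matching gains)."""
    p_int = sum(i * i for i in intensity)
    if p_int == 0.0:
        silence = [0.0] * len(intensity)
        return silence, list(silence)
    g_left = math.sqrt(p_left / p_int)    # assumed gain for multiplier 15-1
    g_right = math.sqrt(p_right / p_int)  # assumed gain for multiplier 15-2
    return [g_left * i for i in intensity], [g_right * i for i in intensity]


# Fully correlated channels: the right signal is the left signal at half level.
left = [1.0, 2.0]
right = [0.5, 1.0]
intensity, p_left, p_right = is_encode(left, right)
left_out, right_out = is_decode(intensity, p_left, p_right)
```

Under this assumption, the two channels are recovered when they differ only in level, as in the sample values above. When the channels are uncorrelated, only their energies are preserved, and both outputs share the waveform of the intensity signal.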
IS stereo coding exploits the characteristic that the ability of human hearing to localize a sound source on the basis of time differences is lower for signals in higher-frequency bands. For example, coding can be performed at a data rate approximately one half that in a case where left and right signals are coded independently.
For MS stereo coding and IS stereo coding, equivalent advantages are not obtained with respect to all the input signals. For example, MS stereo coding is an effective means only for the case where the energy of the difference signal S becomes smaller than the energy of the sum signal M. Otherwise, when the left signal L′ and the right signal R′ are regenerated from the sum signal M′ and the difference signal S′, quantization noise which occurs due to coding or decoding (quantization/inverse quantization) causes interference, and clearly audible noise may be produced.
Furthermore, in IS stereo coding, the high-frequency components of a stereo signal are synthesized. When there is not a high correlation between a spectrum SPm, which is obtained by converting the synthesized components from the time domain to the frequency domain, and the envelope shapes of the original power spectra Pl and Pr (for example, when the left signal L is a signal of a trumpet and the right signal R is a signal of cymbals), the positional relationship between the respective sound sources (musical instruments) cannot be maintained, and clearly audible noise may occur.
Therefore, a coding device has been conceived in which, as shown in FIGS. 3, 4, and 5, dual coding in which left and right signals are each coded independently, and MS or IS stereo coding are combined, and a coding method is selected as appropriate in accordance with an input signal.
FIG. 3 is a block diagram showing an example of the construction of a prior coding device for coding an input signal in the time domain.
A filter bank 31-1 divides an input left signal L(t) into signals Ln(t), Ln−1(t), . . . , L1(t) (n is the number of divided bands) of predetermined frequency bands, and outputs each signal to a corresponding dual coding section 32 and a corresponding MS/IS coding section 33. In FIG. 3, although only the dual coding section 32 and the MS/IS coding section 33 for processing the signal Ln(t) are shown, coding sections corresponding to signals Ln−1(t), Ln−2(t), . . . , L1(t) are provided in a similar manner.
Similarly to the filter bank 31-1, a filter bank 31-2 also divides a right signal R(t) into signals Rn(t), Rn−1(t), . . . , R1(t) of predetermined frequency bands, and outputs each signal to the corresponding dual coding section 32 and the corresponding MS/IS coding section 33. In the following, when the filter bank 31-1 and the filter bank 31-2 need not be identified individually, these are referred to collectively as a filter bank 31. The same applies to the other devices.
The dual coding section 32 codes an input signal by a dual coding method (the left signal Ln(t) and the right signal Rn(t) are each coded independently), and outputs the obtained data to a switch 35. Furthermore, the dual coding section 32 creates number-of-necessary-bits information Bn(t)1 which is information about the amount of coded data and distortion factor information En(t)1 which is information about the distortion factor with respect to a sine wave when coding is performed, and supplies them to a coding control section 34.
The MS/IS coding section 33 codes the input signal by the MS stereo coding method or the IS stereo coding method, and outputs the obtained data to the switch 35. Also, the MS/IS coding section 33 creates number-of-necessary-bits information Bn(t)2 and distortion factor information En(t)2, and supplies them to the coding control section 34.
The coding control section 34 switches the contact of the switch 35 so that a code sequence which is coded by a coding method with a smaller distortion factor or a coding method with a smaller number of necessary bits is selected on the basis of the information supplied from the dual coding section 32 and the MS/IS coding section 33. The code sequence selected by the switch 35 is input to a multiplexer 36.
The multiplexer 36 combines the code sequences Cn, Cn−1, . . . , C1 of the respective bands divided by the filter bank 31, and outputs the combined code sequence C to a destination external to the coding device 21, such as a transmission line (not shown).
FIG. 4 is a block diagram showing an example of the construction of a prior coding device for coding an input signal in the frequency domain.
A domain conversion section 51-1 spectrum-converts the input left signal L(t) into the frequency domain, and outputs the generated spectrum signal Ln(f) to a dual coding section 52 and an MS/IS coding section 53. Similarly to the domain conversion section 51-1, a domain conversion section 51-2 also spectrum-converts the input right signal R(t) into the frequency domain, and outputs the generated spectrum signal Rn(f) to the dual coding section 52 and the MS/IS coding section 53.
The dual coding section 52 codes the input signal by the dual coding method, and outputs the obtained code sequence to a switch 55. Furthermore, the dual coding section 52 creates number-of-necessary-bits information Bn(f)1 which is information about the amount of coded data and distortion factor information En(f)1 which is information about the distortion factor with respect to a sine wave when coding is performed, and supplies them to a coding control section 54.
The MS/IS coding section 53 codes the input signal by an MS stereo coding method or an IS stereo coding method, and outputs the obtained data to the switch 55. Furthermore, the MS/IS coding section 53 creates number-of-necessary-bits information Bn(f)2 and distortion factor information En(f)2, and supplies them to the coding control section 54.
The coding control section 54 controls the switch 55 so that a code sequence which is coded by a coding method with a smaller distortion factor or a coding method with a smaller number of necessary bits is selected on the basis of the information supplied from the dual coding section 52 and the MS/IS coding section 53.
FIG. 5 is a block diagram showing an example of the construction of a prior coding device in which the coding device 21 of FIG. 3 and the coding device 41 of FIG. 4 are combined.
More specifically, in this example, the left signal L(t) and the right signal R(t) are divided into a predetermined number of bands by filter banks 71-1 and 71-2, and the divided signals are spectrum-converted by domain conversion sections 72-1 and 72-2, respectively. The converted spectrum signals are coded by a dual coding section 73 and an MS/IS coding section 74. In a coding control section 75 and a switch 76, among the code sequences coded in the dual coding section 73 and the MS/IS coding section 74, the code sequence coded by the coding method with higher efficiency (with a smaller distortion factor or with a smaller amount of data) is selected and is output to a multiplexer 77. Then, after the input data of all the bands is combined by the multiplexer 77, the data is output to the outside of a coding device 61.
Next, referring to the flowchart in FIG. 6, the process of the coding control section 34 of the coding device 21 of FIG. 3 will be described below. Although descriptions are omitted, the processes of the coding control section 54 of FIG. 4 and the coding control section 75 of FIG. 5 are the same as the above. In this example, it is assumed that the coding control section 34 selects a coding method on the basis of the distortion factor.
In step S1, the coding control section 34 compares the distortion factor information En(t)1 supplied from the dual coding section 32 with the distortion factor information En(t)2 supplied from the MS/IS coding section 33. Then, in step S2, the coding control section 34 determines whether or not the distortion factor supplied from the dual coding section 32 is smaller than the distortion factor supplied from the MS/IS coding section 33. When it is determined that the distortion factor supplied from the dual coding section 32 is smaller, in step S3, the coding control section 34 controls the switch 35 so that the data coded by the dual coding section 32 is output to the multiplexer 36.
When, on the other hand, it is determined in step S2 that the distortion factor supplied from the dual coding section 32 is greater than the distortion factor supplied from the MS/IS coding section 33, the process proceeds to step S4, where the coding control section 34 controls the switch 35 so that the data coded by the MS/IS coding section 33 is output to the multiplexer 36.
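The selection process of FIG. 6 can be sketched in Python as follows. The sketch assumes that each coding section reports one distortion factor per band as a number. The text does not state what happens when the two distortion factors are equal, so the sketch routes that case to the MS/IS coding section as an assumption. The function names and the sample values are hypothetical.

```python
def select_coding_method(dual_distortion, ms_is_distortion):
    """Return which coded data the switch 35 passes to the multiplexer 36."""
    # Steps S1 and S2: compare the two distortion factors.
    if dual_distortion < ms_is_distortion:
        return "dual"   # step S3: output of the dual coding section 32
    # Step S4: output of the MS/IS coding section 33.
    # Equal distortion factors also end up here (assumption, see lead-in).
    return "ms_is"


def select_per_band(dual_distortions, ms_is_distortions):
    """Apply the same decision independently to every band."""
    return [
        select_coding_method(d, m)
        for d, m in zip(dual_distortions, ms_is_distortions)
    ]


# Illustrative distortion factors for three bands (not from the specification).
choices = select_per_band([0.1, 0.3, 0.2], [0.2, 0.1, 0.2])
```

Both distortion factors must already be available before the comparison can be made, which means that both coding sections have to code every band.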
The same process is performed in the other bands. As a result, a code sequence C which is coded for each band by a low-bit-rate coding method is created, and is output to the outside of the coding device 21.
In the manner described above, the coding efficiencies of the respective coding methods are compared with each other, and an optimum method is selected according to the result thereof, thereby making it possible to obtain coded data at a lower bit rate in comparison with a case in which coding is performed by a single coding method.
FIGS. 7A, 7B, 7C, and 7D show an example of the relationship among the operation time probability PMS of MS stereo coding or the operation time probability PIS of IS stereo coding in the coding devices of FIGS. 3 to 5, the signal power to noise power ratio SNR of the coded (quantized) signal, and the separation of the left and right signals.
As shown in FIG. 7A, the probability PMS or PIS plotted on the horizontal axis is proportional to the SNR plotted on the vertical axis. The nearer the probability PMS or PIS approaches 100% (monaural), the more the SNR is improved.
FIG. 7B shows the change in the probability PMS or PIS with respect to time. FIG. 7C shows the change in the SNR with respect to time. As shown in these figures, the two waveforms are in phase with each other: when the probability PMS or PIS is increased in accordance with the input signal, the coding efficiency is improved, the SNR is improved as well, and thus the sound quality is improved. For this reason, it is preferable from the viewpoint of coding efficiency that the probability PMS or PIS be higher.
However, a high probability PMS indicates that there is a high correlation between the left and right signals. A high probability PIS indicates that the intensity signal and the spectrum to be coded are for one channel although the power levels are different. That is, a high probability PMS or PIS indicates that a stereo signal is changed into a monaural signal. As shown in FIG. 7D, the separation of the left and right signals becomes poorer as the probability PMS or PIS is increased.
Furthermore, since the probability PMS or PIS is linked with the SNR, if the value of the probability PMS or PIS is high, there is the risk that, due to a change of the properties of the input signal or due to a change of the input signal with respect to time, the SNR falls below the perceptible noise level limit in an auditory psychological model (a level at which, if the SNR decreases to less than that level, perceptual noise is heard). Therefore, when considered together, the value of the probability PMS or PIS being high is not always preferable.
In the coding devices shown in FIGS. 3 to 5, whether MS stereo coding or IS stereo coding is more efficient than dual coding cannot be known until both coding processes have actually been performed, which presents the problem that the amount of processing in each coding section increases.
Also, when MS stereo coding or IS stereo coding is performed, the coding efficiency can be increased (quantization noise can be decreased), whereas when it is not performed, such advantages cannot be obtained. Consequently, the sound quality varies greatly over time between periods in which MS stereo coding or IS stereo coding is performed and periods in which it is not, and a problem arises in that the listener perceives a substantial sense of incongruity.
The present invention is made in view of such circumstances. The present invention aims to code or decode an audio signal at a higher efficiency while the listener is prevented from feeling a sense of incongruity.
To this end, according to one aspect of the present invention, there is provided a coding device for coding an input signal, comprising: coding method selection means for selecting a coding method in accordance with the input signal; coding means for coding the input signal in accordance with the coding method selected by the coding method selection means; distortion factor detection means for detecting a distortion factor of coding by the coding means; and mixing means for mixing the left and right components of the input signal on the basis of a mixing ratio determined in such a manner as to correspond to the distortion factor detected by the distortion factor detection means, wherein the coding method selection means selects the coding method in accordance with the input signal mixed by the mixing means.
The coding device may further comprise output correction information creation means for creating output correction information which is used when the input signal coded by the coding means is decoded.
The coding method selection means may select the coding method for the input signal on the basis of a threshold value determined according to the construction of the coding device.
The coding method selection means may select the coding method from among a dual coding method, an MS stereo coding method, and an IS stereo coding method.
The coding method selection means may select the dual coding method to perform coding on the basis of the correlation between the left and right components of the input signal, that is, the total of the sum signals with respect to the total of the difference signals of the left and right components, and may select MS stereo coding or IS stereo coding to perform coding on the basis of the maximum value of the absolute value of the difference of the left and right components of the input signal.
The mixing means may store the mixing ratio, and may change the mixing ratio on the basis of an interpolation function of the mixing ratio determined immediately before and the mixing ratio determined currently.
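As a rough illustration of changing the mixing ratio by interpolation, the following Python sketch ramps the ratio linearly from the previously determined value to the currently determined value over one block of samples and then mixes the left and right components. The text above does not specify the interpolation function or the mixing formula, so the linear ramp, the cross-mixing rule (a ratio of 0 leaves the channels unchanged and a ratio of 0.5 makes them identical), and the names interpolate_ratio and mix_channels are all assumptions made for illustration.

```python
def interpolate_ratio(previous, current, num_samples):
    """Linearly ramp the mixing ratio from `previous` toward `current`."""
    step = (current - previous) / num_samples
    # The last sample of the block reaches the current ratio.
    return [previous + step * (n + 1) for n in range(num_samples)]


def mix_channels(left, right, ratios):
    """Cross-mix the channels sample by sample (assumed mixing rule)."""
    mixed_left = [(1.0 - a) * l + a * r for l, r, a in zip(left, right, ratios)]
    mixed_right = [(1.0 - a) * r + a * l for l, r, a in zip(left, right, ratios)]
    return mixed_left, mixed_right


# Illustrative block: the ratio moves from 0.0 (stored) to 0.5 (current).
ratios = interpolate_ratio(0.0, 0.5, 4)
mixed_left, mixed_right = mix_channels([1.0] * 4, [0.0] * 4, ratios)
```

Ramping the ratio within the block, rather than switching it at the block boundary, avoids an abrupt change in the stereo image between consecutive blocks.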
The coding device may further comprise input signal storage means for storing the input signal, wherein the mixing means may mix again the left and right components of the same input signal on the basis of the distortion factor used when the input signal is coded.
According to another aspect of the present invention, there is provided a coding method for coding an input signal, comprising: a coding method selection step of selecting a coding method in accordance with the input signal; a coding step of coding the input signal in accordance with the coding method selected in the coding method selection step; a distortion factor detection step of detecting a distortion factor of coding in the coding step; and a mixing step of mixing the left and right components of the input signal on the basis of a mixing ratio determined in such a manner as to correspond to the distortion factor detected in the distortion factor detection step, wherein the process of the coding method selection step selects the coding method in accordance with the input signal mixed in the mixing step.
According to another aspect of the present invention, there is provided a recording medium having recorded thereon a computer-readable program, the program comprising: a coding method selection step of selecting a coding method in accordance with an input signal; a coding step of coding the input signal in accordance with the coding method selected in the coding method selection step; a distortion factor detection step of detecting a distortion factor of coding in the coding step; and a mixing step of mixing the left and right components of the input signal on the basis of a mixing ratio determined in such a manner as to correspond to the distortion factor detected in the distortion factor detection step, wherein the process of the coding method selection step selects the coding method in accordance with the input signal mixed in the mixing step.
According to another aspect of the present invention, there is provided a decoding device for decoding a code sequence coded by a predetermined coding method, the decoding device comprising: decoding method selection means for selecting a decoding method corresponding to the coding method; decoding means for decoding an input code sequence in accordance with the decoding method selected by the decoding method selection means; correction means for correcting the left and right components of a signal decoded by the decoding means on the basis of information supplied from the coding device; and output means for outputting the signal corrected by the correction means.
According to another aspect of the present invention, there is provided a decoding method for decoding a code sequence coded by a predetermined coding method, the decoding method comprising: a decoding method selection step of selecting a decoding method corresponding to a coding method used by a coding device; a decoding step of decoding an input code sequence in accordance with the decoding method selected in the decoding method selection step; a correction step of correcting the left and right components of a signal decoded in the decoding step on the basis of information supplied from the coding device; and an output step of outputting the signal corrected in the correction step.
According to another aspect of the present invention, there is provided a recording medium having recorded thereon a computer-readable program, the program comprising: a decoding method selection step of selecting a decoding method corresponding to a coding method used by a coding device; a decoding step of decoding an input code sequence in accordance with the decoding method selected in the decoding method selection step; a correction step of correcting the left and right components of a signal decoded in the decoding step on the basis of information supplied from the coding device; and an output step of outputting the signal corrected in the correction step.
In the coding device and method and the program of the recording medium of the present invention, a coding method is selected in accordance with an input signal, the input signal is coded on the basis of the selected coding method, and the left and right components of the input signals are mixed. Furthermore, a coding method is selected in accordance with the mixed input signals. Therefore, it is possible to code an audio signal with higher efficiency.
In the decoding device and method and the program of the recording medium of the present invention, a decoding method corresponding to a coding method used by a coding device is selected, and an input code sequence is decoded on the basis of the selected decoding method. Furthermore, the left and right components of the decoded signal are corrected on the basis of the information supplied from the coding device, and the corrected signal is output. Therefore, it is possible to reproduce a coded audio signal with higher efficiency while the listener is prevented from feeling a sense of incongruity.
Further objects, features and advantages of the present invention will become apparent from the following description of the preferred embodiments with reference to the attached drawings.