The present invention generally relates to a sound signal encoding method and apparatus, sound signal decoding method and apparatus, program, and a recording medium, and more particularly to a sound signal encoding method and apparatus for making high-efficiency coding of sound signals from a plurality of channels and transmitting the encoded sound signals or recording the signals to a recording medium, a recording medium having recorded therein a string of codes generated by the coding, a sound signal decoding method and apparatus for decoding the string of codes received or reproduced, a program for causing a computer to execute the sound signal coding or decoding process, and a computer-readable recording medium having the program recorded therein.
This application claims the priority of the Japanese Patent Application No. 2002-145267 filed on May 20, 2002, the entirety of which is incorporated by reference herein.
Conventionally, the unblocked frequency subband techniques represented by the subband coding or the like and the blocked frequency subband techniques represented by the transform coding or the like are known for making high-efficiency coding of audio signals such as sounds.
With the unblocked frequency subband techniques, a time-based audio signal is encoded by dividing it into a plurality of frequency subbands without blocking it. On the other hand, with the blocked frequency subband techniques, a time-based audio signal is transformed into a frequency-based signal by frequency spectrum transform, the coefficients obtained through the transform are grouped by each of predetermined frequency subbands, and the signal is then encoded subband by subband.
For an improved efficiency of coding, there has also been proposed a high-efficiency encoding technique being a combination of the unblocked frequency subband coding and blocked frequency subband coding. With this technique, a frequency band of a signal is divided by the subband coding into frequency subbands, for example, then the signal of each frequency subband is spectrally transformed into a frequency-based signal, and the signal is encoded by the spectrally transformed frequency subbands.
For dividing a frequency band, the quadrature mirror filter (QMF), for example, is used frequently since it can easily divide the frequency band with cancellation of aliasing. It should be noted that the frequency band division by the QMF is described in detail in the document “1976 R. E. Crochiere, Digital Coding of Speech in Subbands, Bell Syst. Tech. J. Vol. 55, No. 8, 1976” and the like.
The frequency subband techniques further include the polyphase quadrature filter (PQF), for example. This technique is to divide a frequency band into equal bandwidths. The PQF technique is detailed in the document “ICASSP 83 BOSTON, Polyphase Quadrature Filters—A new subband coding technique, Joseph H. Rothweiler” and the like.
On the other hand, the aforementioned frequency spectrum transform techniques include one by which an input audio signal is blocked into frames of a predetermined unit time, and the time-based signal is transformed into a frequency-based signal by subjecting each block to discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT) or the like.
Note that the MDCT is described in detail in the document “ICASSP, 1987, Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation, J. P. Princen, A. B. Bradley, Univ. of Surrey, Royal Melbourne Inst. of Tech.” and the like.
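The blocking and transform described above can be illustrated with a minimal sketch. It assumes a sine window, 50% overlapped blocks of 2N samples and a direct O(N^2) evaluation of the MDCT; the function names (`sine_window`, `mdct`, `imdct`, `analyze`, `synthesize`) are hypothetical and chosen only for this illustration, and the sketch is not the transform of any particular encoder.

```python
import math

def sine_window(two_n):
    """Sine window; satisfies w[t]**2 + w[t + N]**2 == 1 for a block of 2N samples."""
    return [math.sin(math.pi * (t + 0.5) / two_n) for t in range(two_n)]

def mdct(block):
    """Direct MDCT: 2N time samples in, N coefficients out."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out.

    The 2/N factor is the scaling that suits a window applied at both
    analysis and synthesis.
    """
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]

def analyze(signal, n):
    """Block the signal into 50% overlapped frames of 2n samples and transform each."""
    if n % 2 or len(signal) % n:
        raise ValueError("sketch assumes even n and len(signal) a multiple of n")
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n   # zero padding at both ends
    return [mdct([padded[start + t] * window[t] for t in range(2 * n)])
            for start in range(0, len(padded) - n, n)]

def synthesize(frames, n):
    """Window each inverse-transformed frame and overlap-add; aliasing cancels."""
    window = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for t, y in enumerate(imdct(coeffs)):
            out[i * n + t] += y * window[t]
    return out[n:-n]                                # drop the padding

signal = [math.sin(0.3 * i) for i in range(16)]
frames = analyze(signal, 4)
restored = synthesize(frames, 4)
```

Each frame yields only N coefficients for 2N input samples, and the time-domain aliasing that results is cancelled when adjacent frames are overlap-added, so `restored` agrees with `signal` up to rounding error.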
By quantizing the signal of each frequency band, produced using the filter or spectrum transform as above, it is possible to control the frequency band in which quantization noise occurs, whereby the signal can be encoded with an acoustically higher efficiency by making use of the masking effect. Also, the signal can be encoded with a much higher efficiency by normalizing the signal components of each frequency subband with, for example, the largest absolute value of the signal components in the subband.
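The normalization and quantization of one frequency subband can be sketched as follows. This is a minimal illustration under stated assumptions: a uniform quantizer, a normalization coefficient equal to the largest absolute value in the subband, and hypothetical function names (`normalize_and_quantize`, `dequantize`) and sample values that do not come from any actual encoder.

```python
def normalize_and_quantize(coeffs, bits):
    """Normalize a subband by its largest absolute value, then quantize uniformly.

    Returns (scale, codes). `scale` plays the role of the normalization
    coefficient; `codes` are signed integers in [-levels, levels], where
    levels = 2**(bits - 1) - 1 and `bits` (at least 2) stands for the
    quantization accuracy.
    """
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(c) for c in coeffs)
    if scale == 0:
        return 0.0, [0] * len(coeffs)
    return scale, [round(c / scale * levels) for c in coeffs]

def dequantize(scale, codes, bits):
    """Invert the normalization and quantization, up to quantization error."""
    levels = 2 ** (bits - 1) - 1
    return [scale * q / levels for q in codes]

# Hypothetical subband of four spectrum coefficients, quantized with 4 bits.
scale, codes = normalize_and_quantize([0.5, -2.0, 1.25, 0.25], 4)
restored = dequantize(scale, codes, 4)
# scale == 2.0, codes == [2, -7, 4, 1]
```

Because every component is divided by the subband maximum before quantization, the full range of the quantizer is used regardless of the absolute level of the subband, and the quantization error is bounded by scale / (2 × levels).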
The width of each frequency subband is determined in consideration of the human auditory sense, for example. Generally, an audio signal is divided into a plurality of frequency subbands (32 subbands, for example) called “critical bands”, whose width increases with frequency.
Also, to encode data of each frequency subband, a predetermined bit allocation or an adaptive bit allocation is made to the frequency subband. That is to say, to encode coefficient data obtained through the MDCT by a bit allocation, a number of bits are adaptively allocated to MDCT coefficient data of each frequency subband, obtained through the MDCT of each block of signal.
For configuration of an actual code string, first quantization accuracy information indicating a quantization step and a normalization coefficient indicating a coefficient used to normalize each signal component are encoded with a predetermined number of bits for each frequency subband to be normalized and quantized, and then the normalized and quantized spectrum signal is encoded.
To further improve the compression ratio, it is necessary to improve not only the efficiency of encoding the spectrum signal, which is the main information to be encoded directly, but also the efficiency of encoding sub-information which is not encoded directly, such as the quantization accuracy information, the normalization coefficient and the like.
On this account, the Inventors of the present invention have proposed, in the specification and drawings of the already filed Japanese Patent Application No. 2000-390589, a technique of improving the efficiency of encoding such sub-information by a variable-length coding using an inter-channel correlation between audio signals or by a coding that controls the range of existential distribution using the gradient coefficient.
Also, the Inventors of the present invention have proposed, by the specification and drawings included in the Japanese Patent Application No. 2001-182093, a technique of improving the efficiency of encoding gain information by the use of various kinds of correlation in a coding in which a gain control is made to suppress quantization noises called “pre-echo/post-echo”, caused by the quantization of the spectrum signal.
Furthermore, the Inventors of the present invention have proposed, in the specification and drawings of the Japanese Patent Application Nos. 2000-380639 and 2001-182384, a technique of improving the efficiency of coding by extracting a tone component from a time-series signal and making spectrum transform coding of the residual error, to prevent the efficiency of coding from being deteriorated by a tone component existent at a local frequency, such as a sine wave, as was observed in the conventional coding techniques.
Note that the sine wave information indicating the extracted tone component, for example, waveform parameters such as frequency information, amplitude information, phase information, are encoded separately from the spectrum information, normalization information and quantization accuracy information of the residual error signal.
The ratio of compression can be increased by encoding the residual error signal with the technique disclosed in the specification and drawings included in the Inventors' Japanese patent application No. 2000-390589 or 2001-182093, for example the variable-length coding using an inter-channel correlation between audio signals or the coding by controlling the range of existential distribution using the gradient coefficient.
Unlike the spectrum information, normalization information or quantization accuracy information of the residual error signal, however, the extracted tone components do not necessarily exist evenly in all the frequency bands, so that the variable-length coding using an inter-channel correlation between audio signals may in some cases result in a lower coding efficiency.
The conventional variable-length coding using the inter-channel correlation between audio signals will be described in detail below. In the following description, it is assumed that the number of channels is two (2), namely, the audio signals are stereo signals, and the inter-channel correlation means a correlation between right and left channels. Also, although there will be described an example in which the correlation between the right and left channels is used for only amplitude information of the sine wave information indicating a tone component, the description is also true for phase information. Further, it is assumed that there have been extracted a number NL of sine waves on the left channel Lch and a number NR of sine waves on the right channel Rch.
FIG. 1 shows the general construction of a portion of a conventional sine wave information encoder which encodes sine wave information with the use of a correlation between the right and left channels, namely, the portion that encodes amplitude information on the right channel Rch. For the simplicity of illustration and explanation, however, it is assumed here that the number NL of sine waves on the left channel Lch is equal to the number NR of sine waves on the right channel Rch. As shown in FIG. 1, the sine wave information encoder, generally indicated with a reference numeral 200, includes a left-channel amplitude information holder 201, right-channel amplitude information holder 202, adder-subtracter 203, variable-length encoder 204 and a code string generator 205.
The left-channel amplitude information holder 201 indexes a number NL of sine waves extracted from the left channel Lch by 0 to NL−1, respectively, sequentially starting with the lowest-frequency one, and holds amplitude information in correspondence to the indexes. Similarly, the right-channel amplitude information holder 202 indexes a number NR of sine waves extracted from the right channel Rch by 0 to NR−1, respectively, sequentially starting with the lowest-frequency one, and holds amplitude information in correspondence to the indexes. Then, the left- and right-channel amplitude information holders 201 and 202 supply the amplitude information held therein to the adder-subtracter 203.
The adder-subtracter 203 calculates a difference by subtracting the i-th amplitude information on the left channel Lch from the i-th amplitude information on the right channel Rch, and supplies the difference thus calculated to the variable-length encoder 204.
The variable-length encoder 204 makes variable-length coding of the difference supplied from the adder-subtracter 203 according to a variable-length code table to provide a variable-length code, and supplies the variable-length code as a sine wave information code to the code string generator 205.
The code string generator 205 generates a code string according to the sine wave information code supplied from the variable-length encoder 204.
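The flow through the blocks 201 to 205 can be sketched as follows. This is a minimal illustration, not the encoder of FIG. 1 itself: the prefix-free code produced by `code_word` is a hypothetical stand-in for the variable-length code table, sine waves are represented as hypothetical (frequency, amplitude) pairs, and the decoding function is added only to show that the code string can be inverted given the left-channel information.

```python
def code_word(diff):
    """Hypothetical prefix-free code: '0' for 0, else |diff| ones, a zero, and a sign bit."""
    if diff == 0:
        return "0"
    return "1" * abs(diff) + "0" + ("0" if diff > 0 else "1")

def index_by_frequency(sine_waves):
    """Holders 201 and 202: index the waves from the lowest frequency, keep amplitudes."""
    return [amp for _freq, amp in sorted(sine_waves, key=lambda w: w[0])]

def encode_right_amplitudes(left_waves, right_waves):
    left = index_by_frequency(left_waves)
    right = index_by_frequency(right_waves)
    if len(left) != len(right):
        raise ValueError("sketch assumes NL == NR")
    diffs = [r - l for l, r in zip(left, right)]    # adder-subtracter 203
    codes = [code_word(d) for d in diffs]           # variable-length encoder 204
    return "".join(codes)                           # code string generator 205

def decode_right_amplitudes(left_waves, code_string):
    """Recover the right-channel amplitudes from the code string and the left channel."""
    left = index_by_frequency(left_waves)
    max_diff = max(len(code_string), 1)
    inverse = {code_word(d): d for d in range(-max_diff, max_diff + 1)}
    diffs, word = [], ""
    for bit in code_string:
        word += bit
        if word in inverse:                         # prefix-free, so first match is the word
            diffs.append(inverse[word])
            word = ""
    if word or len(diffs) != len(left):
        raise ValueError("malformed code string")
    return [l + d for l, d in zip(left, diffs)]

# Hypothetical (frequency in Hz, 3-bit amplitude) pairs for each channel.
left_waves = [(880, 3), (440, 5), (1320, 6), (1760, 2)]
right_waves = [(440, 5), (880, 4), (1320, 6), (1760, 1)]
code_string = encode_right_amplitudes(left_waves, right_waves)
# code_string == "01000101" (differences 0, +1, 0, -1)
```

The shorter the code word assigned to small differences, the more the code string shrinks when the two channels carry similar values at the same index.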
When supplied with sine wave information as shown in FIG. 2, the sine wave information encoder 200 works as will be described below. As can be seen, many pieces of information on the right channel are similar in value to the corresponding ones on the left channel, and so the correlation between the right and left channels can be utilized to encode the information with an improved efficiency. In encoding amplitude information (3 bits when not compressed), the difference resulting from subtraction of the amplitude information on the left channel Lch from the amplitude information on the right channel Rch having the same index (n) will be as shown in FIG. 3. Since the difference distribution is not even, the number of bits encoded can be reduced by making variable-length coding according to a variable-length code table as shown in FIG. 4, for example. More specifically, the amplitude information on the right channel Rch can be encoded with a total of 5 bits. Namely, the amplitude information (of 12 bits (=3 bits×4) when not compressed) can be compressed by 7 bits.
Similarly, in encoding phase information (of 3 bits when not compressed), the difference resulting from subtraction of the phase information on the left channel Lch from the phase information on the right channel Rch having the same index (n) will be as shown in FIG. 5. By making variable-length coding of the difference according to the variable-length code table shown in FIG. 4, the phase information on the right channel Rch can be encoded with a total of 5 bits. This number of bits is 7 bits smaller than the 12 bits (=3 bits×4) required when the phase information is not compressed.
When supplied with sine wave information as shown in FIG. 6, the sine wave information encoder 200 works as will be described below. As can be seen, many pieces of information on the right channel are similar in value to corresponding ones on the left channel. However, since the difference is calculated between the amplitude information on the right channel Rch and the amplitude information on the left channel Lch having the same index (n), the coded differences amount to a total of 14 bits as shown in FIG. 7, whereas the amplitude information is of 12 bits when not compressed. Similarly, the coded differences in phase information between the right and left channels Rch and Lch amount to a total of 24 bits as shown in FIG. 8, which means a lower efficiency of coding than when the phase information is not compressed.
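The effect of an index mismatch on the same-index difference coding can be illustrated with a minimal sketch. All values below are hypothetical and are not the contents of FIGS. 2 to 8, and `code_length` assumes a hypothetical prefix code that spends 1 bit on a difference of 0 and |difference| + 2 bits otherwise, so the totals differ from those stated in the text.

```python
def code_length(diff):
    """Bit length under a hypothetical prefix code favouring small differences."""
    return 1 if diff == 0 else abs(diff) + 2

def coded_bits(left, right):
    """Total bits when right[i] is coded as right[i] - left[i] for the same index i."""
    return sum(code_length(r - l) for l, r in zip(left, right))

BITS_UNCOMPRESSED = 3 * 4   # four 3-bit amplitudes

# Right-channel amplitudes, indexed from the lowest frequency.
right = [6, 2, 4, 1]

# Case 1: the left channel holds the same tones at the same indexes.
left_aligned = [6, 2, 5, 1]

# Case 2: the left channel has one extra low-frequency tone (amplitude 3),
# so every tone shared with the right channel sits one index later.
left_shifted = [3, 6, 2, 5]

aligned_bits = coded_bits(left_aligned, right)   # differences 0, 0, -1, 0
shifted_bits = coded_bits(left_shifted, right)   # differences 3, -4, 2, -4
# aligned_bits == 6, shifted_bits == 21, BITS_UNCOMPRESSED == 12
```

In the aligned case the differences cluster around zero and the code string is shorter than the uncompressed information; in the shifted case the same-index differences compare unrelated tones, and the code string becomes longer than the uncompressed information.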