The present invention relates to methods and apparatus for encoding an audio signal into a digital code with high efficiency and for decoding the digital code into the audio signal, which can be employed for recording and reproduction of audio signals and their transmission and broadcasting over a communication channel.
A conventional high-efficiency audio-coding scheme is a transform coding method such as that depicted in FIG. 1. With this method, an audio signal input as a sequence of signal samples is transformed into frequency-domain coefficients in a time-frequency transformation part 11 upon each input of a fixed number of samples and then encoded: the frequency-domain coefficients are preprocessed in a preprocessing part 2 and quantized in a quantization part 3. A typical example of this scheme is TWINVQ (Transform-domain Weighted Interleave Vector Quantization).
The TWINVQ scheme uses weighted interleave vector quantization at the final stage of the quantization part 3. The vector quantization features two-stage flattening of coefficients in the preprocessing part 2 since the quantization efficiency increases as the distribution of input coefficient values becomes more even. In the first stage, the frequency-domain coefficients are normalized by the LPC spectrum to thereby roughly flatten their total variations. In the second stage, frequency-domain coefficients are further normalized for each of subbands having the same bandwidth on the Bark scale, by which they are flattened more finely than in the first stage. The Bark scale is a kind of frequency scale.
The Bark scale has the feature that frequencies at equally spaced points provide pitches of sound nearly equally spaced apart in terms of the human auditory sense. The subbands of the same bandwidth on the Bark scale are approximately equal in width perceptually, but on a linear scale their bandwidth increases with an increase in frequency as shown in FIG. 2. Accordingly, when the frequency-domain coefficients are split into subbands having the same bandwidth on the Bark scale, the higher the frequency of the subband, the more coefficients it contains.
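The growth in the number of coefficients per subband can be illustrated with a minimal sketch. The source does not name a particular Bark formula, so Traunmueller's approximation is assumed here; the function names, the 1024-coefficient frame, the 44.1 kHz sampling rate and the 16 subbands are likewise illustrative assumptions only.

```python
def hz_to_bark(f_hz):
    """Traunmueller's approximation of the Bark scale (an assumption)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def coefficients_per_subband(num_coeffs, sample_rate, num_subbands):
    """Count the coefficients falling into subbands of equal Bark width.

    The coefficients are assumed to be spaced linearly from 0 Hz up to
    the Nyquist frequency.
    """
    nyquist = sample_rate / 2.0
    z_lo, z_hi = hz_to_bark(0.0), hz_to_bark(nyquist)
    step = (z_hi - z_lo) / num_subbands
    edges = [0]
    for k in range(1, num_subbands):
        f_edge = bark_to_hz(z_lo + k * step)
        edges.append(round(f_edge / nyquist * num_coeffs))
    edges.append(num_coeffs)
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


# Hypothetical frame: 1024 coefficients, 44.1 kHz sampling, 16 subbands.
counts = coefficients_per_subband(1024, 44100, 16)
print(counts)  # the highest subband holds far more coefficients than the lowest
```

Under these assumptions the lowest subband spans only a handful of coefficients while the highest spans several hundred, which is the imbalance described above.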
The second-stage flattening on the Bark scale is intended to effectively allocate a limited amount of information, taking the human auditory sense into account. The flattening operation by normalization for each subband on the Bark scale is based on the expectation that the coefficients in the subbands are steady, but since the subbands at higher frequencies contain more coefficients, the situation occasionally arises where the coefficients are not steady in the subbands as depicted in FIG. 2. This incurs impairment of the efficiency of vector quantization, leading to the degradation of sound quality of decoded audio signals. Such a problem is likely to occur especially when the input audio signal contains a lot of tone components in the high-frequency range.
Incidentally, the TWINVQ scheme is described in detail in N. Iwakami, et al., “Transformed Domain Interleave Vector Quantization (TwinVQ),” preprint of the 101st Audio Engineering Society Convention, 4377, (1996).
In the audio-coding of FIG. 1, the quantization may also be scalar quantization using adaptive bit allocation. Such a coding method splits the frequency-domain coefficients into subbands and conducts optimum bit allocation for each subband. The subbands may sometimes be divided so that they have the same bandwidth on the Bark scale with a view to achieving a better match to the human auditory sense. In this instance, however, the coefficients in the subbands at the higher frequencies are often unsteady as is the case with the TWINVQ scheme, leading to impairment of the quantization efficiency.
As a solution to such a problem, there is proposed in Japanese Patent Application Laid-Open Gazette No. 7-336232 a coding method that transforms the input signal to a frequency-domain signal and adaptively changes, according to the shape of the spectral envelope, the bandwidth of each subband in which the frequency-domain coefficients are flattened (normalized). This method narrows the bandwidths of subbands containing tone components and widens the bandwidths of other subbands, thereby reducing the number of subbands and hence increasing the coding efficiency accordingly. With this method, however, when tone components are sparse, narrow bandwidths are applied to flat portions near the tone components, sometimes impairing the coding efficiency. Further, normalization information needs to be encoded and sent for each component; therefore, if many tone components are scattered, the amount of normalization information to be encoded increases accordingly.
With a view to increasing the coding efficiency, there is proposed in Japanese Patent Application Laid-Open Gazette No. 7-168593 a scheme of encoding the tone components and the other components separately from each other. With this scheme, since the spectrum of each maximal value and adjoining spectra are normalized and encoded as a tone component signal of one group, information about the position of the spectrum of the maximal value and the group size needs to be encoded and sent. On this account, when many tone components are present, it is necessary to encode many pieces of information about the positions of the spectra of maximal values and the group sizes, which is likely to constitute an obstacle to increasing the coding efficiency.
Japanese Patent Application Laid-Open Gazette No. 7-248145 describes a scheme which separates pitch components formed by equally spaced tone components and encodes them individually. The position information of the pitch components is given by the fundamental frequency of the pitch, and hence the amount of information involved is small; however, in the case of a metallic sound or the like of a non-integral harmonic structure, the tone components cannot accurately be separated.
It is an object of the present invention to provide a coding method which permits highly efficient transform coding of the input audio signal having many tone components in the high-frequency range, a decoding method for such a coded signal, apparatus using the coding and decoding methods, and recording media having recorded thereon the methods as computer-executable programs.
According to an aspect of the present invention, there is provided an audio signal coding method for coding input audio signal samples, the method comprising the steps of:
(a) time-frequency transforming every fixed number of input audio signal samples into frequency-domain coefficients;
(b) dividing said frequency-domain coefficients into coefficient segments each consisting of one or more coefficients to generate a sequence of coefficient segments;
(c) calculating the intensity of each coefficient segment in said sequence of coefficient segments;
(d) classifying the sequence of coefficient segments into either one of at least two groups according to the intensities of said coefficient segments to generate at least two sequences of coefficient segments, and encoding and outputting classification information as a classification information code; and
(e) encoding said at least two sequences of coefficient segments and outputting them as coefficient codes.
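Steps (b) through (d) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the intensity is taken to be the mean squared coefficient value, the classification uses a hypothetical threshold set at a multiple of the average segmental intensity, and exactly two groups are used. Steps (a) and (e) (the time-frequency transform and the coefficient coding) are omitted.

```python
def split_into_segments(coeffs, seg_len):
    """Step (b): divide the coefficients into contiguous segments."""
    return [coeffs[i:i + seg_len] for i in range(0, len(coeffs), seg_len)]


def segment_intensity(segment):
    """Step (c): intensity, assumed here to be the mean squared value."""
    return sum(c * c for c in segment) / len(segment)


def classify_segments(segments, ratio=2.0):
    """Step (d): split segments into a high- and a low-intensity sequence.

    `ratio` is a hypothetical tuning parameter: a segment is placed in
    the high-intensity group when its intensity exceeds `ratio` times
    the average segmental intensity. The returned flags play the role
    of the classification information.
    """
    intensities = [segment_intensity(s) for s in segments]
    threshold = ratio * sum(intensities) / len(intensities)
    flags = [1 if e > threshold else 0 for e in intensities]
    high = [s for s, f in zip(segments, flags) if f == 1]
    low = [s for s, f in zip(segments, flags) if f == 0]
    return flags, high, low


# One tone-like segment among otherwise weak coefficients.
coeffs = [0.1, -0.2, 5.0, 4.0, 0.3, 0.1, -0.1, 0.2]
flags, high, low = classify_segments(split_into_segments(coeffs, 2))
print(flags)  # [0, 1, 0, 0]
```

Only one flag per segment is needed as classification information in this sketch, so no position or group-size data has to be sent for individual tone components.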
According to another aspect of the present invention, there is provided a decoding method for decoding input digital codes into audio signal samples and outputting them, the method comprising the steps of:
(a) decoding said input digital codes into plural sequences of coefficient segments;
(b) decoding said input digital codes to obtain classification information of coefficient segments, combining said plural sequences of coefficient segments based on said classification information to reconstruct original frequency-domain coefficients formed by a single contiguous sequence of coefficient segments; and
(c) transforming said frequency-domain coefficients into the time domain and outputting the resulting audio signal samples as an audio signal.
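The combining operation of step (b) in this decoding method can be sketched as follows, assuming two sequences and assuming that the classification information has already been decoded into one flag per segment (1 for the high-intensity sequence, 0 for the other). The function name and the flag convention are illustrative assumptions.

```python
def combine_segments(flags, high, low):
    """Interleave two decoded sequences back into one coefficient list.

    `flags` holds one entry per segment in original frequency order;
    each entry selects which sequence supplies the next segment.
    """
    high_iter, low_iter = iter(high), iter(low)
    coeffs = []
    for f in flags:
        segment = next(high_iter) if f == 1 else next(low_iter)
        coeffs.extend(segment)
    return coeffs


restored = combine_segments(
    [0, 1, 0, 0],
    [[5.0, 4.0]],
    [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]],
)
print(restored)  # [0.1, -0.2, 5.0, 4.0, 0.3, 0.1, -0.1, 0.2]
```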
According to another aspect of the present invention, there is provided a decoding method which comprises the steps of:
(a) decoding said input digital codes into coefficient segments each consisting of plural frequency-domain coefficients;
(b) decoding said input digital codes to obtain classification information of said coefficient segments and classifying said coefficient segments into plural sequences of coefficient segments based on said classification information;
(c) decoding said input digital codes to obtain normalization information of said coefficient segments and inverse-normalizing plural sequences of coefficient segments based on said normalization information;
(d) rearranging said inverse-normalized plural sequences of coefficient segments into the original single sequence to reconstruct original frequency-domain coefficients; and
(e) transforming said frequency-domain coefficients into the time domain and outputting the resulting audio signal samples as an audio signal.
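Steps (b) through (d) of this decoding method can be sketched as follows. The sketch assumes that the segments have been decoded in their original order, that the classification information is one group label per segment, and that the normalization information is a single gain per group; the source does not fix the form of the normalization information, so the per-group gain is purely an illustrative assumption.

```python
def inverse_flatten_and_rearrange(segments, flags, gain_by_group):
    """Classify, inverse-normalize and rearrange decoded segments.

    segments      -- decoded (flattened) segments in original order
    flags         -- one group label per segment (classification info)
    gain_by_group -- hypothetical normalization info: one gain per group
    """
    # Step (b): classify the segments into per-group sequences,
    # remembering where each segment came from.
    sequences = {}
    for index, (segment, group) in enumerate(zip(segments, flags)):
        sequences.setdefault(group, []).append((index, segment))

    # Step (c): inverse-normalize each sequence with its own gain.
    restored = [None] * len(segments)
    for group, members in sequences.items():
        gain = gain_by_group[group]
        for index, segment in members:
            restored[index] = [c * gain for c in segment]

    # Step (d): rearrange into the original single sequence.
    return [c for segment in restored for c in segment]


coeffs = inverse_flatten_and_rearrange(
    [[0.5, -0.5], [1.0, 0.5], [1.0, 1.0]],
    [0, 1, 0],
    {0: 0.25, 1: 4.0},
)
print(coeffs)  # [0.125, -0.125, 4.0, 2.0, 0.25, 0.25]
```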
According to another aspect of the present invention, there is provided a coding apparatus which encodes input audio signal samples into output digital codes, the apparatus comprising:
a time-frequency transformation part for time-frequency transforming every fixed number of input audio signal samples into frequency-domain coefficients;
a coefficient segment generating part for dividing said frequency-domain coefficients from said time-frequency transformation part into segments each consisting of a contiguous sequence of coefficients;
a segmental intensity calculating part for calculating the intensity of each coefficient segment from said coefficient segment generating part;
a coefficient segment classifying part for dividing said coefficient segments into at least two groups according to the relative magnitude of said segmental intensity calculated in said segmental intensity calculating part, then classifying said segments generated in said coefficient segment generating part into at least two sequences based on information about said grouping, and encoding and outputting classification information as a digital code; and
a quantization part for encoding each of said coefficients classified into said at least two sequences and outputting said encoded coefficients as said digital codes.
According to another aspect of the present invention, there is provided a coding apparatus which comprises:
a time-frequency transformation part for time-frequency transforming every fixed number of input audio signal samples into frequency-domain coefficients;
a coefficient segment generating part for dividing said frequency-domain coefficients from said time-frequency transformation part into segments each consisting of a contiguous sequence of coefficients;
a segmental intensity calculating part for calculating the intensity of each coefficient segment from said coefficient segment generating part;
a coefficient segment classifying part for dividing said coefficient segments into at least two groups according to the relative magnitude of said segmental intensity calculated in said segmental intensity calculating part, then classifying said segments generated in said coefficient segment generating part into at least two sequences based on information about said grouping, and encoding and outputting classification information as a digital code;
a flattening part for normalizing the intensity of each of said coefficient segments classified into at least two sequences in said coefficient segment classifying part, coding normalization information, and outputting said coded information as a digital code;
a coefficient combining part for recombining said at least two sequences of intensity-normalized coefficient segments into the original single sequence of coefficient segments through utilization of said grouping information; and
a quantization part for quantizing said recombined coefficient segments and outputting the quantized values as said digital codes.
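The operation of the flattening part and the coefficient combining part of this apparatus can be sketched as follows. The sketch assumes that each group is normalized by a single gain equal to the root-mean-square value of all coefficients in that group; the source does not specify how the normalization is computed, so this choice, like the function name, is an illustrative assumption.

```python
import math


def flatten_by_group(segments, flags):
    """Normalize each group of segments by its own RMS gain.

    Returns the flattened segments, already recombined in their
    original order, and a dict mapping each group label to its gain
    (standing in for the normalization information to be coded).
    """
    energy = {}
    count = {}
    for segment, group in zip(segments, flags):
        energy[group] = energy.get(group, 0.0) + sum(c * c for c in segment)
        count[group] = count.get(group, 0) + len(segment)

    gains = {}
    for group in energy:
        rms = math.sqrt(energy[group] / count[group])
        gains[group] = rms if rms > 0.0 else 1.0  # avoid division by zero

    flattened = [
        [c / gains[group] for c in segment]
        for segment, group in zip(segments, flags)
    ]
    return flattened, gains


segments = [[0.1, -0.2], [5.0, 4.0], [0.3, 0.1], [-0.1, 0.2]]
flattened, gains = flatten_by_group(segments, [0, 1, 0, 0])
print(gains)
```

Because the strong segments and the weak segments are scaled by separate gains, both groups end up with a comparable magnitude before the recombined sequence is passed to the quantization part.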
According to another aspect of the present invention, there is provided a decoding apparatus which decodes input digital codes into audio signal samples, the apparatus comprising:
an inverse-quantization part for decoding said input digital codes into plural sequences of coefficient segments;
a coefficient combining part for decoding said input digital codes to obtain classification information of said coefficient segments, and combining said plural sequences of coefficient segments based on said classification information to reconstruct a single sequence of frequency-domain coefficients sequentially arranged; and
a frequency-time transformation part for frequency-time transforming the reconstructed frequency-domain coefficients into the time domain and outputting the resulting audio signal samples as an audio signal.
According to still another aspect of the present invention, there is provided a decoding apparatus which comprises:
an inverse-quantization part for decoding said input digital codes into coefficient segments;
a coefficient segment classifying part for decoding said input digital codes to obtain classification information of said coefficient segments, and classifying said coefficient segments into plural sequences based on said classification information;
an inverse-flattening part for decoding said input digital codes to obtain normalization information of said coefficient segments classified into said plural sequences, and inverse-normalizing said plural sequences of coefficient segments based on said normalization information;
a coefficient combining part for combining said inverse-normalized plural sequences of coefficient segments into a single sequence of coefficient segments sequentially arranged based on said classification information to reconstruct said frequency-domain coefficients; and
a frequency-time transformation part for frequency-time transforming said frequency-domain coefficients into the time domain and outputting the resulting audio signal samples as an audio signal.