As a method for encoding a sound signal at a low bit rate (for example, about 10 to 20 kbit/s), adaptive coding of orthogonal transform coefficients in a frequency domain, such as DFT (Discrete Fourier Transform) or MDCT (Modified Discrete Cosine Transform) coefficients, is known. For example, MPEG USAC (Unified Speech and Audio Coding), which is a standard technique, has a TCX (transform coded excitation) encoding mode, and, in this mode, MDCT coefficients are normalized for each frame, quantized, and then variable-length encoded (see, for example, Non-Patent Literature 1).
FIG. 1 shows a configuration example of a conventional TCX-based encoding apparatus. The encoding apparatus in FIG. 1 is provided with a frequency domain transforming portion 11, a linear prediction analyzing portion 12, an amplitude spectral envelope sequence generating portion 13, an envelope normalizing portion 14 and an encoding portion 15. Each portion in FIG. 1 will be described below.
<Frequency Domain Transforming Portion 11>
A time domain sound signal is inputted to the frequency domain transforming portion 11. The sound signal is, for example, a voice signal or an acoustic signal.
The frequency domain transforming portion 11 transforms the inputted time domain sound signal to an N-point MDCT coefficient sequence X(0), X(1), . . . , X(N−1) in a frequency domain for each frame of a predetermined time length. Here, N is a positive integer.
The transformed MDCT coefficient sequence X(0), X(1), . . . , X(N−1) is outputted to the envelope normalizing portion 14.
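As an illustration only, the transform of one frame can be sketched with the textbook direct-form MDCT, which maps 2N time samples to N coefficients. The sine window, the frame length of 2N samples, and the function name mdct are assumptions made for this sketch; the text above does not specify the window or the overlap used by the frequency domain transforming portion 11.

```python
import math

def mdct(frame):
    """Direct-form MDCT: 2N windowed time samples -> N coefficients X(0)..X(N-1)."""
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number (2N)")
    n = two_n // 2
    # Assumed sine window; the source does not state which window is used.
    windowed = [x * math.sin(math.pi * (i + 0.5) / two_n)
                for i, x in enumerate(frame)]
    return [
        sum(w * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, w in enumerate(windowed))
        for k in range(n)
    ]

coeffs = mdct([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
print(len(coeffs))  # N = 4 coefficients for a frame of 2N = 8 samples
```

This O(N²) form is chosen for readability; a practical encoder would normally use an FFT-based implementation.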
<Linear Prediction Analyzing Portion 12>
A time domain sound signal is inputted to the linear prediction analyzing portion 12.
The linear prediction analyzing portion 12 generates linear prediction coefficients α1, α2, . . . , αp by performing linear prediction analysis on the sound signal inputted frame by frame. Further, the linear prediction analyzing portion 12 encodes the generated linear prediction coefficients α1, α2, . . . , αp to generate linear prediction coefficient codes. An example of the linear prediction coefficient codes is LSP codes, which are codes corresponding to a sequence of quantized values of an LSP (Line Spectrum Pairs) parameter sequence corresponding to the linear prediction coefficients α1, α2, . . . , αp. Here, p is a positive integer equal to or larger than 2.
Further, the linear prediction analyzing portion 12 generates quantized linear prediction coefficients ^α1, ^α2, . . . , ^αp which are linear prediction coefficients corresponding to the generated linear prediction coefficient codes.
The generated quantized linear prediction coefficients ^α1, ^α2, . . . , ^αp are outputted to the amplitude spectral envelope sequence generating portion 13. Further, the generated linear prediction coefficient codes are outputted to a decoding apparatus.
For the linear prediction analysis, for example, a method is used in which linear prediction coefficients are obtained by determining autocorrelation of the sound signal inputted frame by frame and applying the Levinson-Durbin algorithm to the determined autocorrelation. Alternatively, a method may be used in which the MDCT coefficient sequence determined by the frequency domain transforming portion 11 is inputted to the linear prediction analyzing portion 12, and linear prediction coefficients are obtained by applying the Levinson-Durbin algorithm to the sequence obtained by performing inverse Fourier transform of the sequence of squared values of the coefficients of the MDCT coefficient sequence.
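The first method above (autocorrelation followed by the Levinson-Durbin algorithm) can be sketched as follows. This is a minimal illustration, not the implementation of Non-Patent Literature 1. The function names are hypothetical, no analysis window or lag window is applied, and the sign convention is assumed to be that of a prediction error filter 1 + α1 z^-1 + . . . + αp z^-p, which matches the denominator used in the formulas of this section.

```python
def autocorrelation(x, p):
    """Autocorrelation of one frame for lags 0..p (no windowing, for brevity)."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the normal equations for alpha_1..alpha_p from autocorrelation r[0..p].

    Returns (alpha, err), where alpha[n-1] is alpha_n and err is the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient of order i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

# Autocorrelation of an ideal first-order process: r(n) = 0.5 ** n
alpha, err = levinson_durbin([1.0, 0.5, 0.25], p=2)
print(alpha, err)
```

For this input the second coefficient comes out as zero, since a first-order predictor already explains the given autocorrelation.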
<Amplitude Spectral Envelope Sequence Generating Portion 13>
The quantized linear prediction coefficients ^α1, ^α2, . . . , ^αp generated by the linear prediction analyzing portion 12 are inputted to the amplitude spectral envelope sequence generating portion 13.
The amplitude spectral envelope sequence generating portion 13 generates a smoothed amplitude spectral envelope sequence ^Wγ(0), ^Wγ(1), . . . , ^Wγ(N−1) defined by the following formula (1) using the quantized linear prediction coefficients ^α1, ^α2, . . . , ^αp. In the formula (1), exp(●) indicates an exponential function with Napier's constant as a base on the assumption that “●” is a real number, and j indicates the imaginary unit. Further, γ is a positive constant equal to or smaller than 1 and is a coefficient which reduces amplitude unevenness of the amplitude spectral envelope sequence ^W(0), ^W(1), . . . , ^W(N−1) defined by the following formula (2), in other words, a coefficient which smooths the amplitude spectral envelope sequence.
[Formula 1]

$$\hat{W}_{\gamma}(k) = \frac{1}{2\pi} \, \frac{1}{\left| 1 + \sum_{n=1}^{p} \hat{\alpha}_{n} \gamma^{n} \exp(-j 2\pi k n / N) \right|} \qquad (1)$$

$$\hat{W}(k) = \frac{1}{2\pi} \, \frac{1}{\left| 1 + \sum_{n=1}^{p} \hat{\alpha}_{n} \exp(-j 2\pi k n / N) \right|} \qquad (2)$$
The generated smoothed amplitude spectral envelope sequence ^Wγ(0), ^Wγ(1), . . . , ^Wγ(N−1) is outputted to the envelope normalizing portion 14.
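The envelope computation can be sketched as below, evaluating the LPC polynomial on N frequency points with each coefficient ^αn weighted by γ to the power n. The function name and argument names are hypothetical. The sketch assumes that the envelope is the reciprocal of the magnitude of 1 + Σ ^αn γ^n exp(−j2πkn/N), scaled by 1/(2π), which is what makes each ^Wγ(k) a real-valued amplitude.

```python
import cmath
import math

def smoothed_envelope(alpha_q, gamma, n_points):
    """Smoothed amplitude spectral envelope for k = 0..N-1.

    alpha_q[n-1] holds the quantized coefficient alpha_n; gamma = 1.0 gives
    the unsmoothed envelope of formula (2).
    """
    env = []
    for k in range(n_points):
        denom = 1 + sum(
            a * gamma ** n * cmath.exp(-2j * math.pi * k * n / n_points)
            for n, a in enumerate(alpha_q, start=1)
        )
        # Magnitude of the complex denominator (assumed, see lead-in).
        env.append(1.0 / (2.0 * math.pi * abs(denom)))
    return env

raw = smoothed_envelope([-0.9], 1.0, 8)       # unsmoothed, gamma = 1
smooth = smoothed_envelope([-0.9], 0.5, 8)    # gamma < 1 flattens the peaks
print(max(raw) / min(raw), max(smooth) / min(smooth))
```

In this example the ratio between the largest and smallest envelope values is smaller for γ = 0.5 than for γ = 1, which is the reduction of amplitude unevenness that the text attributes to γ.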
<Envelope Normalizing Portion 14>
The MDCT coefficient sequence X(0), X(1), . . . , X(N−1) generated by the frequency domain transforming portion 11 and the smoothed amplitude spectral envelope sequence ^Wγ(0), ^Wγ(1), . . . , ^Wγ(N−1) outputted by the amplitude spectral envelope sequence generating portion 13 are inputted to the envelope normalizing portion 14.
The envelope normalizing portion 14 generates a normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1) by normalizing each coefficient X(k) of the MDCT coefficient sequence by a corresponding value ^Wγ(k) of the smoothed amplitude spectral envelope sequence. That is, XN(k)=X(k)/^Wγ(k) [k=0, 1, . . . , N−1] is satisfied.
The generated normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1) is outputted to the encoding portion 15.
Here, in order to realize quantization that reduces auditory distortion, the envelope normalizing portion 14 normalizes the MDCT coefficient sequence X(0), X(1), . . . , X(N−1) frame by frame, using the smoothed amplitude spectral envelope sequence ^Wγ(0), ^Wγ(1), . . . , ^Wγ(N−1) rather than the unsmoothed amplitude spectral envelope sequence.
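The normalization XN(k)=X(k)/^Wγ(k) amounts to an element-wise division, as in the following short sketch. The function name normalize and the sample values are illustrative assumptions.

```python
def normalize(x, env):
    """Return XN(k) = X(k) / W_gamma(k) for k = 0..N-1."""
    if len(x) != len(env):
        raise ValueError("coefficient and envelope sequences must have equal length")
    return [xk / wk for xk, wk in zip(x, env)]

print(normalize([2.0, -3.0], [0.5, 1.5]))  # [4.0, -2.0]
```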
<Encoding Portion 15>
The normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1) generated by the envelope normalizing portion 14 is inputted to the encoding portion 15.
The encoding portion 15 generates codes corresponding to the normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1).
The generated codes corresponding to the normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1) are outputted to the decoding apparatus.
The encoding portion 15 divides the coefficients of the normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1) by a gain (global gain) g, quantizes the results of the division to obtain a quantized normalized coefficient sequence XQ(0), XQ(1), . . . , XQ(N−1), which is a sequence of integer values, and encodes this sequence to obtain integer signal codes. In a technique of Non-Patent Literature 1, the encoding portion 15 decides the gain g such that the number of bits of the integer signal codes is as large as possible without exceeding the number of allocated bits B, which is the number of bits allocated in advance. Then, the encoding portion 15 generates a gain code corresponding to the decided gain g and integer signal codes corresponding to the decided gain g.
The generated gain code and integer signal codes are outputted to the decoding apparatus as codes corresponding to the normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1).
[Specific Example of Encoding Process Performed by Encoding Portion 15]
A specific example of the encoding process performed by the encoding portion 15 will be described.
FIG. 2 shows a configuration example of the encoding portion 15 for this specific example. As shown in FIG. 2, the encoding portion 15 is provided with a gain acquiring portion 151, a quantizing portion 152, a Rice parameter deciding portion 153, a Golomb-Rice encoding portion 154, a gain encoding portion 155, a judging portion 156 and a gain updating portion 157.
Each portion in FIG. 2 will be described below.
<Gain Acquiring Portion 151>
From an inputted normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1), the gain acquiring portion 151 decides and outputs a global gain g such that the number of bits of the integer signal codes is as large as possible without exceeding the number of allocated bits B, which is the number of bits allocated in advance. The global gain g obtained by the gain acquiring portion 151 becomes an initial value of the global gain used by the quantizing portion 152.
<Quantizing Portion 152>
The quantizing portion 152 obtains and outputs a quantized normalized coefficient sequence XQ(0), XQ(1), . . . , XQ(N−1) as a sequence of an integer part of a result of dividing each coefficient of the inputted normalized MDCT coefficient sequence XN(0), XN(1), . . . , XN(N−1) by the global gain g obtained by the gain acquiring portion 151 or the gain updating portion 157.
Here, a global gain g used when the quantizing portion 152 is executed for the first time is the global gain g obtained by the gain acquiring portion 151, that is, the initial value of the global gain. Further, a global gain g used when the quantizing portion 152 is executed at and after the second time is the global gain g obtained by the gain updating portion 157, that is, an updated value of the global gain.
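The quantization step can be sketched as follows. The text only says that the integer part of each quotient is taken; truncation toward zero (Python's int()) is assumed here, and the function name quantize is hypothetical.

```python
def quantize(xn, g):
    """Return XQ(k) = integer part of XN(k) / g, truncating toward zero."""
    if g <= 0:
        raise ValueError("global gain must be positive")
    return [int(x / g) for x in xn]

print(quantize([5.9, -5.9, 0.4], 2.0))  # [2, -2, 0]
```

A larger global gain g yields smaller integers and therefore fewer bits after entropy coding, which is the relationship the gain updating portion 157 relies on.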
<Rice Parameter Deciding Portion 153>
The Rice parameter deciding portion 153 obtains and outputs Rice parameters r by the following formula (3) from the quantized normalized coefficient sequence XQ(0), XQ(1), . . . , XQ(N−1) obtained by the quantizing portion 152.
[Formula 2]

$$r = \max\left( \left[ \log_{2}\left( (\ln 2) \, \frac{1}{N} \sum_{k=0}^{N-1} \left| X_{Q}(k) \right| \right) \right], \, 0 \right) \qquad (3)$$
It is assumed that “●” indicates an arbitrary number, and [●] indicates a rounding operation for “●”.
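Formula (3) can be evaluated as in the sketch below. Two points are assumptions rather than statements of the source: the rounding operation [●] is implemented as rounding half up, and an all-zero sequence, for which the logarithm is undefined, is mapped to r = 0. The function name is hypothetical, and the mean is taken over absolute values |XQ(k)|.

```python
import math

def rice_parameter(xq):
    """Rice parameter r = max(round(log2(ln2 * mean|XQ(k)|)), 0)."""
    mean_abs = sum(abs(v) for v in xq) / len(xq)
    if mean_abs == 0:
        return 0  # assumed handling: log2(0) is undefined
    value = math.log2(math.log(2) * mean_abs)
    return max(math.floor(value + 0.5), 0)  # assumed: round half up

print(rice_parameter([4, -4, 4, -4]))  # mean |XQ| = 4, log2(2.77...) = 1.47..., so r = 1
```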
<Golomb-Rice Encoding Portion 154>
The Golomb-Rice encoding portion 154 performs Golomb-Rice encoding of the quantized normalized coefficient sequence XQ(0), XQ(1), . . . , XQ(N−1) obtained by the quantizing portion 152, using the Rice parameters r obtained by the Rice parameter deciding portion 153, to obtain integer signal codes, and outputs the integer signal codes and the number of consumed bits C, which is the number of bits of the integer signal codes.
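A minimal sketch of Golomb-Rice encoding of the quantized sequence is given below. The source does not state how signed values are handled or which bit polarity the unary prefix uses; the sketch assumes the common mapping of signed integers to non-negative integers (0, −1, 1, −2, 2, . . . to 0, 1, 2, 3, 4, . . .) and a prefix of ones terminated by a zero. The function name and the bit-string representation are illustrative only.

```python
def golomb_rice_encode(values, r):
    """Encode integers with Rice parameter r; return (bit string, consumed bits C)."""
    if r < 0:
        raise ValueError("Rice parameter must be non-negative")
    parts = []
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1    # assumed signed-to-unsigned mapping
        quotient, remainder = u >> r, u & ((1 << r) - 1)
        parts.append("1" * quotient + "0")      # unary-coded quotient
        if r > 0:
            parts.append(format(remainder, "0{}b".format(r)))  # r-bit remainder
    code = "".join(parts)
    return code, len(code)

code, consumed = golomb_rice_encode([0, 1, -1], r=1)
print(code, consumed)  # 0010001 7
```

Each value costs (u >> r) + 1 + r bits, so the number of consumed bits C can also be computed without building the bit string.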
<Judging Portion 156>
When the number of times the gain has been updated has reached a predetermined number of times, the judging portion 156 outputs the integer signal codes and also outputs, to the gain encoding portion 155, an instruction signal to encode the global gain g obtained by the gain updating portion 157. When the number of times the gain has been updated is smaller than the predetermined number of times, the judging portion 156 outputs the number of consumed bits C measured by the Golomb-Rice encoding portion 154 to the gain updating portion 157.
<Gain Updating Portion 157>
When the number of consumed bits C measured by the Golomb-Rice encoding portion 154 is larger than the number of allocated bits B, the gain updating portion 157 updates the value of the global gain g to a larger value and outputs the updated value of the global gain g. When the number of consumed bits C is smaller than the number of allocated bits B, the gain updating portion 157 updates the value of the global gain g to a smaller value and outputs the updated value of the global gain g.
<Gain Encoding Portion 155>
The gain encoding portion 155 encodes the global gain g obtained by the gain updating portion 157 in accordance with the instruction signal outputted by the judging portion 156 to obtain and output a gain code.
The integer signal codes outputted by the judging portion 156 and the gain code outputted by the gain encoding portion 155 are outputted to the decoding apparatus as codes corresponding to the normalized MDCT coefficient sequence.
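The interaction of the quantizing portion 152, the Rice parameter deciding portion 153, the Golomb-Rice encoding portion 154, the judging portion 156 and the gain updating portion 157 can be sketched as the loop below. Several details are assumptions for illustration and are not taken from the source: the multiplicative update with a shrinking step, the unchanged gain when C equals B, the bit count based on a signed-to-unsigned mapping, truncation toward zero in quantization, and all function and parameter names.

```python
import math

def quantize(xn, g):
    return [int(x / g) for x in xn]            # integer part, toward zero (assumed)

def rice_parameter(xq):
    mean_abs = sum(abs(v) for v in xq) / len(xq)
    if mean_abs == 0:
        return 0                               # assumed handling of log2(0)
    return max(math.floor(math.log2(math.log(2) * mean_abs) + 0.5), 0)

def consumed_bits(xq, r):
    """Bits used by Golomb-Rice coding: unary quotient + stop bit + r-bit remainder."""
    total = 0
    for v in xq:
        u = 2 * v if v >= 0 else -2 * v - 1    # assumed sign mapping
        total += (u >> r) + 1 + r
    return total

def rate_loop(xn, g_init, allocated_bits, max_updates=8, step=2.0):
    """Update the global gain a fixed number of times, then stop (judging portion)."""
    g = g_init
    for update in range(max_updates + 1):
        xq = quantize(xn, g)
        r = rice_parameter(xq)
        c = consumed_bits(xq, r)
        if update == max_updates:              # predetermined number reached
            break
        if c > allocated_bits:
            g *= step                          # too many bits: coarser quantization
        elif c < allocated_bits:
            g /= step                          # bits to spare: finer quantization
        step = math.sqrt(step)                 # hypothetical shrinking step size
    return g, xq, r, c

g, xq, r, c = rate_loop([10.0, -3.0, 0.5, 7.0], g_init=1.0, allocated_bits=20)
print(g, xq, r, c)
```

As described, the loop stops after a predetermined number of updates regardless of the final value of C, so the sketch does not itself guarantee that C is equal to or smaller than B.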
As described above, in the conventional TCX-based encoding, an MDCT coefficient sequence is normalized with the use of a smoothed amplitude spectral envelope sequence obtained by smoothing an amplitude spectral envelope, and, after that, the normalized MDCT coefficient sequence is encoded. This encoding method is adopted in the MPEG-4 USAC described above.