1. Field of the Invention
The present invention relates to an excitation signal coding apparatus and an excitation signal coding method.
2. Description of the Related Art
Conventionally, a CELP (code excited linear prediction) speech coding scheme generally uses, as an excitation coding means, a configuration provided with two types of excitation codebooks: a codebook whose contents are adaptively updated based on past output, and a codebook with predetermined fixed contents. Excitation signals are coded as the sum of vectors output from these two codebooks.
More specifically, it is general practice that an adaptive codebook, which is a buffer of excitation vectors generated in past coding, is used as the former, and an algebraic codebook, noise codebook, random codebook or the like is used as the latter. Hereinafter, the former will be referred to as an “adaptive codebook” and the latter as a “fixed codebook.”
FIG. 1 illustrates a general CELP coding model. A first excitation vector yi is output from an adaptive codebook 1, multiplied by g1 by a multiplier 2 and input to an adder 3. A second excitation vector zj is output from a fixed codebook 4, multiplied by g2 by a multiplier 5 and input to the adder 3. The adder 3 adds up the first excitation vector yi multiplied by g1 and the second excitation vector zj multiplied by g2, and outputs the sum simultaneously to a synthesis filter 6 and the adaptive codebook 1.
The excitation vector (g1yi+g2zj) output to the adaptive codebook 1 is used to update the adaptive codebook 1. The synthesis filter 6 uses separately input quantized linear predictive coefficients ap and the excitation vector input from the adder 3 to generate a synthesized speech signal s according to the following Expression (1). In Expression (1), L denotes the vector length (subframe length).
$$s(n) = \sum_{p=1}^{P} a_p\, s(n-p) + \left( g_1 y_i(n) + g_2 z_j(n) \right), \qquad n = 0, 1, 2, \cdots, L-1 \tag{1}$$
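As an illustration, the recursion of Expression (1) can be sketched in Python/NumPy as follows (a minimal sketch with illustrative names, assuming `a` holds the quantized coefficients a1..aP and `s_prev` holds the P previously synthesized samples):

```python
import numpy as np

def synthesize(a, g1, y, g2, z, s_prev):
    """Synthesis filter of Expression (1):
    s(n) = sum_{p=1..P} a_p * s(n-p) + (g1*y_i(n) + g2*z_j(n))."""
    P, L = len(a), len(y)
    # Prepend the filter history so s(n-p) is available for n < p.
    s = np.concatenate([np.asarray(s_prev, dtype=float), np.zeros(L)])
    exc = g1 * np.asarray(y, dtype=float) + g2 * np.asarray(z, dtype=float)
    for n in range(L):
        acc = exc[n]
        for p in range(1, P + 1):
            acc += a[p - 1] * s[P + n - p]   # a_p * s(n - p)
        s[P + n] = acc
    return s[P:]                              # synthesized subframe s(0)..s(L-1)
```

With g2 = 0 and a single coefficient a1 = 0.5, a unit impulse excitation decays geometrically, as expected of the recursion.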
The synthesized speech signal s output from the synthesis filter 6 is input to an adder 7. The adder 7 calculates the error between the synthesized speech signal s and the input speech signal and outputs the error to a weighting filter 8. The weighting filter 8 carries out perceptual weighting on the error signal input from the adder 7 and outputs the weighted signal. Here, when the pitch period is shorter than the vector length, pitch synchronization processing is generally carried out on the second excitation vector output from the fixed codebook 4; such processing is expressed, for example, by zj(n)=zj(n)+β×zj(n−T), where β denotes a periodic gain factor and T denotes the pitch period. This processing is omitted here.
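The pitch synchronization processing mentioned above can be sketched as follows (an illustrative Python/NumPy sketch; it assumes the already-updated samples zj(n−T) are reused recursively, as is common in CELP pitch sharpening):

```python
import numpy as np

def pitch_sharpen(z, T, beta):
    """Pitch synchronization: z(n) = z(n) + beta * z(n - T) for n >= T.
    Updated samples are reused, so the repetition propagates recursively."""
    z = np.asarray(z, dtype=float).copy()
    for n in range(T, len(z)):
        z[n] += beta * z[n - T]
    return z
```

For a pitch period T shorter than the vector length, this repeats the fixed codebook pulses at pitch intervals, scaled by the periodic gain factor β.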
Next, an excitation search determines the first excitation vector yi, second excitation vector zj, first excitation vector gain g1 and second excitation vector gain g2 so as to minimize the perceptually weighted error signal output from the weighting filter 8. More specifically, the excitation search is carried out according to the processing flow shown in FIG. 2.
First, in step S101, an adaptive codebook search (selection of a first excitation vector) is performed. This adaptive codebook search is performed so as to minimize the perceptually weighted error signal without using the fixed codebook. A specific expression is Expression (2) shown in step S101 in FIG. 2, and the first excitation vector yi and first excitation vector gain g1 which minimize this value are determined. In Expression (2), x denotes a target vector, g1 denotes the first excitation vector gain, H denotes a filter impulse response convolution matrix and yi denotes the first excitation vector.
More specifically, the first excitation vector yi is determined by maximizing Expression (3) shown below, and the first excitation vector gain g1 at this time is expressed by Expression (4) shown below. In Expressions (3) and (4), Yi denotes a perceptually weighted synthesized speech signal obtained by convolving the impulse response h of the cascade of the synthesis filter 6 and the weighting filter 8 with the first excitation vector yi. Furthermore, x is a target vector (the vector which becomes the target for a signal synthesized from the excitation vectors; when the vector synthesized from the excitation vectors matches this vector, the input speech signal matches the synthesized speech signal) obtained by subtracting the zero-input response of the synthesis filter 6 passed through the weighting filter 8 from the output signal (perceptually weighted speech signal) obtained when the input speech signal is input to the weighting filter 8. The first excitation vector gain g1 may be quantized or coded at this point, or may be quantized or coded through simultaneous optimization of the first excitation vector gain g1 and the second excitation vector gain g2 after the fixed codebook search is completed in the next step S102. The method of quantization/coding is not particularly limited here.
$$\frac{\left( \displaystyle\sum_{n=0}^{L-1} x(n)\, Y_i(n) \right)^2}{\displaystyle\sum_{n=0}^{L-1} Y_i(n)\, Y_i(n)} \tag{3}$$

$$g_1 = \frac{\displaystyle\sum_{n=0}^{L-1} x(n)\, Y_i(n)}{\displaystyle\sum_{n=0}^{L-1} Y_i(n)\, Y_i(n)} \tag{4}$$
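The adaptive codebook search of Expressions (3) and (4) can be sketched as follows (an illustrative Python/NumPy sketch; `Y_candidates` is assumed to already hold the perceptually weighted synthesized vectors Yi = H·yi for each candidate lag):

```python
import numpy as np

def adaptive_codebook_search(x, Y_candidates):
    """Select the index i maximizing (x . Y_i)^2 / (Y_i . Y_i)   -- Expression (3),
    then compute the optimal gain g1 = (x . Y_i) / (Y_i . Y_i)   -- Expression (4)."""
    best_i, best_crit = -1, -np.inf
    for i, Y in enumerate(Y_candidates):
        corr = np.dot(x, Y)        # cross-correlation with the target vector
        energy = np.dot(Y, Y)      # energy of the weighted synthesized vector
        crit = corr * corr / energy
        if crit > best_crit:
            best_i, best_crit = i, crit
    Y = Y_candidates[best_i]
    g1 = np.dot(x, Y) / np.dot(Y, Y)
    return best_i, g1
```

Maximizing Expression (3) over i with g1 given by Expression (4) is equivalent to minimizing the weighted error of Expression (2), which is why only correlations and energies need to be evaluated per candidate.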
Next, in step S102, a fixed codebook search (selection of a second excitation vector zj) is performed. Here, in combination with the already determined first excitation vector yi, a second excitation vector zj and second excitation vector gain g2 are determined so as to minimize the error relative to the target vector x. A specific expression is Expression (5) shown in step S102 in FIG. 2, and zj and g2 are determined so as to minimize this value. In Expression (5), g2 denotes the second excitation vector gain and zj denotes the second excitation vector.
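The fixed codebook search of step S102 can be sketched in the same way (an illustrative Python/NumPy sketch; `Y` is assumed to be the filtered first excitation vector H·yi, `Z_candidates` the filtered fixed codebook vectors H·zj, and the search is reformulated on the updated target x − g1·H·yi):

```python
import numpy as np

def fixed_codebook_search(x, g1, Y, Z_candidates):
    """With g1*Y already fixed, choose z_j and g2 minimizing |x - g1*Y - g2*Z_j|^2.
    This is equivalent to maximizing (x2 . Z_j)^2 / (Z_j . Z_j) on the residual
    target x2 = x - g1*Y, with optimal gain g2 = (x2 . Z_j) / (Z_j . Z_j)."""
    x2 = x - g1 * Y                  # residual target after the adaptive contribution
    best_j, best_crit = -1, -np.inf
    for j, Z in enumerate(Z_candidates):
        crit = np.dot(x2, Z) ** 2 / np.dot(Z, Z)
        if crit > best_crit:
            best_j, best_crit = j, crit
    Z = Z_candidates[best_j]
    g2 = np.dot(x2, Z) / np.dot(Z, Z)
    return best_j, g2
```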
Since minimizing Expression (2) determines the first excitation vector yi and the first excitation vector gain g1 first, the cross-correlation between the target vector x and the second excitation vector zj is normally small. This tendency is particularly noticeable in the case of a periodic signal.
As shown in FIG. 1, the adaptive codebook for generating the first excitation vector yi is a buffer of excitation vectors generated in the past. Therefore, when the contents of this buffer differ from the original contents due to transmission errors, frame loss or the like, the adaptive codebook cannot generate the correct first excitation vector yi even if correct excitation coding information is received. On the other hand, the second excitation vector zj is generated correctly if a correct code is received; however, when a periodic signal is coded as described above, the coding (codebook search) is performed such that the second excitation vector zj has no strong correlation with the target vector x. It is therefore not possible to generate a signal close to the target vector x, which causes the influence of an error to propagate for a long time.
To solve this problem, adaptive codebooks with different configurations that are less susceptible to errors have conventionally been proposed.
For example, the Unexamined Japanese Patent Publication No. HEI 5-73097 adopts a configuration which generates a first excitation vector by adding up vectors extracted from a plurality of past points in time. Even if part of the buffer does not constitute a correct signal due to transmission path errors, using vectors extracted from a plurality of past points in time can reduce the influence of errors, because vectors extracted from other points in time remain free of the influence of errors.
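The idea of forming the first excitation vector from several past points in time can be sketched as follows (a hypothetical Python/NumPy illustration of the concept, not the actual configuration of the cited publication; the lags are assumed to be at least the subframe length L):

```python
import numpy as np

def multi_tap_excitation(buf, lags, weights, L):
    """Form the first excitation vector as a weighted sum of L-sample
    segments taken at several past lags from the excitation buffer."""
    v = np.zeros(L)
    for lag, w in zip(lags, weights):
        start = len(buf) - lag                 # lag samples back from the present
        v += w * buf[start:start + L]
    return v
```

Because the segments come from different points in time, a corrupted region of the buffer affects only some of the summed terms, which is the error-mitigation effect described above.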
Furthermore, the Japanese Patent Publication No. 2700974 and K. Mano et al., “Design of a pitch synchronous innovation CELP coder for mobile communications” (IEEE Journal on Selected Areas in Communications, vol. 13, issue 1, January 1995, pp. 31-41) adopt a configuration of switching between an adaptive codebook and a fixed codebook (the latter is used here instead of an adaptive codebook, and differs in meaning from the fixed codebook used in the present application). This switching achieves the effect of resetting the adaptive codebook, which can suppress error propagation compared to a case where no switching is performed.
Furthermore, for example, C. Montminy and T. Abulasr, “Improving the performance of ITU-T G.729A for VoIP” (Proc. IEEE ICME 2000, pp. 433-436 (2000)) studies the feasibility of suppressing error propagation by periodically resetting the contents of the adaptive codebook without changing the coding algorithm.
However, changing the configuration of the adaptive codebook itself as described above may increase the amount of memory, the amount of calculation or the scale of the program. Furthermore, when a specific algorithm defined by standards or the like must be used, it is not possible to change the configuration of the adaptive codebook itself to improve the error characteristic as described above. The technique described in Non-Patent Document 2 has no such problems, but quality deteriorates considerably after a reset, and no effect is achieved in frames that are not reset.