The present invention relates to an excitation vector generator capable of obtaining a high-quality synthesized speech, and a speech coder and a speech decoder which can code and decode a high-quality speech signal at a low bit rate.
A CELP (Code Excited Linear Prediction) type speech coder executes linear prediction for each of the frames obtained by segmenting speech at fixed time intervals, and codes the predictive residuals (excitation signals) resulting from the frame-by-frame linear prediction, using an adaptive codebook having old excitation vectors stored therein and a random codebook which has a plurality of random code vectors stored therein. For instance, “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rate,” M. R. Schroeder, Proc. ICASSP '85, pp. 937-940 discloses a CELP type speech coder.
FIG. 1 illustrates the schematic structure of a CELP type speech coder. The CELP type speech coder separates vocal information into excitation information and vocal tract information and codes them. With regard to the vocal tract information, an input speech signal 10 is input to a filter coefficients analysis section 11 for linear prediction, and the resulting linear predictive coefficients (LPCs) are coded by a filter coefficients quantization section 12. Supplying the linear predictive coefficients to a synthesis filter 13 allows vocal tract information to be added to excitation information in the synthesis filter 13. With regard to the excitation information, an excitation vector search in an adaptive codebook 14 and a random codebook 15 is carried out for each segment obtained by further segmenting a frame (called a subframe). The search in the adaptive codebook 14 and the search in the random codebook 15 are processes of determining the code number and gain (pitch gain) of an adaptive code vector and the code number and gain (random code gain) of a random code vector which minimize the coding distortion given by Equation 1.
$$\lVert v - (g_a H p + g_c H c) \rVert^2 \tag{1}$$

where
$v$: speech signal (vector)
$H$: impulse response convolution matrix of the synthesis filter,

$$H = \begin{bmatrix}
h(0) & 0 & \cdots & \cdots & 0 & 0 \\
h(1) & h(0) & 0 & \cdots & 0 & 0 \\
h(2) & h(1) & h(0) & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & h(0) & 0 \\
h(L-1) & \cdots & \cdots & \cdots & h(1) & h(0)
\end{bmatrix}$$

$h$: impulse response (vector) of the synthesis filter
$L$: frame length
$p$: adaptive code vector
$c$: random code vector
$g_a$: adaptive code gain (pitch gain)
$g_c$: random code gain
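As an illustration of Equation 1, the following is a minimal sketch in plain Python that builds the lower-triangular convolution matrix H from an impulse response and evaluates the coding distortion for given vectors and gains. The function names (`convolution_matrix`, `matvec`, `distortion_eq1`) are hypothetical and chosen only for this example; they do not appear in the source.

```python
def convolution_matrix(h):
    """Build the L x L lower-triangular matrix with H[i][j] = h[i - j] for i >= j."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def distortion_eq1(v, h, p, c, ga, gc):
    """Evaluate || v - (ga*H*p + gc*H*c) ||^2 as in Equation 1."""
    H = convolution_matrix(h)
    hp = matvec(H, p)  # synthesized adaptive code vector
    hc = matvec(H, c)  # synthesized random code vector
    return sum((vi - (ga * a + gc * b)) ** 2 for vi, a, b in zip(v, hp, hc))
```

With `h = [1.0, 0.5, 0.25]`, `p = [1.0, 0.0, 0.0]`, `c = [0.0, 1.0, 0.0]`, `ga = 2.0` and `gc = 1.0`, the synthesized sum is `[2.0, 2.0, 1.0]`, so a speech vector equal to that sum gives zero distortion.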
However, because a closed-loop search for the codes that jointly minimize Equation 1 requires a vast amount of computation, an ordinary CELP type speech coder first performs the adaptive codebook search to specify the code number of an adaptive code vector, and then executes the random codebook search based on that result to specify the code number of a random code vector.
The random codebook search performed by the CELP type speech coder will now be explained with reference to FIGS. 2A through 2C. In the figures, the symbol x denotes the target vector for the random codebook search, obtained by Equation 2. It is assumed that the adaptive codebook search has already been completed.
$$x = v - g_a H p \tag{2}$$

where $x$: target (vector) for the random codebook search
$v$: speech signal (vector)
$H$: impulse response convolution matrix of the synthesis filter
$p$: adaptive code vector
$g_a$: adaptive code gain (pitch gain)
The random codebook search is a process of specifying the random code vector c which minimizes the coding distortion defined by Equation 3, carried out in a distortion calculator 16 as shown in FIG. 2A.
$$\lVert x - g_c H c \rVert^2 \tag{3}$$

where $x$: target (vector) for the random codebook search
$H$: impulse response convolution matrix of the synthesis filter
$c$: random code vector
$g_c$: random code gain.
The distortion calculator 16 controls a control switch 21 to switch a random code vector to be read from the random codebook 15 until the random code vector c is specified.
An actual CELP type speech coder has the structure shown in FIG. 2B to reduce the computational complexity, and a distortion calculator 16′ carries out a process of specifying the code number which maximizes the distortion measure of Equation 4.

$$\frac{(x^t H c)^2}{\lVert H c \rVert^2} = \frac{((x^t H) c)^2}{\lVert H c \rVert^2} = \frac{(x'^t c)^2}{\lVert H c \rVert^2} = \frac{(x'^t c)^2}{c^t H^t H c} \tag{4}$$

where $x$: target (vector) for the random codebook search
$H$: impulse response convolution matrix of the synthesis filter
$H^t$: transposed matrix of $H$
$x'$: time-reverse synthesis of $x$ using $H$ ($x'^t = x^t H$)
$c$: random code vector.
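The link between Equations 3 and 4 can be made explicit with a standard least-squares step. The derivation below is a sketch supplied for clarity and is not quoted from the cited references; it assumes that, for each candidate c, the gain g_c is set to the value that minimizes Equation 3.

```latex
% Setting the derivative of Equation 3 with respect to g_c to zero:
\frac{\partial}{\partial g_c}\,\lVert x - g_c H c \rVert^2 = 0
\;\Longrightarrow\;
g_c = \frac{x^t H c}{\lVert H c \rVert^2}

% Substituting this gain back into Equation 3:
\min_{g_c}\,\lVert x - g_c H c \rVert^2
= \lVert x \rVert^2 - \frac{(x^t H c)^2}{\lVert H c \rVert^2}
```

Because the term ∥x∥² does not depend on c, minimizing Equation 3 over the codebook is equivalent to maximizing the ratio in Equation 4, and the gain no longer has to be evaluated inside the search loop.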
Specifically, the random codebook control switch 21 is connected to one terminal of the random codebook 15 and the random code vector c is read from an address corresponding to that terminal. The read random code vector c is synthesized with vocal tract information by the synthesis filter 13, producing a synthesized vector Hc. Then, the distortion calculator 16′ computes the distortion measure of Equation 4 using a vector x′ obtained by a time reverse process of the target x, the vector Hc resulting from synthesis of the random code vector in the synthesis filter, and the random code vector c. As the random codebook control switch 21 is switched, computation of the distortion measure is performed for every random code vector in the random codebook.
Finally, the number of the terminal to which the random codebook control switch 21 was connected when the distortion measure of Equation 4 reached its maximum is sent to a code output section 17 as the code number of the random code vector.
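The search procedure described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the codebook is a list of equal-length vectors and that x′ is obtained as the product of the transposed matrix of H with x; the function names are hypothetical and the code is not taken from any reference implementation.

```python
def convolution_matrix(h):
    """Lower-triangular convolution matrix with H[i][j] = h[i - j] for i >= j."""
    n = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def search_random_codebook(x, h, codebook):
    """Return (code_number, measure) maximizing (x'^t c)^2 / ||Hc||^2 (Equation 4)."""
    H = convolution_matrix(h)
    n = len(h)
    # Time-reverse synthesis x' = H^t x, computed once per target.
    x_rev = [sum(H[i][j] * x[i] for i in range(n)) for j in range(n)]
    best_index, best_measure = -1, -1.0
    for index, c in enumerate(codebook):  # plays the role of control switch 21
        hc = [dot(row, c) for row in H]   # synthesized vector Hc
        energy = dot(hc, hc)
        if energy == 0.0:
            continue  # measure undefined for an all-zero synthesized vector
        measure = dot(x_rev, c) ** 2 / energy
        if measure > best_measure:
            best_index, best_measure = index, measure
    return best_index, best_measure
```

The returned index corresponds to the code number that would be passed to the code output section.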
FIG. 2C shows a partial structure of a speech decoder. The switching of the random codebook control switch 21 is controlled in such a way as to read out the random code vector that has a transmitted code number. After a transmitted random code gain gc and filter coefficient are set in an amplifier 23 and a synthesis filter 24, a random code vector is read out to restore a synthesized speech.
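The decoder path of FIG. 2C can be illustrated by a short sketch. It assumes an all-pole synthesis filter of the form y[n] = e[n] + Σ a[k]·y[n−1−k] with zero initial filter memory; both the sign convention of the coefficients and the function name `decode_random_contribution` are assumptions made for this example and are not stated in the source.

```python
def decode_random_contribution(codebook, code_number, gc, lpc):
    """Read the code vector, scale it by gc, and pass it through the synthesis filter.

    Assumed filter convention: y[n] = e[n] + sum_k lpc[k] * y[n - 1 - k],
    with the filter memory starting at zero.
    """
    # Amplifier: apply the transmitted random code gain.
    excitation = [gc * s for s in codebook[code_number]]
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]  # feedback from past output samples
        out.append(acc)
    return out
```

For example, with a single coefficient `lpc = [0.5]`, a gain of 2.0 and the code vector `[0.0, 1.0, 0.0, 0.0]`, the output is `[0.0, 2.0, 1.0, 0.5]`.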
In the above-described speech coder/speech decoder, the greater the number of random code vectors stored as excitation information in the random codebook 15, the more likely it is that the search finds a random code vector close to the excitation vector of actual speech. As the capacity of the random codebook (ROM) is limited, however, it is not possible to store random code vectors corresponding to every possible excitation vector in the random codebook. This restricts improvement of the speech quality.
An algebraic excitation has also been proposed, which can significantly reduce the computational complexity of the coding distortion calculation in a distortion calculator and can eliminate the random codebook (ROM) (described in “8 KBIT/S ACELP CODING OF SPEECH WITH 10 MS SPEECH-FRAME: A CANDIDATE FOR CCITT STANDARDIZATION,” R. Salami, C. Laflamme, J-P. Adoul, ICASSP '94, pp. II-97 to II-100, 1994).
The algebraic excitation considerably reduces the complexity of computing the coding distortion by computing in advance the convolution of the impulse response of the synthesis filter with the time-reversed target and the autocorrelation of the synthesis filter, and storing the results in a memory. Further, the ROM in which random code vectors have been stored is eliminated by generating random code vectors algebraically. CS-ACELP and ACELP, which use the algebraic excitation, have been recommended by the ITU-T as G.729 and G.723.1, respectively.
In a CELP type speech coder/speech decoder equipped with the above-described algebraic excitation in the random codebook section, however, the target for the random codebook search is always coded with a pulse sequence vector, which limits the improvement of speech quality.
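The precomputation described above can be sketched as follows. The example assumes that the stored quantities are d = Hᵗx and Φ = HᵗH and that a code vector consists of a few signed unit pulses; the function names and the pulse layout are illustrative assumptions and do not reproduce the G.729 or G.723.1 codebook structures.

```python
def precompute_correlations(x, h):
    """Return d = H^t x and phi = H^t H for the convolution matrix H of h."""
    n = len(h)
    H = [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]
    d = [sum(H[i][j] * x[i] for i in range(n)) for j in range(n)]
    phi = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(n)]
           for a in range(n)]
    return d, phi


def pulse_measure(d, phi, positions, signs):
    """Equation 4 for a vector of signed unit pulses, using table lookups only.

    Assumes the pulses do not cancel each other (non-zero denominator).
    """
    num = sum(s * d[m] for m, s in zip(positions, signs)) ** 2
    den = sum(si * sj * phi[mi][mj]
              for mi, si in zip(positions, signs)
              for mj, sj in zip(positions, signs))
    return num / den
```

Once `d` and `phi` are stored, each candidate pulse combination is scored without running the synthesis filter, which is where the saving in computation comes from.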
It is therefore a primary object of the present invention to provide an excitation vector generator, a speech coder and a speech decoder which can significantly reduce the required memory capacity as compared with a case where random code vectors are stored directly in a random codebook, and which can improve the speech quality.
It is a secondary object of this invention to provide an excitation vector generator, a speech coder and a speech decoder which can generate more complex random code vectors than in a case where an algebraic excitation is provided in a random codebook section and a target for a random codebook search is coded with a pulse sequence vector, and which can improve the speech quality.
In this invention, the fixed code vector reading section and fixed codebook of a conventional CELP type speech coder/decoder are respectively replaced with an oscillator, which outputs different vector sequences in accordance with the values of input seeds, and a seed storage section which stores a plurality of seeds (seeds of the oscillator). This eliminates the need for fixed code vectors to be stored directly in a fixed codebook (ROM) and can thus reduce the memory capacity significantly.
Further, according to this invention, the random code vector reading section and random codebook of the conventional CELP type speech coder/decoder are respectively replaced with an oscillator and a seed storage section. This eliminates the need for random code vectors to be stored directly in a random codebook (ROM) and can thus reduce the memory capacity significantly.
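The idea of replacing stored code vectors with an oscillator and a seed storage section can be illustrated as follows. The passage above does not specify the oscillator, so this sketch substitutes a linear congruential generator purely as a stand-in; the constants, the seed values and the names `oscillator`, `SEED_STORAGE` and `random_code_vector` are all assumptions made for illustration.

```python
def oscillator(seed, length):
    """Hypothetical oscillator: a linear congruential generator mapped to [-1, 1).

    The same seed always yields the same vector; different seeds yield
    different vectors.
    """
    state = seed & 0xFFFFFFFF
    out = []
    for _ in range(length):
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        out.append(state / 2147483648.0 - 1.0)  # scale [0, 2^32) to [-1, 1)
    return out


# Seed storage section: one stored integer per code number (illustrative values).
SEED_STORAGE = [1, 2, 3, 4]


def random_code_vector(code_number, length):
    """Regenerate the code vector for a code number instead of reading it from ROM."""
    return oscillator(SEED_STORAGE[code_number], length)
```

In this sketch only four integers are stored, whereas a conventional codebook holding the same four vectors would store four times `length` samples.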
The invention is an excitation vector generator which is so designed as to store a plurality of fixed waveforms, arrange the individual fixed waveforms at respective start positions based on start position candidate information and add those fixed waveforms to generate an excitation vector. This can permit an excitation vector close to an actual speech to be generated.
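A minimal sketch of this arrangement-and-addition step is given below. It assumes that each start position has already been selected from the candidate positions of the corresponding fixed waveform, and that samples falling beyond the end of the vector are discarded; both points, and the function name, are assumptions made for this example.

```python
def arrange_fixed_waveforms(waveforms, start_positions, length):
    """Place each fixed waveform at its start position and add them together."""
    excitation = [0.0] * length
    for w, start in zip(waveforms, start_positions):
        for k, sample in enumerate(w):
            n = start + k
            if 0 <= n < length:  # samples past the end are dropped (assumed)
                excitation[n] += sample
    return excitation
```

For instance, placing `[1.0, 0.5]` at position 0 and `[-1.0, 0.25, 0.125]` at position 1 in a vector of length 4 yields `[1.0, -0.5, 0.25, 0.125]`, where the two waveforms overlap at index 1.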
Further, the invention is a CELP type speech coder/decoder constructed by using the above excitation vector generator as a random codebook. A fixed waveform arranging section may algebraically generate start position candidate information of fixed waveforms.
Furthermore, the invention is a CELP type speech coder/decoder which stores a plurality of fixed waveforms, generates an impulse with respect to start position candidate information of each fixed waveform, convolves the impulse response of a synthesis filter with each fixed waveform to generate an impulse response for each fixed waveform, computes the autocorrelations and correlations of the impulse responses of the individual fixed waveforms, and develops them in a correlation matrix. This can provide a speech coder/decoder which improves the quality of synthesized speech at about the same computation cost as needed in a case of using an algebraic excitation as a random codebook.
Moreover, this invention is a CELP type speech coder/decoder equipped with a plurality of random codebooks and switch means for selecting one of the random codebooks. At least one random codebook may be the aforementioned excitation vector generator; at least one random codebook may be a vector storage section having a plurality of random number sequences stored therein or a pulse sequences storage section having a plurality of random number sequences stored therein; or at least two random codebooks each having the aforementioned excitation vector generator may be provided, with the number of stored fixed waveforms differing from one random codebook to another. The switch means selects one of the random codebooks so as to minimize coding distortion at the time of searching a random codebook, or adaptively selects one random codebook according to the result of analysis of speech segments.
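The correlation-based evaluation described above can be sketched as follows. The example assumes that the impulse response for each fixed waveform is the convolution of the synthesis filter impulse response with that waveform, truncated to the vector length, and that placing a waveform at a start position corresponds to delaying this response. The table layout and all function names are hypothetical.

```python
def waveform_impulse_response(h, w):
    """Truncated convolution (h * w)[m] for m < len(h)."""
    return [sum(h[m - k] * w[k] for k in range(min(m + 1, len(w))))
            for m in range(len(h))]


def shifted(resp, start):
    """Delay a response by `start` samples, keeping the original length."""
    return [resp[m - start] if m >= start else 0.0 for m in range(len(resp))]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def build_tables(x, h, waveforms, candidates):
    """Precompute correlations for every (waveform index, start position) pair."""
    resp = [waveform_impulse_response(h, w) for w in waveforms]
    synth = {(i, s): shifted(resp[i], s)
             for i in range(len(waveforms)) for s in candidates[i]}
    d = {key: dot(x, vec) for key, vec in synth.items()}  # correlation with target
    phi = {(a, b): dot(synth[a], synth[b]) for a in synth for b in synth}
    return d, phi


def measure(d, phi, choice):
    """Equation 4 for one choice of (waveform index, start position) pairs."""
    num = sum(d[key] for key in choice) ** 2
    den = sum(phi[(a, b)] for a in choice for b in choice)
    return num / den
```

As with an algebraic excitation, each candidate combination is then scored by table lookups alone, which is consistent with the statement that the computation cost stays about the same.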