A CELP (Code Excited Linear Prediction) type speech coder performs linear prediction on each frame obtained by segmenting speech at fixed time intervals, and codes the predictive residuals (excitation signals) resulting from the frame-by-frame linear prediction, using an adaptive codebook in which past excitation vectors are stored and a random codebook in which a plurality of random code vectors are stored. A CELP type speech coder is disclosed in, for instance, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rate," M. R. Schroeder, Proc. ICASSP '85, pp. 937-940.
FIG. 1 illustrates the schematic structure of a CELP type speech coder. The CELP type speech coder separates vocal information into excitation information and vocal tract information and codes each of them. With regard to the vocal tract information, an input speech signal 10 is input to a filter coefficients analysis section 11 for linear prediction, and the resulting linear predictive coefficients (LPCs) are coded by a filter coefficients quantization section 12. Supplying the linear predictive coefficients to a synthesis filter 13 allows the vocal tract information to be added to the excitation information in the synthesis filter 13. With regard to the excitation information, an excitation vector search in an adaptive codebook 14 and a random codebook 15 is carried out for each segment (called a subframe) obtained by further segmenting a frame. The search in the adaptive codebook 14 and the search in the random codebook 15 are processes of determining the code number and gain (pitch gain) of an adaptive code vector, and the code number and gain (random code gain) of a random code vector, that minimize the coding distortion in equation 1.
$$\| v - (g_a H p + g_c H c) \|^2 \qquad (1)$$
where the symbols used here and in equations 2 through 4 are:
$v$: speech signal (vector)
$H$: impulse response convolution matrix of the synthesis filter,
$$H = \begin{bmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ h(L-1) & \cdots & h(1) & h(0) \end{bmatrix}$$
$h$: impulse response (vector) of the synthesis filter
$L$: frame length
$p$: adaptive code vector
$c$: random code vector
$g_a$: adaptive code gain (pitch gain)
$g_c$: random code gain
$x$: target (vector) for the random codebook search
$H^t$: transposed matrix of $H$
$x'^t$: time reverse synthesis of $x$ using $H$ ($x'^t = x^t H$)
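As an illustration of equation 1, the following minimal sketch builds the convolution matrix H from an impulse response h and evaluates the coding distortion for one pair of code vectors. The function names (`convolution_matrix`, `matvec`, `coding_distortion`) and the numeric values are hypothetical and chosen only so that the arithmetic is easy to follow; they do not come from any particular coder.

```python
def convolution_matrix(h):
    """Lower-triangular Toeplitz matrix with H[i][j] = h[i - j] for i >= j."""
    L = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(L)] for i in range(L)]


def matvec(H, u):
    """Matrix-vector product H*u, i.e. u passed through the synthesis filter."""
    return [sum(a * b for a, b in zip(row, u)) for row in H]


def coding_distortion(v, H, p, c, ga, gc):
    """Equation 1: || v - (ga*H*p + gc*H*c) ||^2."""
    Hp = matvec(H, p)
    Hc = matvec(H, c)
    return sum((vi - (ga * a + gc * b)) ** 2 for vi, a, b in zip(v, Hp, Hc))


# Toy values (assumptions for illustration only).
h = [1.0, 0.5, 0.25, 0.125]
H = convolution_matrix(h)
p = [1.0, 0.0, 0.0, 0.0]   # adaptive code vector
c = [0.0, 1.0, 0.0, 0.0]   # random code vector
v = [2.0, 4.0, 2.0, 1.0]   # speech signal vector
print(coding_distortion(v, H, p, c, ga=2.0, gc=3.0))
```

In this toy case v was constructed as exactly 2·Hp + 3·Hc, so the distortion is zero; any other choice of code vectors or gains gives a positive value.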
Because a closed-loop search for the codes that minimize equation 1 involves a vast amount of computation, however, an ordinary CELP type speech coder first performs the adaptive codebook search to specify the code number of an adaptive code vector, and then executes the random codebook search based on that result to specify the code number of a random code vector.
The random codebook search by the CELP type speech coder will now be explained with reference to FIGS. 2A through 2C. In the figures, the symbol x denotes the target vector for the random codebook search, which is obtained by equation 2. It is assumed that the adaptive codebook search has already been completed.
$$x = v - g_a H p \qquad (2)$$
where $v$, $H$, $p$ and $g_a$ are as defined following equation 1.
The random codebook search is a process of specifying the random code vector c that minimizes the coding distortion defined by equation 3, which is computed in a distortion calculator 16 as shown in FIG. 2A.
$$\| x - g_c H c \|^2 \qquad (3)$$
where $x$, $H$, $c$ and $g_c$ are as defined following equation 1.
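Equations 2 and 3 can be sketched together as a brute-force search. The code below is only an illustration: the helper names are hypothetical, the codebook is a three-entry toy, and the gain for each candidate is set to its least-squares optimum, which is an assumption made here for simplicity rather than something the text prescribes.

```python
def convolution_matrix(h):
    """Lower-triangular Toeplitz matrix with H[i][j] = h[i - j] for i >= j."""
    L = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(L)] for i in range(L)]


def matvec(H, u):
    """Matrix-vector product H*u."""
    return [sum(a * b for a, b in zip(row, u)) for row in H]


def random_search_target(v, H, p, ga):
    """Equation 2: x = v - ga*H*p."""
    return [vi - ga * s for vi, s in zip(v, matvec(H, p))]


def search_min_distortion(x, H, codebook):
    """Return (code number, gain, distortion) minimizing equation 3."""
    best = None
    for index, c in enumerate(codebook):
        Hc = matvec(H, c)                      # one synthesis per candidate
        energy = sum(s * s for s in Hc)
        if energy == 0.0:
            continue                           # an all-zero vector cannot match
        gc = sum(a * b for a, b in zip(x, Hc)) / energy   # assumed optimal gain
        dist = sum((a - gc * b) ** 2 for a, b in zip(x, Hc))
        if best is None or dist < best[2]:
            best = (index, gc, dist)
    return best


# Toy values (assumptions for illustration only).
H = convolution_matrix([1.0, 0.5])
x = random_search_target(v=[3.0, 3.5], H=H, p=[1.0, 0.0], ga=1.0)
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = search_min_distortion(x, H, codebook)
print(x, result)
```

Every candidate requires a full pass through the synthesis filter plus a distortion evaluation, which is the cost that the structure of FIG. 2B is meant to reduce.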
The distortion calculator 16 controls a control switch 21 to switch a random code vector to be read from the random codebook 15 until the random code vector c is specified.
An actual CELP type speech coder has the structure shown in FIG. 2B to reduce the computational complexity, and a distortion calculator 16' carries out a process of specifying the code number that maximizes the distortion measure in equation 4.
$$\frac{(x^t H c)^2}{\| H c \|^2} = \frac{(x'^t c)^2}{\| H c \|^2} = \frac{(x'^t c)^2}{c^t H^t H c} \qquad (4)$$
where $x$, $H$, $H^t$, $x'^t$ and $c$ are as defined following equation 1.
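The step from minimizing equation 3 to maximizing the measure in equation 4 can be made explicit. The derivation below is the standard least-squares argument, offered as an explanatory sketch; it assumes that the gain is chosen optimally for each candidate vector c, and the symbol $g_c^{*}$ for that optimal gain is introduced here only for the derivation.

```latex
\begin{align*}
\frac{\partial}{\partial g_c}\,\| x - g_c H c \|^2 = 0
  &\;\Longrightarrow\; g_c^{*} = \frac{x^t H c}{\| H c \|^2}, \\
\| x - g_c^{*} H c \|^2
  &= \| x \|^2 - \frac{(x^t H c)^2}{\| H c \|^2}
   = \| x \|^2 - \frac{(x'^t c)^2}{c^t H^t H c},
  \qquad x'^t = x^t H .
\end{align*}
```

Because $\| x \|^2$ does not depend on c, the candidate that minimizes the distortion is the one that maximizes the subtracted term. The vector x' depends only on the target and the filter, so it can be computed once per search rather than once per candidate.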
Specifically, the random codebook control switch 21 is connected to one terminal of the random codebook 15 and the random code vector c is read from an address corresponding to that terminal. The read random code vector c is synthesized with vocal tract information by the synthesis filter 13, producing a synthesized vector Hc. Then, the distortion calculator 16' computes a distortion measure in the equation 4 using a vector x' obtained by a time reverse process of a target x, the vector Hc resulting from synthesis of the random code vector in the synthesis filter and the random code vector c. As the random codebook control switch 21 is switched, computation of the distortion measure is performed for every random code vector in the random codebook.
Finally, the number of the random codebook control switch 21 that had been connected when the distortion measure in the equation 4 became maximum is sent to a code output section 17 as the code number of the random code vector.
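The search procedure of FIG. 2B described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `time_reverse_target` and `search_max_measure` are hypothetical, the codebook is a toy list, and the loop index stands in for the position of the control switch 21.

```python
def convolution_matrix(h):
    """Lower-triangular Toeplitz matrix with H[i][j] = h[i - j] for i >= j."""
    L = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(L)] for i in range(L)]


def matvec(H, u):
    """Matrix-vector product H*u."""
    return [sum(a * b for a, b in zip(row, u)) for row in H]


def time_reverse_target(x, H):
    """x'^t = x^t H, computed once before the codebook loop."""
    L = len(x)
    return [sum(x[i] * H[i][j] for i in range(L)) for j in range(L)]


def search_max_measure(x, H, codebook):
    """Return (code number, measure) maximizing (x'^t c)^2 / ||Hc||^2."""
    x_rev = time_reverse_target(x, H)
    best_index, best_measure = None, -1.0
    for index, c in enumerate(codebook):       # index plays the role of switch 21
        Hc = matvec(H, c)                      # synthesized vector Hc
        energy = sum(s * s for s in Hc)
        if energy == 0.0:
            continue
        corr = sum(a * b for a, b in zip(x_rev, c))
        measure = corr * corr / energy
        if measure > best_measure:
            best_index, best_measure = index, measure
    return best_index, best_measure


# Toy values (assumptions for illustration only).
H = convolution_matrix([1.0, 0.5])
x = [2.0, 3.0]
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best_index, best_measure = search_max_measure(x, H, codebook)
print(best_index, best_measure)
```

In this toy case the selected entry is the last one, and its measure equals the squared norm of x, meaning the target can be matched exactly. The gain does not appear in the loop at all, which is one source of the reduced computation.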
FIG. 2C shows a partial structure of a speech decoder. The switching of the random codebook control switch 21 is controlled in such a way as to read out the random code vector that has a transmitted code number. After a transmitted random code gain gc and filter coefficient are set in an amplifier 23 and a synthesis filter 24, a random code vector is read out to restore a synthesized speech.
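The decoder path of FIG. 2C can be sketched in a few lines. The sketch covers only the random codebook branch, as the figure does, and rests on several assumptions that are not in the text: the synthesis filter is taken to be an all-pole filter 1/A(z) with A(z) = 1 + a[0]z^-1 + a[1]z^-2 + ..., its state starts at zero, and the names `synthesis_filter` and `decode_random_part` are hypothetical.

```python
def synthesis_filter(excitation, a):
    """All-pole filter: out[n] = e[n] - sum_k a[k] * out[n - 1 - k].

    The sign convention and zero initial state are assumptions.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * out[n - 1 - k]
        out.append(acc)
    return out


def decode_random_part(code_number, gc, a, codebook):
    """Read the code vector, apply the gain (amplifier 23), then filter (24)."""
    c = codebook[code_number]
    scaled = [gc * s for s in c]
    return synthesis_filter(scaled, a)


# Toy values (assumptions for illustration only).
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
decoded = decode_random_part(code_number=0, gc=2.0, a=[-0.5], codebook=codebook)
print(decoded)
```

Only the code number, the gain and the filter coefficients are needed on the decoder side, because the decoder holds the same codebook as the coder.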
In the above-described speech coder/speech decoder, the greater the number of random code vectors stored as excitation information in the random codebook 15, the more likely it is that a random code vector close to the excitation vector of actual speech can be found. As the capacity of the random codebook (ROM) is limited, however, it is not possible to store the countless random code vectors that would correspond to all possible excitation vectors. This restricts the improvement of speech quality.
An algebraic excitation has also been proposed, which can significantly reduce the computational complexity of the coding distortion calculation in a distortion calculator and can eliminate the random codebook (ROM) (described in "8 KBIT/S ACELP CODING OF SPEECH WITH 10 MS SPEECH-FRAME: A CANDIDATE FOR CCITT STANDARDIZATION": R. Salami, C. Laflamme, J-P. Adoul, ICASSP '94, pp. II-97 to II-100, 1994).
The algebraic excitation considerably reduces the complexity of computing the coding distortion by computing in advance the convolution of the impulse response of the synthesis filter with a time-reversed target, as well as the autocorrelation of the synthesis filter, and storing both results in memory. Further, the ROM in which random code vectors would be stored is eliminated by generating the random code vectors algebraically. CS-ACELP and ACELP, which use the algebraic excitation, have been recommended by the ITU-T as G.729 and G.723.1, respectively.
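The precomputation described above can be illustrated with a short sketch. It assumes a code vector made of a few signed unit pulses, so that the measure of equation 4 reduces to table lookups in two precomputed arrays, here called `d` (the time-reversed target filtered through H) and `phi` (the matrix H^t H). These names, the pulse layout and the numeric values are hypothetical and do not reproduce the pulse structure of G.729 or G.723.1.

```python
def convolution_matrix(h):
    """Lower-triangular Toeplitz matrix with H[i][j] = h[i - j] for i >= j."""
    L = len(h)
    return [[h[i - j] if i >= j else 0.0 for j in range(L)] for i in range(L)]


def precompute(x, H):
    """Computed once per search: d = H^t x and phi = H^t H."""
    L = len(x)
    d = [sum(x[i] * H[i][j] for i in range(L)) for j in range(L)]
    phi = [[sum(H[k][i] * H[k][j] for k in range(L)) for j in range(L)]
           for i in range(L)]
    return d, phi


def pulse_measure(pulses, d, phi):
    """Equation 4 for pulses given as (position, sign) pairs, by lookup only."""
    corr = sum(s * d[m] for m, s in pulses)
    energy = sum(si * sj * phi[mi][mj] for mi, si in pulses for mj, sj in pulses)
    return corr * corr / energy


def direct_measure(pulses, x, H):
    """Same measure computed the long way, by synthesizing Hc explicitly."""
    L = len(x)
    c = [0.0] * L
    for m, s in pulses:
        c[m] += s
    Hc = [sum(H[i][j] * c[j] for j in range(L)) for i in range(L)]
    corr = sum(a * b for a, b in zip(x, Hc))
    energy = sum(s * s for s in Hc)
    return corr * corr / energy


# Toy values (assumptions for illustration only).
H = convolution_matrix([1.0, 0.5, 0.25, 0.125])
x = [1.0, -2.0, 0.5, 3.0]
pulses = [(0, 1.0), (2, -1.0)]          # +1 at position 0, -1 at position 2
d, phi = precompute(x, H)
print(pulse_measure(pulses, d, phi), direct_measure(pulses, x, H))
```

Both functions return the same value, but `pulse_measure` touches only one entry of `d` per pulse and one entry of `phi` per pulse pair, and no stored codebook is consulted because each candidate is fully described by its pulse positions and signs.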
In a CELP type speech coder/speech decoder that uses the above-described algebraic excitation in its random codebook section, however, the target for the random codebook search is always coded with a pulse sequence vector, which limits the improvement of speech quality.