1. Field of the Invention
This invention relates to a speech coding apparatus, and more particularly to a speech coding apparatus employing code-excited linear predictive coding (CELP) or a similar system which codes a speech signal at a low bit rate with high quality.
2. Description of the Related Art
In recent years, the application of digital systems to land mobile telephones and cordless telephones, which employ radio waves as a medium, has been proceeding rapidly. Since the radio frequency band available to telephones of this type is limited, it is important, in order to reduce the occupied band, to develop a low-bit-rate coding system for the speech signal.
As one coding system of this type, with a bit rate ranging approximately from 4 to 8 kb/s, a CELP system is known which is disclosed, for example, in M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates", Proc. ICASSP '85, 1985, pp. 937-940 (hereinafter referred to as document 1).
In the CELP system as a conventional speech coding apparatus disclosed in document 1, coding processing is performed on the transmission side in the following procedure. First, for each frame (for example, 20 ms), a short-term predictive code representative of the frequency characteristic of speech, that is, a spectrum parameter, is extracted from the speech signal to be coded (short-term prediction). Each frame is then divided into sub frames of a shorter period (for example, 5 ms). For each sub frame, a pitch parameter representative of the long-term correlation (pitch correlation) is extracted from past speech excitation signals, and the speech signal of the sub frame is long-term predicted with the pitch parameter. The long-term prediction is performed by determining a delay code representative of the pitch correlation using an adaptive code book, which contains speech excitation signals of sub frame length, that is, adaptive code vectors, obtained by delaying past speech excitation signals by the numbers of samples corresponding to the delay codes. The delay code is determined in the following procedure. The delay code is varied (attempted) over the size of the adaptive code book, and the adaptive code vector corresponding to each candidate delay code is extracted. A synthesis signal is produced from the extracted adaptive code vector, and the error power between the synthesis signal and the speech signal is calculated. The optimum delay code with which the calculated error power exhibits the lowest value, the adaptive code vector corresponding to the optimum delay code, and the gains for them are thereby determined.
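The delay-code search described above can be sketched as follows. This is a simplified illustration rather than the apparatus of document 1: the function and variable names are hypothetical, the synthesis filter is represented by a truncated impulse response h, and only delays of at least one sub frame length are tried so that each adaptive code vector is a plain slice of the past excitation.

```python
import numpy as np

def search_adaptive_codebook(target, past_excitation, h, min_delay, max_delay):
    """Try every delay code, synthesize the corresponding adaptive code
    vector, and return the delay that minimizes the gain-optimized error
    power between the synthesis signal and the target speech signal."""
    ns = len(target)                      # sub frame length Ns
    best_delay, best_gain, best_score = None, 0.0, -np.inf
    for d in range(min_delay, max_delay + 1):   # assumes d >= ns
        # adaptive code vector: past excitation delayed by d samples
        start = len(past_excitation) - d
        v = past_excitation[start:start + ns]
        y = np.convolve(h, v)[:ns]        # synthesis signal H*v
        energy = np.dot(y, y)
        if energy == 0.0:
            continue
        corr = np.dot(target, y)
        # minimizing ||target - g*y||^2 over g is equivalent to
        # maximizing corr^2 / energy, with optimal gain g = corr / energy
        score = corr * corr / energy
        if score > best_score:
            best_score, best_delay, best_gain = score, d, corr / energy
    return best_delay, best_gain
```

For a perfectly periodic excitation, the search locks onto the pitch period (the smallest matching delay wins because later ties do not replace it).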
Then, a speech excitation code vector and its gain are determined such that the error power between the residual signal obtained by the long-term prediction and a synthesis signal produced from an excitation code vector, that is, a noise-like quantization code of a kind prepared in advance and extracted from a speech excitation code book, exhibits the lowest value. This processing will hereinafter be referred to as speech excitation code book search.
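The speech excitation code book search can be sketched in the same way. This is again an illustrative sketch with hypothetical names, in which the code book is taken to be a plain array of stored code vectors and the synthesis filter is a truncated impulse response h.

```python
import numpy as np

def search_excitation_codebook(residual, codebook, h):
    """Pick the excitation code vector whose synthesized, gain-scaled
    version is closest (in error power) to the long-term prediction
    residual; return its index and the optimal gain."""
    ns = len(residual)
    best_j, best_gain, best_score = None, 0.0, -np.inf
    for j, e in enumerate(codebook):       # codebook: 2-D array of vectors
        y = np.convolve(h, e)[:ns]         # synthesis signal H*e_j
        energy = np.dot(y, y)
        if energy == 0.0:
            continue
        corr = np.dot(residual, y)
        score = corr * corr / energy       # larger score = smaller error power
        if score > best_score:
            best_score, best_j, best_gain = score, j, corr / energy
    return best_j, best_gain
```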
Indices representative of the kinds of the adaptive code vector and the speech excitation code vector and of the gains for the individual excitation signals determined in this manner, as well as an index representative of the kind of the spectrum parameter, are transmitted.
The search for the delay code of an adaptive code vector and the quantization code of an excitation code vector is specifically performed in the following procedure. First, in order to reduce the quantization noise of the filter coefficients of the synthesis filter, which is formed from the spectrum parameter determined by the short-term predictive code and quantized/dequantized, the input speech signal x[n] is filtered by a perceptual weighting filter W(z) defined by the following equation:

W(z) = A(z/γ1)/A(z/γ2)   (1)
where A(z) is the filter having the inverse characteristic of the synthesis filter described above, and γ1 and γ2 are weighting coefficients which determine the characteristic of the perceptual weighting filter.
Then, a weighting synthesis filter HV, in which the synthesis filter 1/A(z) and the perceptual weighting filter W(z) are connected in cascade, is driven with the code vector ej[n] of a quantization code j to calculate a synthesis signal Hej[n]. Thereafter, the quantization code j with which the error power E between a signal z[n] and the signal Hej[n] exhibits the lowest value in the following equation is determined:

E = Σ[n=0 to Ns-1] ( z[n] - g_ej·Hej[n] )²   (2)

where Ns is the sub frame length, H is a matrix which realizes the synthesis filter, and g_ej is the gain of the code vector ej.
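Minimizing the error power over the gain has a closed-form solution, g = <z, Hej>/<Hej, Hej>, which is what makes an exhaustive code book search tractable. The following small sketch (with illustrative names) checks this numerically:

```python
import numpy as np

def error_power(z, He_j, g):
    """E = sum over n of (z[n] - g * He_j[n])^2."""
    d = z - g * He_j
    return np.dot(d, d)

def optimal_gain(z, He_j):
    """Gain minimizing E: g = <z, He_j> / <He_j, He_j>."""
    return np.dot(z, He_j) / np.dot(He_j, He_j)
```

Perturbing the gain in either direction away from the closed-form value can only increase the error power.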
Since the weighting coefficients γ1 and γ2 are usually set to γ1=1.0 and γ2=0.8, respectively, and A(z/1.0)=A(z) cancels against the synthesis filter 1/A(z), the characteristic of the weighting synthesis filter HV is given by the following equation:

HV = 1/A(z/0.8)
A weighting synthesis filter having this characteristic is commonly used.
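Because substituting z/γ into A(z) merely scales the i-th LPC coefficient a[i] by γ^i, the weighting synthesis filter 1/A(z/0.8) is an all-pole filter whose coefficients can be precomputed once per frame. A sketch under these assumptions (the names are illustrative):

```python
import numpy as np

def bandwidth_expand(a, gamma):
    """A(z/gamma): scale the i-th coefficient of A(z) by gamma**i.
    a = [1, a1, ..., aP] holds the coefficients of equation (3)."""
    return np.array([c * gamma**i for i, c in enumerate(a)])

def all_pole_filter(x, a):
    """Run the all-pole filter 1/A(z): y(n) = x(n) - sum_i a[i]*y(n-i)."""
    order = len(a) - 1
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, order + 1):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y[n] = acc
    return y
```

Driving the expanded filter with an impulse yields the impulse response h used in a code book search.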
In this instance, since the weighting synthesis filter HV used for the code book search is of the all-pole type and one of the two operands of each product is a constant, the amount of calculation (the number of multiply-accumulate operations) is not very great. Where the calculation is performed with a common digital signal processor (DSP) which includes a RAM and a ROM and has one data pointer for each of the RAM and the ROM, the constants are stored in the ROM while the variables are stored in the RAM to perform the predetermined calculation.
FIG. 4 shows a conventional speech coding apparatus. Referring to FIG. 4, the speech coding apparatus shown includes a coding section 1 for coding a speech input signal, a decoding section 2 for decoding the coded signal, and a transmission line 3 for interconnecting the decoding section 2 and the coding section 1.
The coding section 1 includes a buffer circuit 11 for storing a speech signal SI inputted from an input terminal TI and outputting a speech signal S, a short-term prediction circuit 12 for extracting an LPC coefficient which is a spectrum parameter of speech, a parameter quantization circuit 13 for quantizing the LPC coefficient to produce a short-term predictive code CL, a weighting circuit 14 for perceptually weighting the speech signal S and outputting a weighted speech signal SW, an adaptive code book 15 for storing past excitations, a long-term prediction circuit 16 for searching for a delay code representative of a pitch correlation and a corresponding adaptive code vector, an excitation code book 17 in which excitation code vectors of sub frame length representative of the long-term predictive residual are stored, an excitation code book search circuit 18 for determining an optimum excitation code vector from the excitation code book 17, a gain code book 19 in which parameters representative of the gain terms of an adaptive code vector and an excitation code vector are stored, a gain code book search circuit 40 for determining quantization gains of the adaptive code vector and the excitation code vector from the gain code book 19, and a multiplexer 41 for combining the code trains and outputting the combination.
The decoding section 2 includes a demultiplexer 21 for separating the transmission codes supplied thereto into predetermined code trains, an adaptive code book 22 identical to the adaptive code book 15, an excitation code book 23 identical to the excitation code book 17, a gain code book 24 identical to the gain code book 19, a synthesis filter 25 for regenerating a speech signal from the excitation produced, and an output terminal TO for outputting the speech.
The flow of processing of the conventional speech coding apparatus will be described with reference to FIG. 4. The coding section 1 receives a speech signal SI through the input terminal TI and stores it into the buffer circuit 11. Using the speech signal S of a fixed number of samples stored in the buffer circuit 11, the short-term prediction circuit 12 performs a short-term predictive analysis to calculate the LPC coefficient of the speech signal. The LPC coefficient thus calculated is quantized by the parameter quantization circuit 13, and the quantized code of the LPC coefficient, that is, the short-term predictive code CL, is sent to the multiplexer 41 and is also dequantized so that it may be used for the later coding processing.
Meanwhile, the speech signal S stored in the buffer circuit 11 is perceptually weighted by the weighting circuit 14 using the quantized/dequantized LPC coefficient and is supplied as a weighted speech signal SW to the long-term prediction circuit 16, the excitation code book search circuit 18 and the gain code book search circuit 40, where it is used for the search of the code books.
Then, using the adaptive code book 15, the excitation code book 17 and the gain code book 19, a code book search for the signal SW is performed. First, long-term prediction is performed by the long-term prediction circuit 16 to determine an optimum delay code CD representative of the pitch correlation in such a manner as hereinafter described, and the delay code CD is transferred to the multiplexer 41. Further, the long-term prediction circuit 16 produces the corresponding adaptive code vector. Then, after subtraction of the influence of the adaptive code vector, the excitation code book search circuit 18 performs a search of the excitation code book 17 to determine a quantization code CS and produces an excitation code vector. The quantization code CS is transferred to the multiplexer 41. After the adaptive code vector and the excitation code vector are determined, the gain code book search circuit 40 refers to the gain term data of the gain code book 19 to calculate the gains of the two excitations and transfers their code CG to the multiplexer 41. The multiplexer 41 combines the codes CL, CD, CS and CG into a transmission code CT and transfers the transmission code CT to the decoding section 2 through the transmission line 3.
In the decoding section 2, the demultiplexer 21 demultiplexes the transmission code CT inputted thereto from the transmission line 3 into the codes CL, CD, CS and CG. The demultiplexer 21 decodes the short-term predictive code CL corresponding to the LPC coefficient into a filter coefficient and transfers the filter coefficient to the synthesis filter 25. From the delay code CD, an adaptive code vector is produced using the adaptive code book 22. From the quantization code CS corresponding to the excitation, an excitation code vector is produced using the excitation code book 23. From the code CG corresponding to the gains, the gains of the adaptive code vector and the excitation code vector are calculated referring to the gain code book 24, and the excitations are multiplied by the gain terms to produce an input signal to the synthesis filter 25. Finally, using the input signal, the synthesis filter 25 synthesizes a sound signal and outputs it from the output terminal TO.
Here, in order to realize the perceptual weighting filter W(z) by the weighting circuit 14, since the filter coefficients are variable, multiplication between variables is required as seen from equation (1) given hereinabove. Consequently, a filter of the pole-zero type is required. Accordingly, in order to perform the calculation with such a DSP as described above, both of the two variables of each product must be stored in the RAM.
If it is assumed, for convenience of description, that the order of short-term prediction in equation (1) is 10, then A(z) and W(z) are represented by the following equations (3) and (4), respectively:

A(z) = 1 + a[1]z^-1 + a[2]z^-2 + ... + a[10]z^-10   (3)

W(z) = {1 + a[1]γ1^1·z^-1 + ... + a[10]γ1^10·z^-10} / {1 + a[1]γ2^1·z^-1 + ... + a[10]γ2^10·z^-10}   (4)

where a[1] to a[10] are variables, and accordingly a[1]γ1^1 to a[10]γ1^10 and a[1]γ2^1 to a[10]γ2^10 are also variables.
Where the perceptually weighted signal SW, which is the output of the perceptual weighting filter, is represented by y(n) and the input speech signal S is represented by x(n), the perceptual weighting filter W(z) is developed in the following manner:

y(n) = x(n) + Σ[j=1 to 10] a[j]γ1^j·x(n-j) - Σ[i=1 to 10] a[i]γ2^i·y(n-i)   (5)

The quantities a[i]γ2^i, y(n-i), a[j]γ1^j and x(n-j) in equation (5) are all variables.
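The recursion above can be implemented directly as a pole-zero filter. The sketch below is an illustration with hypothetical names, taking the LPC coefficients a[1]..a[10] as an array and precomputing the numerator and denominator coefficient sets once:

```python
import numpy as np

def perceptual_weighting(x, a, g1=1.0, g2=0.8):
    """Pole-zero filter W(z) = A(z/g1)/A(z/g2):
    y(n) = x(n) + sum_j a[j]*g1^j * x(n-j) - sum_i a[i]*g2^i * y(n-i).
    a = [a1, ..., aP] are the LPC coefficients (without the leading 1)."""
    order = len(a)
    num = [a[k] * g1**(k + 1) for k in range(order)]   # zero (FIR) part
    den = [a[k] * g2**(k + 1) for k in range(order)]   # pole (IIR) part
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(order):
            if n - (k + 1) >= 0:
                acc += num[k] * x[n - (k + 1)]
                acc -= den[k] * y[n - (k + 1)]
        y[n] = acc
    return y
```

As a sanity check, setting g1 = g2 makes W(z) = 1, so the filter must pass its input through unchanged; setting g2 = 0 leaves only the FIR part A(z/g1).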
In an ordinary DSP which has only one data pointer for the RAM, the number of processing steps, that is, the calculation time, becomes long because an operation for storing or saving variables into the RAM is required for every calculation performed. In particular, the multiplication of a RAM-stored variable A by another RAM-stored variable B, that is, A×B, requires a total of 6 steps: step 1, at which A is read via the data pointer; step 2, at which A is set to the multiplicand M and the address of A is updated; step 3, at which the address of A is saved temporarily; step 4, at which B is read via the data pointer; step 5, at which B is set to the multiplier N and the address of B is updated; and step 6, at which M×N is executed and the address of B is saved temporarily.
In the conventional speech coding apparatus described above, when the perceptual weighting filter is realized, since the filter coefficients are variable, the filter must be of the pole-zero type, for which multiplication between variables is required. Consequently, when the calculation is performed by a DSP, both of the two variables of each product must be stored in the RAM and accessed through its data pointer. Thus, the conventional speech coding apparatus is disadvantageous in that it requires a comparatively great number of steps, and hence a comparatively long calculation time, because operations to store and save the variables into the RAM are required each time a calculation is performed.