1. Field of the Invention
The present invention relates to a system for speech coding and an apparatus for the same, and more particularly to a system for high quality speech coding, and an apparatus for the same, using vector quantization for data compression of speech signals.
2. Description of the Related Art
In recent years, vector quantization has been used to compress the data of speech signals while maintaining their quality in intra-company communication systems, digital mobile radio systems, etc. In this well-known system, predictive filtering is applied to the signal vectors of a code book to prepare reproduced signals, and the error powers between the reproduced signals and an input speech signal are evaluated to determine the index of the signal vector with the smallest error. There is rising demand, however, for a more advanced method of vector quantization so as to further compress the speech data.
FIG. 1 shows an example of a system for high quality speech coding using vector quantization. This system is known as the code excited LPC (CELP) system. In this system, a code book 10 is preset with $2^m$ patterns of residual signal vectors, each produced from N samples of a white noise signal and thus corresponding to an N-dimensional vector (in this case, shape vectors showing the phase, hereinafter referred to simply as vectors). The vectors are normalized so that the power of the N samples (N being, for example, 40) becomes a fixed value.
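The construction of such a code book may be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `build_codebook`, the use of Gaussian samples as the white noise, the seed, and the reading of "power" as the sum of squared samples are all assumptions and are not specified in the text above.

```python
import math
import random

def build_codebook(m, n, power=1.0, seed=0):
    """Return 2**m vectors of n white-noise samples.

    Each vector is scaled so that the sum of its squared samples equals
    `power` (an assumed reading of "the power of the N samples").
    """
    rng = random.Random(seed)
    codebook = []
    for _ in range(2 ** m):
        # Gaussian samples stand in for the white noise signal (assumption).
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        energy = sum(x * x for x in v)
        scale = math.sqrt(power / energy)
        codebook.append([x * scale for x in v])
    return codebook

# Hypothetical sizes: m = 4 gives 16 vectors; N = 40 as in the example above.
codebook = build_codebook(m=4, n=40)
```

Because every vector has the same power, the vectors carry only shape (phase) information, and the amplitude is left to the gain applied afterwards.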
Vectors read out from the code book 10 by the command of the evaluating circuit 16 are given a gain by a multiplier unit 11, then converted to reproduced signals through two adaptive prediction units, i.e., a pitch prediction unit 12 which eliminates the long term correlation of the speech signals and a linear prediction unit 13 which eliminates the short term correlation of the same.
The reproduced signals are compared with digital speech signals of the N samples input from a terminal 15 in a subtractor 14 and the errors are evaluated by the evaluating circuit 16.
The evaluating circuit 16 selects the vector of the code book 10 giving the smallest power of the error and determines the gain of the multiplier unit 11 and a pitch prediction coefficient of the pitch prediction unit 12.
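The selection performed by the evaluating circuit may be illustrated by the following simplified sketch. It is not the circuit described above: the pitch prediction unit is omitted for brevity, the filter memory is assumed to be zero at the start of the frame, and the function names, the sign convention of the prediction coefficients, and the tiny three-vector code book are hypothetical.

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def synthesize(excitation, lpc):
    """All-pole synthesis: y[n] = x[n] + sum_k lpc[k] * y[n-1-k].

    The filter state is assumed to be zero before the frame.
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, lpc):
    """Return (index, gain, error power) of the best code book vector."""
    best = None
    for index, vec in enumerate(codebook):
        c = synthesize(vec, lpc)
        cc = dot(c, c)
        if cc == 0.0:
            continue  # an all-zero reproduced vector cannot match anything
        xc = dot(target, c)
        gain = xc / cc                          # gain minimizing |X - gC|^2
        err = dot(target, target) - gain * xc   # error power at that gain
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best

# Hypothetical toy data: the target is 2.5 times the reproduced vector 1.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
lpc = [0.5]
target = [2.5 * v for v in synthesize(codebook[1], lpc)]
index, gain, err = search_codebook(target, codebook, lpc)
```

In this toy case the search recovers index 1 with a gain of 2.5, since the target was built from that vector.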
Further, as shown in FIG. 2, the linear prediction unit 13 uses, as the filter tap coefficients of a linear difference equation, the linear prediction coefficients found from the sample values of the current frame by a linear prediction analysis unit 18. The pitch prediction unit 12 uses, as filter parameters, the pitch prediction coefficient and pitch frequency of the input speech signal found by a pitch prediction analysis unit 31 through a reverse linear prediction filter 30.
The index of the optimum vector in the code book 10, the gain of the multiplier unit 11, and the parameters for constituting the prediction units (pitch frequency, pitch prediction coefficient, and linear prediction coefficient) are multiplexed by a multiplexer circuit 17 and become coded information.
The pitch period of the pitch prediction unit 12 is, for example, 40 to 167 samples; each of the possible pitch periods is evaluated and the optimum period is chosen. Further, the transfer function of the linear prediction unit 13 is determined by linear predictive coding (LPC) analysis of the input speech signal. Finally, the evaluating circuit 16 searches through the code book 10 and determines the index giving the smallest error power between the input speech signal and residual signal. The index of the code book 10 thus determined, that is, the phase of the residual vector; the gain of the multiplier unit 11, that is, the amplitude of the residual vector; the frequency and coefficient of the pitch prediction unit 12; and the coefficients of the linear prediction unit 13 are multiplexed by the multiplexer circuit 17 and transmitted.
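The evaluation of each possible pitch period may be sketched as an exhaustive search over the delay. The sketch below is a simplification under stated assumptions: the delayed segment is compared with the target directly, without passing through the linear prediction unit; only delays of at least one frame length are tried; and the function name and the test data are hypothetical.

```python
import random

def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def search_pitch_lag(target, past_excitation, min_lag=40, max_lag=167):
    """Try every delay in [min_lag, max_lag]; return (lag, coefficient, error)."""
    n = len(target)
    best = None
    for lag in range(max(min_lag, n), max_lag + 1):
        start = len(past_excitation) - lag
        if start < 0:
            break  # not enough past samples for this or any longer delay
        p = past_excitation[start:start + n]
        pp = dot(p, p)
        if pp == 0.0:
            continue
        xp = dot(target, p)
        b = xp / pp                          # coefficient minimizing |X - bP|^2
        err = dot(target, target) - b * xp   # error power at that coefficient
        if best is None or err < best[2]:
            best = (lag, b, err)
    return best

# Hypothetical data: the target repeats the past series 57 samples back,
# scaled by 0.8, so the search should recover that delay and coefficient.
rng = random.Random(1)
past = [rng.gauss(0.0, 1.0) for _ in range(200)]
target = [0.8 * v for v in past[200 - 57:200 - 57 + 40]]
lag, b, err = search_pitch_lag(target, past)
```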
On the decoder side, a vector is read out from a code book 20 having the same construction as the code book 10, in accordance with the index, gain, and prediction unit parameters obtained by demultiplexing by the demultiplexer circuit 19 and is given a gain by a multiplier unit 21, then a reproduced speech signal is obtained by prediction by the prediction units 22 and 23.
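The decoder processing for one frame may be sketched as below. The recursions used for the two prediction units (a one-tap long-term filter followed by an all-pole short-term filter), the function name `decode_frame`, and the way the filter memories are passed in are assumptions made for illustration; the text above does not fix these details.

```python
def decode_frame(index, gain, pitch_lag, pitch_coef, lpc, codebook,
                 past_excitation, past_speech):
    """Reproduce one frame from the demultiplexed parameters.

    Returns (excitation, speech) for the frame. pitch_lag must be >= 1.
    """
    # Pitch prediction unit: e[n] = gain * c[n] + pitch_coef * e[n - pitch_lag]
    history = list(past_excitation)
    excitation = []
    for sample in codebook[index]:
        delayed = history[-pitch_lag] if pitch_lag <= len(history) else 0.0
        e = gain * sample + pitch_coef * delayed
        history.append(e)
        excitation.append(e)

    # Linear prediction unit: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]
    speech_history = list(past_speech)
    speech = []
    for e in excitation:
        s = e
        for k, a in enumerate(lpc):
            if k < len(speech_history):
                s += a * speech_history[-1 - k]
        speech_history.append(s)
        speech.append(s)
    return excitation, speech

# Hypothetical toy parameters with empty filter memories.
excitation, speech = decode_frame(
    index=0, gain=2.0, pitch_lag=2, pitch_coef=0.5, lpc=[0.5],
    codebook=[[1.0, 0.0, 0.0, 0.0]], past_excitation=[], past_speech=[])
```

With these toy values the single pulse of the code vector reappears two samples later at half amplitude in the excitation, which is the periodicity contributed by the pitch prediction unit.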
In such a code excited linear prediction (CELP) system, as the means for producing the speech signal, use is made of the code book 10 comprised of white noise and the pitch prediction unit 12 for giving periodicity at the pitch frequencies, but the decision on the phase of the code book 10, the gain (amplitude) of the multiplier unit 11, and the pitch frequency (phase) and pitch prediction coefficient (amplitude) of the prediction unit 12 is made equivalently as shown in FIG. 3.
That is, considered in terms of vectors, the processing for reproducing the vector of the code book 10 through the pitch prediction unit and the linear prediction unit so as to identify the input signal may be regarded as follows. A target vector X is obtained by removing, by a subtractor 41, the effects of the previous frame $S_0$ stored in a previous frame storage 42 from the input signal S of one frame input from a terminal 40. A code vector gC is obtained by applying linear prediction, by a linear prediction unit 44 (corresponding to the linear prediction unit 13 of FIG. 1), to a vector selected from the code book 10 and giving a gain g to the resultant vector C by a multiplier unit 45. A pitch prediction vector bP is obtained by applying linear prediction, by a linear prediction unit 47, to a residual signal of the previous frame given a delay corresponding to a pitch frequency by a pitch frequency delay unit 46 (corresponding to the pitch frequency analyzed by the pitch prediction analysis unit 31 of FIG. 1) and giving a gain b (corresponding to the pitch prediction coefficient analyzed by the pitch prediction analysis unit 31 of FIG. 1) to the resultant vector P. The code vector gC and the pitch prediction vector bP are added by an adder 49 to give a vector X', and the target vector X is identified with the vector X' by subtraction and evaluation by a subtractor 50.
When the phase C of the code vector and the phase P of the pitch prediction vector are given, the amplitude g of the code vector and the amplitude b of the pitch prediction vector which, as shown in FIG. 4, give the minimum error signal power are those for which the error power $|E|^2$ of the following equation (1), partially differentiated by b and g, is 0, that is, those which satisfy

$$\frac{\partial |E|^2}{\partial b} = 0, \qquad \frac{\partial |E|^2}{\partial g} = 0$$

These amplitudes may be found from the following equations (2) and (3) for all combinations of the phases (C, P) of the two vectors, and thereby the optimal set of amplitudes and phases (g, b, C, P) is sought:

$$|E|^2 = |X - bP - gC|^2 \tag{1}$$

$$b = \frac{(C,C)(X,P) - (C,P)(X,C)}{\Delta} \tag{2}$$

$$g = \frac{(P,P)(X,C) - (C,P)(X,P)}{\Delta} \tag{3}$$

where

$$\Delta = (P,P)(C,C) - (C,P)(C,P)$$

and (,) indicates the scalar product of the vectors.
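Equations (2) and (3) may be evaluated for one pair of phases (C, P) as in the following sketch. The function name `joint_gains` and the three-sample test vectors are hypothetical; the formulas themselves are those given above.

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def joint_gains(x, p, c):
    """Return (b, g, error power) minimizing |X - bP - gC|^2."""
    pp, cc, cp = dot(p, p), dot(c, c), dot(c, p)
    xp, xc = dot(x, p), dot(x, c)
    delta = pp * cc - cp * cp
    if delta == 0.0:
        raise ValueError("P and C are collinear; b and g are not unique")
    b = (cc * xp - cp * xc) / delta   # equation (2)
    g = (pp * xc - cp * xp) / delta   # equation (3)
    # Error power of equation (1) at the computed amplitudes.
    err = sum((xi - b * pi - g * ci) ** 2 for xi, pi, ci in zip(x, p, c))
    return b, g, err

# Hypothetical toy vectors: X is constructed as 0.5 * P + 2.0 * C.
p = [1.0, 0.0, 1.0]
c = [0.0, 1.0, 1.0]
x = [0.5 * pi + 2.0 * ci for pi, ci in zip(p, c)]
b, g, err = joint_gains(x, p, c)
```

Repeating this evaluation for every combination of code vector and pitch delay, and keeping the combination with the smallest error power, corresponds to the search over (g, b, C, P) described above.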
Here, speech signals include voiced speech sounds and unvoiced speech sounds which are characterized in that the respective drive source signals (sound sources) are periodic pulses or white noise with no periodicity.
In the CELP system, explained above as a conventional system, pitch prediction and linear prediction were applied to the vectors of the code book comprised of white noise as a sound source and the pitch periodicity of the voiced speech sounds was created by the pitch prediction unit 12.
Therefore, the characteristics were good when the sound source signal was a white noise-like unvoiced speech sound. However, the pitch periodicity generated by the pitch prediction unit was created by giving a delay, found by pitch prediction analysis, to the past sound source series, and the past sound source series was itself a series of white noise originally obtained by reading code vectors from the code book. It was therefore difficult to create a pulse series corresponding to the sound source of a voiced speech sound. The resulting problem was that the effect of this was large in the transitional state from an unvoiced speech sound to a voiced speech sound, where high frequency noise was included in the reproduced speech, resulting in a deterioration of the quality.