First, an instance of a conventional portable telephone set is explained with reference to FIGS. 1 and 2.
This portable telephone set is adapted for performing transmission processing of coding the speech into a preset code in accordance with the CELP system and transmitting the resulting code, and for performing the receipt processing of receiving the code transmitted from other portable telephone sets and decoding the received code into speech. FIGS. 1 and 2 show a transmitter for performing transmission processing and a receiver for performing receipt processing, respectively.
In the transmitter, shown in FIG. 1, the speech uttered by a user is input to a microphone 1 where the speech is transformed into speech signals as electrical signals, which are routed to an A/D (analog/digital) converter 2. The A/D converter 2 samples the analog speech signals from the microphone 1 with, for example, the sampling frequency of 8 kHz, for A/D conversion to digital speech signals, and further quantizes the resulting digital signals with a preset number of bits to route the resulting quantized signals to an operating unit 3 and to an LPC (linear prediction coding) unit 4.
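The sampling and quantization performed by the A/D converter 2 can be illustrated with a minimal sketch. The 16-bit word length, the [-1, 1) input range, the test tone and the function names are assumptions made for illustration; the text only specifies the 8 kHz sampling frequency and a preset number of bits.

```python
import math

SAMPLE_RATE_HZ = 8000  # sampling frequency given in the text


def quantize(x, bits=16):
    """Uniformly quantize x in [-1.0, 1.0) to a signed integer of `bits` bits.

    The word length of 16 bits is an assumption, not taken from the text.
    """
    levels = 1 << (bits - 1)
    q = int(round(x * levels))
    return max(-levels, min(levels - 1, q))  # clip to the representable range


def sample_tone(freq_hz, n_samples, bits=16):
    """Sample a half-amplitude sine tone at 8 kHz and quantize each sample."""
    return [
        quantize(0.5 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ), bits)
        for n in range(n_samples)
    ]
```

Calling `sample_tone(1000, 160)` would yield one 160-sample frame, i.e. 20 ms of signal at 8 kHz.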
The LPC unit 4 performs LPC analysis of the speech signals from the A/D converter 2, in units of frames each corresponding to, e.g., 160 samples, to find P-dimensional linear prediction coefficients α1, α2, . . . , αP. The LPC unit 4 sends a vector, having these linear prediction coefficients αp, where p=1, 2, . . . , P, as components, to a vector quantizer 5, as a feature vector α of the speech.
The vector quantizer 5 holds a codebook, associating code vectors, each having linear prediction coefficients as components, with codes, and quantizes the feature vector α from the LPC unit 4, based on this codebook, to send the code resulting from the vector quantization, sometimes referred to below as A code (A_code), to a code decision unit 15.
The vector quantizer 5 also sends the linear prediction coefficients α1′, α2′, . . . , αP′, as components forming the code vector α′ corresponding to the A code, to a speech synthesis filter 6.
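The codebook lookup performed by the vector quantizer 5 can be sketched as a nearest-neighbour search. The squared Euclidean distance measure and the function name are assumptions; the text does not state how the closest code vector is chosen.

```python
def vector_quantize(feature, codebook):
    """Return (a_code, code_vector) for the codebook entry nearest to `feature`.

    `codebook` is a list of code vectors; the list index plays the role of
    the A code. Squared Euclidean distance is an assumed distance measure.
    """
    def squared_distance(vector):
        return sum((a - b) ** 2 for a, b in zip(feature, vector))

    a_code = min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))
    return a_code, codebook[a_code]
```

With a hypothetical three-entry codebook `[[0.0, 0.0], [1.0, -0.5], [-0.9, 0.4]]`, the feature vector `[0.8, -0.4]` maps to A code 1 and to the code vector `[1.0, -0.5]`, which stands in for α′.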
The speech synthesis filter 6 is, e.g., a digital filter of the IIR (infinite impulse response) type, and executes speech synthesis, with the linear prediction coefficients αp′, where p=1, 2, . . . , P, from the vector quantizer 5 as tap coefficients of the IIR filter and with the residual signals e from an operating unit 14 as an input signal.
That is, in the LPC analysis executed by the LPC unit 4, it is assumed that a one-dimensional linear combination represented by the equation (1):

$$s_n + \alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P} = e_n \qquad (1)$$

holds, where $s_n$ is the (sampled value of the) speech signal at the current time $n$ and $s_{n-1}, s_{n-2}, \ldots, s_{n-P}$ are the $P$ past sample values neighboring thereto. The linear prediction coefficients $\alpha_p$ are found so as to minimize the square error between the actual sample value $s_n$ and its linear prediction value $s_n'$, where $s_n'$ is linear-predicted from the $P$ past sample values $s_{n-1}, s_{n-2}, \ldots, s_{n-P}$ in accordance with the following equation (2):

$$s_n' = -(\alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P}) \qquad (2)$$

In the above equation (1), $\{e_n\}$ ($\ldots, e_{n-1}, e_n, e_{n+1}, \ldots$) are mutually uncorrelated random variables with an average value equal to 0 and with a variance equal to a preset value of $\beta^2$.

From the equation (1), the sample value $s_n$ may be represented by the following equation (3):

$$s_n = e_n - (\alpha_1 s_{n-1} + \alpha_2 s_{n-2} + \cdots + \alpha_P s_{n-P}) \qquad (3)$$

This may be Z-transformed to give the following equation (4):

$$S = \frac{E}{1 + \alpha_1 z^{-1} + \alpha_2 z^{-2} + \cdots + \alpha_P z^{-P}} \qquad (4)$$

where $S$ and $E$ denote Z-transforms of $s_n$ and $e_n$ in the equation (3), respectively.

From the equations (1) and (2), $e_n$ can be represented by the following equation (5):

$$e_n = s_n - s_n' \qquad (5)$$

and is termed a residual signal between the real sample value $s_n$ and the linear predicted value $s_n'$ thereof.
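The following sketch shows one common way of finding coefficients that minimize the square prediction error, using the sign convention of the equation (1), and of computing the residual of the equation (5). The autocorrelation method with the Levinson-Durbin recursion is an assumption; the text does not say which solution method the LPC unit 4 uses, and the function names are hypothetical.

```python
def lpc_analysis(frame, order):
    """Return (alpha, error) with alpha[p-1] = alpha_p in
    s[n] + sum_p alpha_p * s[n-p] = e[n].

    Autocorrelation method with Levinson-Durbin recursion (assumed method).
    """
    # Autocorrelation r[k] of the frame for lags 0..order.
    r = [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order   # a[0] is the implicit leading coefficient 1
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or perfectly predictable frame
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


def prediction_residual(frame, alpha):
    """e[n] = s[n] + sum_p alpha_p * s[n-p], taking samples before the frame as 0."""
    residual = []
    for n, s in enumerate(frame):
        acc = s
        for p, a in enumerate(alpha, start=1):
            if n - p >= 0:
                acc += a * frame[n - p]
        residual.append(acc)
    return residual
```

For a frame that decays geometrically by a factor of 0.9 per sample, the first coefficient comes out close to -0.9, so that the predicted value is about 0.9 times the previous sample and the residual is nearly zero after the first sample.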
Thus, the speech signal sn may be found from the equation (4), using the linear prediction coefficients αp as tap coefficients of the IIR filter and also using the residual signal en as an input signal to the IIR filter.
The speech synthesis filter 6 calculates the equation (4), using the linear prediction coefficients αp′ from the vector quantizer 5 as tap coefficients and also using the residual signal e from the operating unit 14 as an input signal, as described above, to find speech signals (synthesized speech signals) ss.
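In the time domain, calculating the equation (4) amounts to running the recursion of the equation (3). A minimal sketch follows, assuming a zero initial filter state unless a history is supplied; the function and parameter names are hypothetical.

```python
def synthesize(residual, alpha, history=None):
    """All-pole IIR synthesis: s[n] = e[n] - sum_p alpha[p-1] * s[n-p].

    `history`, if given, lists the previous outputs s[n-1], s[n-2], ...
    (most recent first); otherwise the filter starts from zero state.
    """
    order = len(alpha)
    past = list(history)[:order] if history else [0.0] * order
    out = []
    for e in residual:
        s = e - sum(a * x for a, x in zip(alpha, past))
        out.append(s)
        past = ([s] + past)[:order]   # shift the newest output into the delay line
    return out
```

For example, a unit impulse fed through a first-order filter with a single coefficient of -0.5 produces the decaying sequence 1, 0.5, 0.25, 0.125, which is the impulse response of 1/(1 - 0.5 z^-1).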
Meanwhile, the speech synthesis filter 6 uses not the linear prediction coefficients αp, obtained as the result of the LPC analysis by the LPC unit 4, but the linear prediction coefficients αp′ of the code vector corresponding to the code obtained by their vector quantization. Consequently, the synthesized speech signal output by the speech synthesis filter 6 is in general not the same as the speech signal output by the A/D converter 2.
The synthesized speech signal ss, output by the speech synthesis filter 6, is sent to the operating unit 3, which subtracts the speech signal s, output from the A/D converter 2, from the synthesized speech signal ss to send the resulting difference value to a square error operating unit 7. The square error operating unit 7 finds the square sum of the difference values from the operating unit 3 (square sum of the sample values of the k'th frame) to send the resulting square sum to a minimum square error decision unit 8.
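The combined operation of the operating unit 3 and the square error operating unit 7 can be written in a few lines. This is only an illustrative sketch; the function name is hypothetical.

```python
def frame_square_error(synthesized, speech):
    """Sum over one frame of (ss[n] - s[n])**2."""
    if len(synthesized) != len(speech):
        raise ValueError("both signals must cover the same frame")
    return sum((ss - s) ** 2 for ss, s in zip(synthesized, speech))
```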
The minimum square error decision unit 8 holds an L-code (L_code) as a code representing the lag, a G-code (G_code) as a code representing the gain and an I-code (I_code) as a code representing the codeword, in association with the square error output by the square error operating unit 7, and outputs the L-code, G-code and the I-code corresponding to the square error output from the square error operating unit 7. The L-code, G-code and the I-code are sent to an adaptive codebook storage unit 9, a gain decoder 10 and to an excitation codebook storage unit 11, respectively. The L-code, G-code and the I-code are also sent to a code decision unit 15.
The adaptive codebook storage unit 9 holds an adaptive codebook, which associates e.g., a 7-bit L-code with a preset delay time (lag), and delays the residual signal e supplied from the operating unit 14 by a delay time associated with the L-code supplied from the minimum square error decision unit 8 to output the resulting delayed signal to an operating unit 12.
Since the adaptive codebook storage unit 9 outputs the residual signal e with a delay corresponding to the L-code, the output signal may be said to be a signal close to a periodic signal having the delay time as a period. This signal mainly becomes a driving signal for generating a synthesized sound of the voiced sound in the speech synthesis employing linear prediction coefficients.
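The delaying operation of the adaptive codebook storage unit 9 can be sketched as follows. The mapping from the 7-bit L-code to a lag (a hypothetical minimum lag of 20 samples plus the code value) and the periodic repetition used when the lag is shorter than the frame are assumptions; the text only states that each L-code is associated with a preset delay time.

```python
MIN_LAG = 20  # hypothetical smallest lag, in samples


def lag_from_l_code(l_code):
    """Map a 7-bit L-code to a lag in samples (assumed linear mapping)."""
    if not 0 <= l_code < 128:
        raise ValueError("L-code must fit in 7 bits")
    return MIN_LAG + l_code


def adaptive_codebook_output(past_residual, lag, frame_len):
    """Return `frame_len` samples of the past residual delayed by `lag`.

    When `lag` is shorter than the frame, the already generated output is
    reused, which makes the result periodic with period `lag` (assumption).
    """
    if not 0 < lag <= len(past_residual):
        raise ValueError("lag must lie within the stored history")
    buf = list(past_residual)
    start = len(buf)
    for n in range(frame_len):
        buf.append(buf[start + n - lag])
    return buf[start:]
```

With a stored history of `[1, 2, 3, 4]` and a lag of 2, a five-sample output is `[3, 4, 3, 4, 3]`, which shows the periodic character of this signal.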
The gain decoder 10 holds a table which associates the G-code with the preset gains β and γ, and outputs gain values β and γ associated with the G-code supplied from the minimum square error decision unit 8. The gain values β and γ are supplied to the operating units 12 and 13.
An excitation codebook storage unit 11 holds an excitation codebook, which associates e.g., a 9-bit I-code with a preset excitation signal, and outputs the excitation signal, associated with the I-code output from the minimum square error decision unit 8, to the operating unit 13.
The excitation signal stored in the excitation codebook is a signal close e.g., to the white noise and becomes a driving signal mainly used for generating the synthesized sound of the unvoiced sound in the speech synthesis employing linear prediction coefficients.
The operating unit 12 multiplies an output signal of the adaptive codebook storage unit 9 with the gain value β output by the gain decoder 10 and routes a product value l to the operating unit 14. The operating unit 13 multiplies the output signal of the excitation codebook storage unit 11 with the gain value γ output by the gain decoder 10 to send the resulting product value n to the operating unit 14. The operating unit 14 sums the product value l from the operating unit 12 with the product value n from the operating unit 13 to send the resulting sum as the residual signal e to the speech synthesis filter 6.
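The operations of the gain decoder 10 and the operating units 12 to 14 can be sketched as below. The gain table contents and the function names are hypothetical; only the structure (β times the adaptive codebook output plus γ times the excitation signal) is taken from the text.

```python
# Hypothetical gain table: G-code -> (beta, gamma).
GAIN_TABLE = {0: (0.0, 1.0), 1: (0.5, 0.5), 2: (0.9, 0.25)}


def decode_gain(g_code):
    """Look up the gain pair (beta, gamma) associated with a G-code."""
    return GAIN_TABLE[g_code]


def build_residual(adaptive_out, excitation, beta, gamma):
    """Residual e[n] = beta * adaptive_out[n] + gamma * excitation[n]."""
    product_adaptive = [beta * a for a in adaptive_out]      # operating unit 12
    product_excitation = [gamma * x for x in excitation]     # operating unit 13
    return [p + q for p, q in zip(product_adaptive, product_excitation)]  # unit 14
```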
In the speech synthesis filter 6, the input signal, which is the residual signal e, supplied from the operating unit 14, is filtered by the IIR filter, having the linear prediction coefficients αp′ supplied from the vector quantizer 5 as tap coefficients, and the resulting synthesized signal is sent to the operating unit 3. In the operating unit 3 and the square error operating unit 7, operations similar to those described above are carried out and the resulting square errors are sent to the minimum square error decision unit 8.
The minimum square error decision unit 8 verifies whether or not the square error from the square error operating unit 7 has become smallest (locally minimum). If it is verified that the square error is not locally minimum, the minimum square error decision unit 8 outputs the L code, G code and the I code, corresponding to the square error, and a similar sequence of operations is subsequently repeated.
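The closed loop formed by the units 3 and 6 to 14 can be illustrated by the sketch below. It replaces the iterative procedure described in the text with an exhaustive search over small, hypothetical tables of lags, gains and excitation signals, so it shows the principle of selecting the codes with the smallest square error rather than the actual search order. All names and table contents are assumptions.

```python
from itertools import product


def synthesize(residual, alpha):
    """IIR synthesis s[n] = e[n] - sum_p alpha[p-1] * s[n-p], zero initial state."""
    past = [0.0] * len(alpha)
    out = []
    for e in residual:
        s = e - sum(a * x for a, x in zip(alpha, past))
        out.append(s)
        past = ([s] + past)[:len(alpha)]
    return out


def candidate_residual(past_residual, lag, beta, gamma, excitation):
    """e = beta * (past residual delayed by lag) + gamma * excitation."""
    if not 0 < lag <= len(past_residual):
        raise ValueError("lag must lie within the stored history")
    buf = list(past_residual)
    start = len(buf)
    for n in range(len(excitation)):
        buf.append(buf[start + n - lag])   # periodic repetition if lag < frame (assumption)
    return [beta * a + gamma * x for a, x in zip(buf[start:], excitation)]


def search_codes(speech, alpha_q, past_residual, lags, gains, excitations):
    """Return (square_error, l_code, g_code, i_code) with the smallest error."""
    best = None
    for l_code, g_code, i_code in product(
            range(len(lags)), range(len(gains)), range(len(excitations))):
        beta, gamma = gains[g_code]
        e = candidate_residual(past_residual, lags[l_code], beta, gamma,
                               excitations[i_code])
        ss = synthesize(e, alpha_q)
        err = sum((a - b) ** 2 for a, b in zip(ss, speech))
        if best is None or err < best[0]:
            best = (err, l_code, g_code, i_code)
    return best
```

Because the search always synthesizes with the quantized coefficients, the selected codes compensate, as far as the tables allow, for the error introduced by the vector quantization of the linear prediction coefficients.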
If it is found that the square error has become smallest, the minimum square error decision unit 8 outputs a definite signal to the code decision unit 15. The code decision unit 15 is adapted for latching the A code, supplied from the vector quantizer 5, and for sequentially latching the L code, G code and the I code, sent from the minimum square error decision unit 8. On receipt of the definite signal from the minimum square error decision unit 8, the code decision unit 15 sends the A code, L code, G code and the I code, then latched, to a channel encoder 16. The channel encoder 16 then multiplexes the A code, L code, G code and the I code, sent from the code decision unit 15, to output the resulting multiplexed data as code data, which code data is transmitted over a transmission channel.
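The multiplexing performed by the channel encoder 16, together with the inverse operation a receiver would need, can be sketched as simple bit packing. The 7-bit L code and 9-bit I code follow the examples given in the text; the 8-bit widths of the A code and G code and the field order are assumptions chosen only so that one frame fits in 32 bits.

```python
# (name, width in bits); A and G widths and the field order are hypothetical.
FIELDS = (("A", 8), ("L", 7), ("G", 8), ("I", 9))


def multiplex(codes):
    """Pack the four codes of one frame into a single integer word."""
    word = 0
    for name, width in FIELDS:
        value = codes[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} code does not fit in {width} bits")
        word = (word << width) | value
    return word


def demultiplex(word):
    """Recover the four codes from a word produced by multiplex()."""
    codes = {}
    for name, width in reversed(FIELDS):
        codes[name] = word & ((1 << width) - 1)
        word >>= width
    return codes
```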
For simplicity in explanation, the A code, L code, G code and the I code are assumed to be found from frame to frame. It is however possible to divide e.g., one frame into four sub-frames and to find the L code, G code and the I code on the sub-frame basis.
It should be noted that, in FIG. 1, as in FIGS. 2, 11 and 12, explained later on, an array variable [k] is formed by affixing [k] to each variable. In the present specification, explanation on this k, representing the number of frames, is sometimes omitted.
The code data, sent from a transmitter of another portable telephone set, is received by a channel decoder 21 of a receiver shown in FIG. 2. The channel decoder 21 separates the L code, G code, I code and the A code from the code data and sends the separated codes to an adaptive codebook storage unit 22, a gain decoder 23, an excitation codebook storage unit 24 and a filter coefficient decoder 25, respectively.
The adaptive codebook storage unit 22, gain decoder 23, excitation codebook storage unit 24 and the operating units 26 to 28 are configured similarly to the adaptive codebook storage unit 9, gain decoder 10, excitation codebook storage unit 11 and the operating units 12 to 14, respectively, and perform the processing similar to that explained with reference to FIG. 1 to decode the L code, G code and the I code into the residual signal e. This residual signal e is sent as an input signal to a speech synthesis filter 29.
A filter coefficient decoder 25 holds the same codebook as that stored in the vector quantizer 5 of FIG. 1 and decodes the A code to the linear prediction coefficient αp′ which is then routed to the speech synthesis filter 29.
The speech synthesis filter 29 is configured similarly to the speech synthesis filter 6 of FIG. 1, and solves the equation (4), with the linear prediction coefficient αp′ from the filter coefficient decoder 25 as a tap coefficient and with the residual signal e from the operating unit 28 as an input signal, to generate a synthesized speech signal when the square error has been found to be minimum by the minimum square error decision unit 8 of FIG. 1. This synthesized speech signal is sent to a D/A (digital/analog) converter 30. The D/A converter 30 D/A converts the synthesized speech signal from the speech synthesis filter 29 to send the resulting analog signal to a loudspeaker 31 as output.
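The receiver-side processing of one frame, from the separated codes to the synthesized speech, can be sketched as follows. The tables passed in (`lags`, `gains`, `excitations`) and the already decoded coefficients `alpha_q` are hypothetical stand-ins for the contents of the units 22 to 25, and the filter is assumed to start from zero state.

```python
def decode_frame(l_code, g_code, i_code, alpha_q, past_residual,
                 lags, gains, excitations):
    """Return (speech, residual) for one frame decoded from its codes."""
    lag = lags[l_code]
    beta, gamma = gains[g_code]
    excitation = excitations[i_code]
    if not 0 < lag <= len(past_residual):
        raise ValueError("lag must lie within the stored history")

    # Adaptive codebook output: past residual delayed by `lag`.
    buf = list(past_residual)
    start = len(buf)
    for n in range(len(excitation)):
        buf.append(buf[start + n - lag])
    adaptive = buf[start:]

    # Decoded residual: beta * adaptive + gamma * excitation.
    residual = [beta * a + gamma * x for a, x in zip(adaptive, excitation)]

    # Speech synthesis filter: s[n] = e[n] - sum_p alpha_q[p-1] * s[n-p].
    past = [0.0] * len(alpha_q)
    speech = []
    for e in residual:
        s = e - sum(a * x for a, x in zip(alpha_q, past))
        speech.append(s)
        past = ([s] + past)[:len(alpha_q)]
    return speech, residual
```

The returned residual would be appended to the stored history so that the adaptive codebook of the next frame can refer to it.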
The transmitter of the portable telephone set transmits an encoded version of the residual signal and the linear prediction coefficients, as filter data supplied to the speech synthesis filter 29 of the receiver, as described above. Thus, the receiver decodes the codes into the residual signal and the linear prediction coefficients. The so decoded residual signal and linear prediction coefficients are corrupted with errors, such as quantization errors. Thus, the so decoded residual signals and so decoded linear prediction coefficients, sometimes referred to below as decoded residual signals and decoded linear prediction coefficients, respectively, are not the same as the residual signal and linear prediction coefficients obtained on LPC analysis of the speech, so that the synthesized speech signals, output by the receiver's speech synthesis filter 29, are distorted and therefore are deteriorated in sound quality.