1. Field of the Invention
This invention relates to a speech encoding apparatus and a speech encoding and decoding apparatus for compressing and encoding speech signals or audio signals into digital signals.
2. Description of the Related Art
FIG. 9 is a block diagram of a typical overall constitution of a conventional speech encoding and decoding apparatus which divides an input speech into spectrum envelope information and excitation signal information and encodes the excitation signal information by the frame. The apparatus of FIG. 9 is identical to what is disclosed in JP-A 64/40899.
In FIG. 9, reference numeral 1 stands for an encoder, 2 for a decoder, 3 for multiplex means, 4 for separation means, 5 for an input speech, 6 for a transmission line, and 7 for an output speech. The encoder 1 comprises linear prediction parameter analysis means 8, linear prediction parameter encoding means 9, an adaptive codebook 10, adaptive code search means 11, error signal generation means 12, a random codebook 13, random code search means 14 and excitation signal generation means 15. The decoder 2 is made up of linear prediction parameter decoding means 16, an adaptive codebook 17, adaptive code decoding means 18, a random codebook 19, random code decoding means 20, excitation signal generation means 21 and a synthesis filter 22.
Described below is how the conventional speech encoding and decoding apparatus divides an input speech into spectrum envelope information and excitation signal information and encodes the excitation signal information by the frame.
The encoder 1 first receives a digital speech signal sampled illustratively at 8 kHz as the input speech 5. The linear prediction parameter analysis means 8 analyzes the input speech 5 and extracts a linear prediction parameter which is the spectrum envelope information of the speech. The linear prediction parameter encoding means 9 then quantizes the extracted linear prediction parameter and outputs a code representing that parameter to the multiplex means 3. At the same time, the linear prediction parameter encoding means 9 outputs the quantized linear prediction parameter to the adaptive code search means 11, error signal generation means 12 and random code search means 14.
The excitation signal information is encoded as follows. The adaptive codebook 10 holds previously generated excitation signals that are input from the excitation signal generation means 15. Upon receipt of a delay parameter l from the adaptive code search means 11, the adaptive codebook 10 returns to the search means 11 an adaptive vector corresponding to the received delay parameter l, the vector length of the adaptive vector being equal to the frame length. The adaptive vector is made by extracting a signal of frame length that starts l samples before the current frame. If the delay parameter l is shorter than the frame length, the adaptive vector is made by extracting a signal of length l that starts l samples before the current frame, and by outputting that signal repeatedly until the frame length is reached. FIG. 10(a) is a view of a typical adaptive vector in effect when the delay parameter l is equal to or longer than the frame length, and FIG. 10(b) is a view of a typical adaptive vector in effect when the delay parameter l is shorter than the frame length.
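The construction of the adaptive vector described above can be sketched as follows. This is a minimal illustration, not the implementation of the cited apparatus: the function name `adaptive_vector` and the representation of the adaptive codebook as a Python list with the most recent sample last are assumptions made for the example.

```python
def adaptive_vector(past_excitation, delay, frame_len):
    """Build an adaptive vector of frame_len samples for an integer delay.

    past_excitation: previously generated excitation, most recent sample last
                     (hypothetical layout of the adaptive codebook).
    delay:           integer delay parameter l, in samples.
    """
    if not 1 <= delay <= len(past_excitation):
        raise ValueError("delay must lie within the stored excitation")
    start = len(past_excitation) - delay
    if delay >= frame_len:
        # Case of FIG. 10(a): one frame-length segment starting l samples back.
        return list(past_excitation[start:start + frame_len])
    # Case of FIG. 10(b): the last l samples, repeated until the frame is full.
    segment = past_excitation[start:]
    return [segment[n % delay] for n in range(frame_len)]
```

For example, with ten stored samples 0 to 9, a delay of 6 and a frame length of 4 yield the segment 4, 5, 6, 7, while a delay of 3 and a frame length of 8 yield 7, 8, 9 repeated.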
Suppose that the delay parameter l falls within a range of 20 ≤ l ≤ 128. On that assumption, the adaptive code search means 11 receives the adaptive vector from the adaptive codebook 10, accepts the quantized linear prediction parameter from the linear prediction parameter encoding means 9, and generates a synthesis vector by linear prediction with the received vector and parameter. The adaptive code search means 11 then obtains the perceptual weighted distortion of the synthesis vector with respect to the input speech vector extracted by the frame from the input speech 5. Evaluating the distortion through comparison, the adaptive code search means 11 acquires the delay parameter L and the adaptive gain β conducive to the least distortion. The delay parameter L and a code representing the adaptive gain β are output to the multiplex means 3. At the same time, the adaptive code search means 11 generates an adaptive excitation signal by multiplying the adaptive vector corresponding to the delay parameter L by the adaptive gain β, and outputs the generated adaptive excitation signal to the error signal generation means 12 and excitation signal generation means 15.
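The search over the delay parameter can be sketched as below. Several simplifications are assumed and are not taken from the source: the perceptual weighting is omitted (plain squared error is used), the synthesis filter is an all-pole filter started from zero state with coefficients a[k] of A(z) = 1 + Σ a[k] z^-(k+1), and the gain is left unquantized. All function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def synthesize(excitation, lpc):
    """All-pole synthesis from zero state: y[n] = e[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out

def adaptive_vector(past, delay, frame_len):
    """Frame-length vector taken `delay` samples back, repeated if delay is short."""
    segment = past[len(past) - delay:]
    return [segment[n % delay] for n in range(frame_len)]

def search_adaptive_code(target, past, lpc, min_delay, max_delay):
    """Return (distortion, delay L, gain beta) minimizing |target - beta * syn|^2."""
    target_energy = dot(target, target)
    best = None
    for delay in range(min_delay, max_delay + 1):
        syn = synthesize(adaptive_vector(past, delay, len(target)), lpc)
        energy = dot(syn, syn)
        if energy == 0.0:
            continue  # a silent candidate cannot reduce the distortion
        corr = dot(target, syn)
        # With the optimal gain corr / energy, the residual energy is:
        distortion = target_energy - corr * corr / energy
        if best is None or distortion < best[0]:
            best = (distortion, delay, corr / energy)
    return best
```

Note that one call to `synthesize` is made per candidate delay, which is where most of the arithmetic in such a search is spent.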
The error signal generation means 12 generates a synthesis vector by linear prediction with the adaptive excitation signal from the adaptive code search means 11 and the quantized linear prediction parameter from the linear prediction parameter encoding means 9. The error signal generation means 12 then obtains an error signal vector as the difference between the input speech vector extracted from the input speech by the frame on the one hand, and the synthesis vector generated as described on the other, and outputs the error signal vector to the random code search means 14.
The random codebook 13 holds illustratively as many as N random vectors generated from random noise. Given a random code i from the random code search means 14, the random codebook 13 outputs a random vector corresponding to the received code. The random code search means 14 receives any one of the N random vectors from the random codebook 13, admits the quantized linear prediction parameter from the linear prediction parameter encoding means 9, and generates a synthesis vector by linear prediction with the received vector and parameter. The random code search means 14 then obtains the perceptual weighted distortion of the synthesis vector with respect to the error signal vector from the error signal generation means 12. Evaluating the distortion through comparison, the random code search means 14 acquires the random code I and the random gain γ conducive to the least distortion. The random code I and a code representing the random gain γ are output to the multiplex means 3. At the same time, the random code search means 14 generates a random excitation signal by multiplying the random vector corresponding to the random code I by the random gain γ, and outputs the generated random excitation signal to the excitation signal generation means 15.
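The error signal generation and the random code search described above can be sketched in the same spirit. The assumptions are again illustrative only: no perceptual weighting, an all-pole synthesis filter started from zero state, an unquantized gain, and a codebook held as a list of lists. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def synthesize(excitation, lpc):
    """All-pole synthesis from zero state: y[n] = e[n] - sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out

def error_signal(target, adaptive_excitation, lpc):
    """Input speech vector minus the synthesized adaptive contribution."""
    syn = synthesize(adaptive_excitation, lpc)
    return [t - s for t, s in zip(target, syn)]

def search_random_code(error_vec, codebook, lpc):
    """Return (distortion, random code I, gain gamma) with the least squared error."""
    err_energy = dot(error_vec, error_vec)
    best = None
    for code, vec in enumerate(codebook):
        syn = synthesize(vec, lpc)
        energy = dot(syn, syn)
        if energy == 0.0:
            continue
        corr = dot(error_vec, syn)
        distortion = err_energy - corr * corr / energy
        if best is None or distortion < best[0]:
            best = (distortion, code, corr / energy)
    return best
```

As with the adaptive code search, every one of the N candidates passes through the synthesis filter once.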
The excitation signal generation means 15 receives the adaptive excitation signal from the adaptive code search means 11, admits the random excitation signal from the random code search means 14, and adds the two signals to generate an excitation signal. The excitation signal thus generated is output to the adaptive codebook 10.
When the encoding process above is completed, the multiplex means 3 places onto the transmission line 6 the code representing the quantized linear prediction parameter, the delay parameter L, the random code I, and the codes denoting the excitation gains β and γ.
The decoder 2 operates as follows. The separation means 4 first receives the output of the multiplex means 3. In turn, the separation means 4 outputs through a separating process the code of the linear prediction parameter to the linear prediction parameter decoding means 16, the delay parameter L and the code of the adaptive gain β to the adaptive code decoding means 18, and the random code I and the code of the random gain γ to the random code decoding means 20.
The linear prediction parameter decoding means 16 decodes the received code back to the linear prediction parameter and sends the parameter to the synthesis filter 22. The adaptive code decoding means 18 reads from the adaptive codebook 17 an adaptive vector corresponding to the delay parameter L, decodes the received code back to the adaptive gain β, and generates an adaptive excitation signal by multiplying the adaptive vector by the adaptive gain β. The adaptive excitation signal thus generated is output to the excitation signal generation means 21. The random code decoding means 20 reads from the random codebook 19 a random vector corresponding to the random code I, decodes the received code back to the random gain γ, and generates a random excitation signal by multiplying the random vector by the random gain γ. The random excitation signal thus generated is output to the excitation signal generation means 21.
The excitation signal generation means 21 receives the adaptive excitation signal from the adaptive code decoding means 18, admits the random excitation signal from the random code decoding means 20, and adds the two received signals to generate an excitation signal. The excitation signal thus generated is output to the adaptive codebook 17 and synthesis filter 22. The synthesis filter 22 generates an output speech 7 by linear prediction with the excitation signal from the excitation signal generation means 21 and the linear prediction parameter from the linear prediction parameter decoding means 16.
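The decoding of one frame can be sketched as follows. The sketch assumes already-decoded gains, an integer delay, an all-pole synthesis filter whose output history is carried from frame to frame, and an adaptive codebook of fixed length; none of these details, nor the function names, are specified by the source.

```python
def adaptive_vector(past, delay, frame_len):
    """Frame-length vector taken `delay` samples back, repeated if delay is short."""
    segment = past[len(past) - delay:]
    return [segment[n % delay] for n in range(frame_len)]

def synthesis_filter(excitation, lpc, history):
    """y[n] = e[n] - sum_k lpc[k] * y[n-1-k]; history holds at least len(lpc) past outputs."""
    out = list(history)
    for e in excitation:
        out.append(e - sum(a * out[-1 - k] for k, a in enumerate(lpc)))
    return out[len(history):]

def decode_frame(past, delay, beta, random_vec, gamma, lpc, history):
    """Return (output speech, updated adaptive codebook, updated filter history)."""
    frame_len = len(random_vec)
    adaptive = adaptive_vector(past, delay, frame_len)
    # Excitation = adaptive excitation + random excitation.
    excitation = [beta * a + gamma * r for a, r in zip(adaptive, random_vec)]
    speech = synthesis_filter(excitation, lpc, history)
    new_past = (list(past) + excitation)[-len(past):]       # feed back to the codebook
    new_history = (list(history) + speech)[-len(history):]  # keep filter memory
    return speech, new_past, new_history
```

Because the excitation is fed back into the adaptive codebook, the encoder and the decoder keep identical codebook contents as long as the transmitted codes arrive intact.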
An improved version of the above-described conventional speech encoding and decoding apparatus, capable of providing the output speech of higher quality, is described by P. Kroon and B. S. Atal in "Pitch Predictors with High Temporal Resolution" (ICASSP '90, pp. 661-664, 1990).
The improved conventional speech encoding and decoding apparatus has a constitution which is a variation of what is shown in FIG. 9. In the improved constitution, the adaptive code search means 11 deals with the delay parameter not only of an integer but also of a fractional rational number. The adaptive codebooks 10 and 17 each generate an adaptive vector corresponding to the delay parameter of a fractional rational number by interpolation between the samples of the excitation signal generated in the previous frames, and output the adaptive vector thus generated. FIGS. 11(a) and 11(b) show examples of adaptive vectors generated when the delay parameter l is a fractional rational number. FIG. 11(a) is a view of a typical adaptive vector in effect when the delay parameter l is equal to or longer than the frame length, and FIG. 11(b) is a view of a typical adaptive vector in effect when the delay parameter l is shorter than the frame length.
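An adaptive vector for a fractional delay can be sketched as below. The sketch uses linear interpolation between neighbouring samples, which is a deliberate simplification assumed for brevity and is not the interpolation method of the cited paper. The function name and the list layout (most recent sample last) are hypothetical.

```python
import math
from fractions import Fraction

def fractional_adaptive_vector(past, delay, frame_len):
    """Adaptive vector for a rational delay, using linear interpolation.

    past:  previously generated excitation, most recent sample last, so that
           past[-1] is the sample at time -1 relative to the frame start.
    delay: delay parameter l as a Fraction (or int), 1 <= l <= len(past) - 1.
    """
    delay = Fraction(delay)
    if delay < 1 or delay + 1 > len(past):
        raise ValueError("delay out of range for the stored excitation")
    vec = []
    for n in range(frame_len):
        t = n - delay
        while t > -1:          # periodic extension when l is shorter than the frame
            t -= delay
        lo = math.floor(t)     # integer sample time at or before t
        frac = t - lo          # fractional offset, 0 <= frac < 1
        left = past[lo]        # negative index: time -1 is past[-1]
        right = past[lo + 1] if frac else left
        vec.append(float(left + frac * (right - left)))
    return vec
```

With an integer delay the fractional offset is always zero and the result coincides with the plain repeated-segment construction.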
Constituted as outlined, the above improved apparatus determines the delay parameter with a resolution finer than the sampling period of the input speech, and generates the adaptive vector accordingly. As such, the improved apparatus can generate output speech of higher quality than the apparatus of JP-A 64/40899.
Another conventional speech encoding and decoding apparatus is disclosed in JP-A 4/344699. FIG. 12 is a block diagram of a typical overall constitution of that disclosed conventional speech encoding and decoding apparatus.
In FIG. 12, those parts with their counterparts already shown in FIG. 9 are given the same reference numerals, and detailed descriptions of the parts are omitted where they are repetitive. In FIG. 12, reference numerals 23 and 24 denote random codebooks which are different from those in FIG. 9.
The encoding and decoding apparatus of the above constitution operates as follows. Suppose that the delay parameter l falls within the range of 20 ≤ l ≤ 128 as before. On that assumption, the adaptive code search means 11 in the encoder 1 receives the adaptive vector from the adaptive codebook 10, accepts the quantized linear prediction parameter from the linear prediction parameter encoding means 9, and generates a synthesis vector by linear prediction with the adaptive vector and the quantized linear prediction parameter. The adaptive code search means 11 then obtains the perceptual weighted distortion of the synthesis vector with respect to the input speech vector extracted by the frame from the input speech 5. Evaluating the distortion through comparison, the adaptive code search means 11 acquires the delay parameter L and the adaptive gain β conducive to the least distortion. The delay parameter L and a code representing the adaptive gain β are output to the multiplex means 3 and random codebook 23. At the same time, the adaptive code search means 11 generates an adaptive excitation signal by multiplying the adaptive vector corresponding to the delay parameter L by the adaptive gain β, and outputs the generated adaptive excitation signal to the error signal generation means 12 and excitation signal generation means 15.
The random codebook 23 holds illustratively as many as N random vectors generated from random noise. Given a random code i from the random code search means 14, the random codebook 23 generates a random vector corresponding to the received code, puts the generated vector corresponding to the delay parameter L into a periodical format, and outputs the periodical random vector thus prepared. FIG. 13(a) is a view of a typical random vector in the periodical format. If the delay parameter L is a fractional rational number, the random codebook 23 generates a random vector by interpolation between the samples of the random vector, and puts the vector thus generated into a periodical format, as shown in FIG. 13(b).
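Putting a random vector into a periodical format for an integer delay parameter L can be sketched as follows. The sketch assumes that the first L samples of the stored random vector are repeated at period L until the frame length is reached; the source does not spell out this detail, so it should be read as one plausible interpretation. The fractional-delay case of FIG. 13(b) is not covered, and the function name is hypothetical.

```python
def periodic_random_vector(random_vec, delay):
    """Repeat the first `delay` samples of random_vec with period `delay`.

    Assumption: if the delay is at least the frame length, the vector is
    returned unchanged because no repetition fits inside the frame.
    """
    if delay < 1:
        raise ValueError("delay must be a positive integer")
    frame_len = len(random_vec)
    if delay >= frame_len:
        return list(random_vec)
    return [random_vec[n % delay] for n in range(frame_len)]
```

For instance, a seven-sample vector 1 to 7 with L = 3 becomes 1, 2, 3, 1, 2, 3, 1.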
The random code search means 14 receives any one of the N random vectors in the periodical format from the random codebook 23, admits the quantized linear prediction parameter from the linear prediction parameter encoding means 9, and generates a synthesis vector by linear prediction with the received vector and parameter. The random code search means 14 then obtains the perceptual weighted distortion of the synthesis vector with respect to the error signal vector from the error signal generation means 12. Evaluating the distortion through comparison, the random code search means 14 acquires the random code I and the random gain γ conducive to the least distortion. The random code I and a code representing the random gain γ are output to the multiplex means 3. At the same time, the random code search means 14 generates a random excitation signal by multiplying the periodical random vector corresponding to the random code I by the random gain γ, and outputs the generated random excitation signal to the excitation signal generation means 15.
When the encoding process above is completed, the multiplex means 3 places onto the transmission line 6 the code representing the quantized linear prediction parameter, the delay parameter L, the random code I, and the codes denoting the excitation gains β and γ.
The decoder 2 operates as follows. The separation means 4 first receives the output of the multiplex means 3. In turn, the separation means 4 outputs through a separating process the code of the linear prediction parameter to the linear prediction parameter decoding means 16, the delay parameter L and the code of the adaptive gain β to the adaptive code decoding means 18 and random codebook 24, and the random code I and the code of the random gain γ to the random code decoding means 20.
Like the random codebook 23 on the encoding side, the random codebook 24 holds as many as N random vectors. Given the random code I from the random code decoding means 20, the random codebook 24 generates a random vector corresponding to the received code I, puts the generated vector corresponding to the delay parameter L into a periodical format, and outputs the periodical random vector thus prepared to the random code decoding means 20.
The random code decoding means 20 decodes the code of the random gain γ back to the random gain γ, and multiplies by the gain γ the periodical random vector received from the random codebook 24 so as to generate a random excitation signal. The random excitation signal thus generated is output to the excitation signal generation means 21.
The excitation signal generation means 21 receives the adaptive excitation signal from the adaptive code decoding means 18, accepts the random excitation signal from the random code decoding means 20, and adds the two inputs to generate an excitation signal. The excitation signal thus prepared is output to the adaptive codebook 17 and synthesis filter 22. The synthesis filter 22 receives the excitation signal from the excitation signal generation means 21, accepts the linear prediction parameter from the linear prediction parameter decoding means 16, and outputs an output speech 7 by linear prediction with the two inputs.
In code searching during the encoding process, the conventional speech encoding and decoding apparatus outlined above puts the adaptive vector or random vector corresponding to the delay parameter into a periodical format, so as to generate a vector of the frame length. A synthesis vector is generated by linear prediction with the vector thus prepared. The apparatus then obtains the distortion of the synthesis vector with respect to the input speech vector of the frame length. One disadvantage of this apparatus is that a huge amount of computation is needed for the code searching, because every candidate vector must undergo the linear predictive synthesis process over the full frame length.
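The size of this computational load can be estimated with a rough count of multiply-accumulate operations. The sketch below assumes a direct-form all-pole synthesis filter that costs one multiply-accumulate per filter coefficient per output sample. The frame length of 40 samples, the filter order of 10 and the codebook size N = 1024 used in the example are hypothetical values chosen only for illustration; the 109 candidate delays follow from the range 20 ≤ l ≤ 128 given earlier.

```python
def synthesis_macs(frame_len, lpc_order, num_candidates):
    """Multiply-accumulates spent on synthesis alone for one code search.

    Assumes frame_len * lpc_order operations per candidate vector; distortion
    evaluation and perceptual weighting would add to this figure.
    """
    return frame_len * lpc_order * num_candidates

# Hypothetical frame length and filter order; delay range taken from the text.
adaptive_cost = synthesis_macs(40, 10, 128 - 20 + 1)
# Hypothetical codebook size N = 1024.
random_cost = synthesis_macs(40, 10, 1024)
total_per_frame = adaptive_cost + random_cost
```

Under these assumed figures the cost grows linearly with the frame length, so any scheme that synthesizes shorter vectors during the search reduces the load proportionally.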