1. Field of the Invention
The present invention relates to a speech encoding and decoding apparatus for transmitting a speech signal after information compression processing has been applied.
Recently, a speech encoding and decoding apparatus for compressing speech information to data of about 4 to 16 kbps at a high efficiency has been demanded for in-house communication systems, digital mobile radio systems and speech storing systems.
2. Description of Related Art
As the first prior art structure of a speech prediction encoding apparatus, there is provided an adaptive prediction encoding apparatus for multiplexing the prediction parameters (vocal tract information) of a predictor and residual signal (excitation information) for transmission to the receiving station.
FIG. 1 is a block diagram of an encoder used in the speech encoding apparatus of the first prior art structure. Encoder 100 comprises linear prediction analysis unit 101, predictor 102, quantizer 103, multiplexing unit 104 and adders 105 and 106.
Linear prediction analysis unit 101 analyzes input speech signals and outputs prediction parameters, and predictor 102 predicts input signals using an output from adder 106 (described below) and prediction parameters from linear prediction analysis unit 101. Adder 105 outputs error data by computing the difference between an input speech signal and the predicted signal, quantizer 103 obtains a residual signal by quantizing the error data, and adder 106 adds the output from predictor 102 to that of quantizer 103, thereby enabling the output to be fed back to predictor 102. Multiplexing unit 104 multiplexes prediction parameters from linear prediction analysis unit 101 and a residual signal from quantizer 103 for transmission to a receiving station.
With such a structure, linear prediction analysis unit 101 performs a linear prediction analysis of an input signal at every predetermined frame period, thereby extracting prediction parameters as vocal tract information to which appropriate bits are assigned by an encoder (not shown). The prediction parameters are thus encoded and output to predictor 102 and multiplexing unit 104. Predictor 102 predicts an input signal based on the prediction parameters and an output from adder 106. Adder 105 computes the error data (the difference between the predicted information and the input signal), and quantizer 103 quantizes the error data, thereby assigning appropriate bits to the error data to provide a residual signal. This residual signal is output to multiplexing unit 104 as excitation information.
After that, the encoded prediction parameter and residual signal are multiplexed by multiplexing unit 104 and transmitted to a receiving station.
Adder 106 adds the input signal predicted by predictor 102 and the residual signal quantized by quantizer 103. The sum is fed back to predictor 102 and is used, together with the prediction parameters, to predict the input signal.
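The feedback loop formed by predictor 102, adder 105, quantizer 103 and adder 106 can be sketched as follows. This is a minimal illustration only, not the encoder of FIG. 1 itself: the function name `encode_frame`, the uniform quantizer with step size `step`, and the zero initial predictor state are assumptions, since the quantizer and the handling of state between frames are not specified above.

```python
def encode_frame(samples, coeffs, step):
    """Closed-loop predictive encoding of one frame (illustrative sketch).

    samples: input speech samples of the frame
    coeffs:  prediction parameters a[1..p], assumed already encoded
    step:    step size of a uniform quantizer (hypothetical choice)
    Returns (codes, reconstructed).
    """
    order = len(coeffs)
    # Most recent outputs of adder 106, newest first.
    # Assumption: the predictor state starts at zero for the frame.
    history = [0.0] * order
    codes, reconstructed = [], []
    for x in samples:
        predicted = sum(a * h for a, h in zip(coeffs, history))  # predictor 102
        error = x - predicted                                    # adder 105
        code = round(error / step)                               # quantizer 103
        recon = predicted + code * step                          # adder 106
        history = ([recon] + history)[:order]                    # fed back to predictor 102
        codes.append(code)
        reconstructed.append(recon)
    return codes, reconstructed
```

Because the predictor operates on the output of adder 106 rather than on the original input, the quantization error of each sample in this sketch stays within half a quantizer step and does not accumulate.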
In this case, the number of bits assigned to prediction parameters for each frame is fixed at α bits per frame and the number of bits assigned to the residual signal is fixed at β bits per frame. Therefore, (α+β) bits for each frame are transmitted to the receiving station. In this case, the transmission rate is, for example, 8 kbps.
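The relationship between the fixed per-frame bit assignment and the transmission rate can be checked with a short calculation. The frame period of 20 ms and the split of 40 bits and 120 bits used below are hypothetical values chosen only to reproduce the 8 kbps example; neither value is given above.

```python
def transmission_rate_bps(alpha_bits, beta_bits, frame_ms):
    """Transmission rate when (alpha + beta) bits are sent once per frame."""
    return (alpha_bits + beta_bits) * 1000 / frame_ms


# Hypothetical values: 40 bits for prediction parameters, 120 bits for the
# residual signal, and one frame every 20 ms.
rate = transmission_rate_bps(alpha_bits=40, beta_bits=120, frame_ms=20)
print(rate)  # 8000.0 bits per second, i.e. 8 kbps
```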
FIG. 2 is a block diagram showing a second prior art structure of the speech encoding apparatus. This prior art structure is a Code Excited Linear Prediction (CELP) encoder which is known as a low bit rate speech encoder.
Principally, a CELP encoder, like the first prior art structure shown in FIG. 1, is an apparatus for encoding and transmitting linear prediction code parameters (LPC or prediction parameters) obtained from an LPC analysis and a residual signal. However, this CELP encoder represents a residual signal by using one of the residual patterns within a code book, thereby obtaining high efficiency encoding.
Details of CELP are disclosed in Atal, B. S., and Schroeder, M. R. "Stochastic Coding of Speech at Very Low bit Rate" Proc.ICASSP 84-1610 to 1613, 1984, and a summary of the CELP encoder will be explained as follows by referring to FIG. 2.
LPC analysis unit 201 performs an LPC analysis of an input signal, and quantizer 202 quantizes the analyzed LPC parameters to be supplied to predictor 203. Pitch period m, pitch coefficient Cp and gain G, which are not shown, are extracted from the input signal.
Residual waveform patterns (code vectors) are sequentially read out from code book 204, and each pattern is first input to multiplier 205 and multiplied by gain G. Then, the output is input to a feed-back loop, namely, a long-term predictor comprising delay circuit 206, multiplier 207 and adder 208, to synthesize a residual signal. The delay value of delay circuit 206 is set at the same value as the pitch period. Multiplier 207 multiplies the output from delay circuit 206 by pitch coefficient Cp.
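The long-term predictor described above amounts to the recursion r[n] = G·c[n] + Cp·r[n-m], where c[n] is the code vector and r[n] is the synthesized residual signal. A minimal sketch follows; the function name, the optional `past` argument and the zero initial content of the delay circuit are assumptions made for illustration.

```python
def synthesize_residual(code_vector, gain, pitch_period, pitch_coeff, past=None):
    """Long-term predictor: r[n] = G * c[n] + Cp * r[n - m] (illustrative sketch).

    past: the last m residual samples held in delay circuit 206.
          Assumption: all zeros when not supplied.
    """
    m = pitch_period
    buf = list(past) if past is not None else [0.0] * m
    if m < 1 or len(buf) < m:
        raise ValueError("need at least pitch_period past samples")
    out = []
    for c in code_vector:
        delayed = buf[-m]                      # output of delay circuit 206
        r = gain * c + pitch_coeff * delayed   # multipliers 205 and 207, adder 208
        buf.append(r)
        out.append(r)
    return out


# A single pulse is repeated every m = 2 samples, scaled by Cp each time.
print(synthesize_residual([1, 0, 0, 0, 0, 0], gain=2.0, pitch_period=2, pitch_coeff=0.5))
# [2.0, 0.0, 1.0, 0.0, 0.5, 0.0]
```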
A synthesized residual signal output from adder 208 is input to a feed-back loop, namely, a short-term prediction unit comprising predictor 203 and adder 209, and the predicted input signal is synthesized. The prediction parameters are the LPC parameters from quantizer 202. The predicted input signal is subtracted from the input signal at subtracter 210 to provide an error signal. Weight function unit 211 applies a weight to the error signal, taking into consideration the acoustic characteristics of human hearing. Because the influence of the error on the human ear differs depending on the frequency band, this weighting is a correcting process that makes the error sound uniform to the human ear.
The output of weight function unit 211 is input to error power evaluation unit 212 and an error power is evaluated in respective frames.
White noise code book 204 has a plurality of samples of residual waveform patterns (code vectors), and the above series of processes is repeated for all of the samples. The residual waveform pattern whose error power within a frame is minimum is selected as the residual waveform pattern of the frame.
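The search over the code book described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of FIG. 2: the function names are hypothetical, both predictors start from zero state for each candidate, and the weight function is left as an optional caller-supplied function because its form is not specified above.

```python
def synthesize(code_vector, gain, m, cp, lpc):
    """Code vector -> long-term predictor -> short-term predictor."""
    residual = [0.0] * m         # delay circuit 206 (zero state assumed)
    speech = [0.0] * len(lpc)    # memory of predictor 203 (zero state assumed)
    out = []
    for c in code_vector:
        r = gain * c + cp * residual[-m]                 # 205, 207, 208
        s = r + sum(a * speech[-k]                       # predictor 203, adder 209
                    for k, a in enumerate(lpc, start=1))
        residual.append(r)
        speech.append(s)
        out.append(s)
    return out


def search_code_book(target, code_book, gain, m, cp, lpc, weight=None):
    """Return (index, error power) of the code vector with minimum error power."""
    best_index, best_power = None, float("inf")
    for index, code_vector in enumerate(code_book):
        synth = synthesize(code_vector, gain, m, cp, lpc)
        error = [t - s for t, s in zip(target, synth)]   # subtracter 210
        if weight is not None:
            error = weight(error)                        # weight function unit 211
        power = sum(e * e for e in error)                # error power evaluation unit 212
        if power < best_power:
            best_index, best_power = index, power
    return best_index, best_power


# Usage with a tiny hypothetical code book: the target is built from entry 1,
# so the search is expected to select index 1.
code_book = [[1, -1, 1, -1], [1, 0, -1, 0], [0, 1, 0, -1]]
target = synthesize(code_book[1], gain=0.5, m=2, cp=0.3, lpc=[0.5])
print(search_code_book(target, code_book, gain=0.5, m=2, cp=0.3, lpc=[0.5]))
```

Every candidate is synthesized and compared in turn, so the cost of the search in this sketch grows in proportion to the number of code vectors in the code book.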
As described above, the index of the residual waveform pattern obtained for every frame, as well as the LPC parameters from quantizer 202, pitch period m, pitch coefficient Cp and gain G, are transmitted to a receiving station (not shown). The receiving station forms a long-term predictor with the transmitted pitch period m and pitch coefficient Cp in the same manner as described above, and the residual waveform pattern corresponding to the transmitted index is input to the long-term predictor, thereby reproducing a residual signal. Further, the transmitted LPC parameters form a short-term predictor in the same manner as described above, and the reproduced residual signal is input to the short-term predictor, thereby reproducing an input signal.
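The processing at the receiving station can be sketched in the same way. The dictionary layout of the transmitted parameters, the function name `decode_frame` and the zero initial state of both predictors are assumptions for illustration; the sketch also assumes that the receiving station holds a copy of the same code book, so that only the index needs to be transmitted.

```python
def decode_frame(params, code_book):
    """Reproduce one frame from transmitted parameters (illustrative sketch).

    params: dict with hypothetical keys "index", "lpc", "pitch_period",
            "pitch_coeff" and "gain".
    """
    code_vector = code_book[params["index"]]   # pattern selected by the transmitted index
    m = params["pitch_period"]
    cp = params["pitch_coeff"]
    gain = params["gain"]
    lpc = params["lpc"]
    residual = [0.0] * m         # long-term predictor memory (zero state assumed)
    speech = [0.0] * len(lpc)    # short-term predictor memory (zero state assumed)
    out = []
    for c in code_vector:
        r = gain * c + cp * residual[-m]       # reproduced residual signal
        s = r + sum(a * speech[-k] for k, a in enumerate(lpc, start=1))
        residual.append(r)
        speech.append(s)
        out.append(s)                          # reproduced input signal
    return out


code_book = [[1, 0, 0, 0], [0, 1, 0, 0]]
params = {"index": 0, "lpc": [0.5], "pitch_period": 2, "pitch_coeff": 0.5, "gain": 2.0}
print(decode_frame(params, code_book))  # [2.0, 1.0, 1.5, 0.75]
```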
The excitation unit and the vocal tract unit in the sound-producing structure of a human have different dynamic characteristics, and the quantity of data to be transmitted for the excitation unit and for the vocal tract unit at any given point differs.
However, with a conventional speech encoding apparatus as shown in FIG. 1 or FIG. 2, excitation information and vocal tract information are transmitted at a fixed ratio of data quantity. The above speech characteristics are not utilized. Therefore, when the transmission rate is low, quantization becomes coarse, thereby increasing noise and making it difficult to maintain satisfactory speech quality.
The above problem is explained as follows with regard to the conventional examples shown in FIG. 1 and FIG. 2.
In a speech signal, there exist periods in which the characteristics change abruptly and periods in which the state is constant; in the latter, the values of the prediction parameters do not change much. Namely, there are cases where the correlation between the prediction parameters (LPC parameters) in consecutive frames is strong, and cases where it is not. Conventionally, prediction parameters (LPC parameters) are transmitted at a constant rate for every frame. Consequently, the characteristics of the speech signal are not fully utilized. Therefore, the transmission data contains redundancies, and the quality of the reproduced speech in the receiving station is not sufficient for the amount of transmission data.