1. Field of the Invention
The present invention relates generally to a speech signal encoder, and more specifically to a speech signal encoder using a CELP (code-excited linear predictive) coding scheme, which has been found well suited to encoding a speech signal at a low bit rate (for example, from 4 kb/s to 8 kb/s) without degrading perceived speech quality.
2. Description of the Related Art
In recent years, digital technology has been rapidly introduced into mobile and cordless radio telephone systems. However, the frequency spectrum available to a radio communications system is strictly limited, so it is vital to encode a speech signal at the lowest possible bit rate.
By way of example, a CELP coding technique for encoding a speech signal at a low bit rate ranging from 4 kb/s (kilobits per second) to 8 kb/s is disclosed in a paper entitled "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates" by M. R. Schroeder, et al., CH2118-8/85/0000-0937 $1.00, 1985 IEEE, pages 937-940 (referred to as Paper 1).
According to Paper 1, a speech signal is first partitioned into a plurality of frames (for example, 20 ms per frame), and a short-term prediction code representing the spectral (frequency) characteristics is extracted from each frame. Each frame is then further divided into a plurality of subframes.
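The frame and subframe partitioning, together with one common way of computing short-term prediction coefficients (the autocorrelation method with the Levinson-Durbin recursion), can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of Paper 1: the function names are hypothetical, and analysis windowing and quantization of the coefficients are omitted.

```python
# Hypothetical helpers illustrating frame/subframe partitioning and
# short-term (LPC) analysis by the autocorrelation method.

def split_frames(signal, frame_len, n_sub):
    """Partition signal into whole frames, each split into n_sub equal subframes.

    Trailing samples that do not fill a complete frame are dropped.
    """
    sub_len = frame_len // n_sub
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append([frame[k * sub_len:(k + 1) * sub_len] for k in range(n_sub)])
    return frames


def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame (no analysis window applied)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err): a[k-1] is the coefficient of x[n-k] in the prediction
    of x[n], and err is the remaining prediction error power.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or numerically degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

For 8 kHz speech, split_frames(speech, 160, 4) yields 20 ms frames of four 40-sample subframes, and levinson_durbin(autocorr(frame, 10), 10) gives a tenth-order predictor for one frame's samples.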
An optimal delay code is then determined for each subframe using a set of previously prepared delay codes and an adaptive codebook. The delay code indicates the pitch correlation of the speech, while the adaptive codebook stores past excitation signals. More specifically, each of a predetermined number of delay codes is tested: the past excitation signal is delayed by the amount corresponding to the delay code to extract a code vector, this code vector is used to produce a synthesis signal, and the error power (viz., distance) between the synthesis signal and the speech signal is calculated. The delay code yielding the minimum distance is selected as the optimal delay code, and the adaptive code vector and its gain corresponding to the optimal delay code are determined.
Next, a synthesis signal is produced using excitation code vectors extracted from an excitation codebook that stores a plurality of predetermined quantized codes (viz., noise signals). The excitation code vector and its gain are then selected so as to minimize the distance between this synthesis signal and the residual signal obtained by long-term prediction.
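The excitation codebook search can be illustrated with a minimal sketch. It assumes that the synthesis filter is represented by its impulse response h truncated to the subframe (so filtering is a truncated convolution), and that the target is the residual left after long-term prediction. The Gaussian codebook, its size, the random seed and the function names are hypothetical choices for illustration, not values taken from Paper 1.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def synthesize(h, v):
    """Compute H v: zero-state filtering of v by impulse response h, truncated to len(v)."""
    return [sum(h[k] * v[n - k] for k in range(min(len(h), n + 1))) for n in range(len(v))]


def search_excitation(target, codebook, h):
    """Return (index, gain, distance) of the code vector whose optimally scaled
    synthesis signal is closest, in squared error, to the target residual."""
    target_energy = dot(target, target)
    best = None
    for i, c in enumerate(codebook):
        y = synthesize(h, c)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        dist = target_energy - corr * corr / energy  # error when gain = corr / energy
        if best is None or dist < best[2]:
            best = (i, corr / energy, dist)
    return best


# Usage: a hypothetical 64-entry Gaussian codebook of 40-sample vectors.
rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
h = [0.8 ** k for k in range(40)]  # illustrative one-pole impulse response
target = [2.5 * y for y in synthesize(h, codebook[17])]
index, gain, dist = search_excitation(target, codebook, h)  # index 17, gain near 2.5
```

Because the gain is solved in closed form for each candidate, only one pass over the codebook is needed; practical coders add structured codebooks to reduce the filtering cost.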
Finally, the following indices are transmitted to a receiver: one index represents both the adaptive code vector and the type of excitation code vector, while the other represents the gain of each excitation signal and the type of spectral parameters.
Let us discuss in more detail how the delay code of an adaptive code vector is searched for. An incoming speech signal x[n] is weighted in terms of auditory perception, and a past affecting signal (the influence of preceding subframes) is subtracted from it. The resulting signal is denoted by z[n]. Next, a synthesis signal (H e_d)[n] is calculated by using the adaptive code vector e_d[n], corresponding to a delay code d, to drive a synthesis filter H. The synthesis filter H is constructed from spectral parameters that are determined by short-term prediction, quantized, and then inverse-quantized. The delay code d is then chosen to minimize the following equation (1), which gives the error power (viz., distance) between z[n] and (H e_d)[n]:
$$E_d = \sum_{n=0}^{N_s-1} \bigl( z[n] - g_d \, (H e_d)[n] \bigr)^2 \qquad (1)$$
where N_s denotes the subframe length, H denotes the matrix realizing the synthesis filter, and g_d denotes the gain of the adaptive code vector e_d. Throughout the instant disclosure, Σ denotes the sum over n from 0 to N_s - 1.
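When the gain g_d is set to its optimal value C_d/G_d (with C_d and G_d defined below), equation (1) can be rewritten as
$$E_d = \sum z[n]^2 - \frac{C_d^2}{G_d} \qquad (2)$$
where C_d is the cross-correlation between z[n] and the synthesis signal (H e_d)[n], and G_d is the auto-correlation (energy) of (H e_d)[n]. C_d and G_d are given by
$$C_d = \sum z[n] \, (H e_d)[n] \qquad (3)$$
$$G_d = \sum \bigl( (H e_d)[n] \bigr)^2 \qquad (4)$$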
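The step from equation (1) to equation (2) can be made explicit. Writing y_d[n] = (H e_d)[n] for brevity and expanding the square in (1):

$$
\begin{aligned}
E_d &= \sum z[n]^2 - 2 g_d \sum z[n]\, y_d[n] + g_d^2 \sum y_d[n]^2
     = \sum z[n]^2 - 2 g_d C_d + g_d^2 G_d, \\
\frac{\partial E_d}{\partial g_d} &= -2 C_d + 2 g_d G_d = 0
\quad\Longrightarrow\quad g_d = \frac{C_d}{G_d}, \\
E_d\Big|_{g_d = C_d/G_d} &= \sum z[n]^2 - 2\frac{C_d^2}{G_d} + \frac{C_d^2}{G_d}
     = \sum z[n]^2 - \frac{C_d^2}{G_d}.
\end{aligned}
$$

Because the term Σ z[n]^2 does not depend on d, minimizing E_d over the candidate delay codes is equivalent to maximizing C_d^2/G_d (assuming G_d > 0).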
Here e_d[n] denotes the excitation signal determined by encoding the preceding frames, delayed by the amount of the delay code d. This long-term prediction method, which determines the optimal delay code by passing each candidate through the synthesis filter, is called a closed-loop adaptive codebook search.
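The closed-loop search of equations (1) to (4) can be sketched as follows, under stated assumptions. The matrix H is represented by its impulse response h truncated to the subframe (multiplying by the lower-triangular Toeplitz matrix built from h is the same as this truncated convolution). The delay range and function names are hypothetical, and for delays shorter than the subframe the delayed excitation is repeated periodically, a common convention that the text above does not specify.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def synthesize(h, v):
    """Compute H v: zero-state filtering of v by impulse response h, truncated to len(v)."""
    return [sum(h[k] * v[n - k] for k in range(min(len(h), n + 1))) for n in range(len(v))]


def adaptive_vector(past_exc, d, ns):
    """Past excitation delayed by d samples (e_d); for d < ns the last d
    samples are repeated periodically (assumed convention)."""
    start = len(past_exc) - d
    e = []
    for n in range(ns):
        e.append(past_exc[start + n] if n < d else e[n - d])
    return e


def closed_loop_search(z, past_exc, h, d_min, d_max):
    """Return (d, g_d) maximizing C_d^2 / G_d, i.e. minimizing E_d in equation (2)."""
    ns = len(z)
    best_d, best_score, best_gain = None, -1.0, 0.0
    for d in range(d_min, d_max + 1):
        y = synthesize(h, adaptive_vector(past_exc, d, ns))
        corr = dot(z, y)    # C_d, equation (3)
        energy = dot(y, y)  # G_d, equation (4)
        if energy > 0.0 and corr * corr / energy > best_score:
            best_d, best_score, best_gain = d, corr * corr / energy, corr / energy
    return best_d, best_gain


# Usage: a target built from delay 45 with gain 0.7 should be recovered.
rng = random.Random(1)
past = [rng.gauss(0.0, 1.0) for _ in range(160)]
h = [0.8 ** k for k in range(40)]  # illustrative impulse response
z = [0.7 * y for y in synthesize(h, adaptive_vector(past, 45, 40))]
d, gain = closed_loop_search(z, past, h, 20, 80)  # d == 45, gain close to 0.7
```

The search needs past_exc, the excitation actually produced for earlier subframes, which is why this method depends on the encoding results of previous subframes.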
With CELP encoding, the auditory quality depends on the accuracy of the long-term prediction. One known approach to improving this accuracy is a decimal-point (fractional) delay, which extends the delay code from integer-sample to fractional-sample resolution. Such prior art is disclosed in a paper entitled "Pitch Predictors with High Temporal Resolution" by Peter Kroon, et al., CH2847-2/90/0000-0661, 1990 IEEE (referred to as Paper 2).
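A fractional delay requires the past excitation to be evaluated between sampling instants. One way to do this is band-limited interpolation, sketched below with a Hamming-windowed sinc kernel. The kernel, its 16-tap length, the function names and the restriction to delays longer than the subframe are assumptions made for illustration; this is not the specific interpolation filter of Paper 2.

```python
import math


def hamming_sinc(t, half_taps):
    """Hamming-windowed sinc kernel evaluated at offset t (|t| < half_taps)."""
    window = 0.54 + 0.46 * math.cos(math.pi * t / half_taps)
    if t == 0.0:
        return window
    return window * math.sin(math.pi * t) / (math.pi * t)


def interpolate(x, pos, half_taps=8):
    """Estimate x at a non-integer position pos from the 2*half_taps nearest samples."""
    i0 = math.floor(pos)
    frac = pos - i0
    total = 0.0
    for k in range(-half_taps + 1, half_taps + 1):
        idx = i0 + k
        if 0 <= idx < len(x):
            total += x[idx] * hamming_sinc(frac - k, half_taps)
    return total


def fractional_adaptive_vector(past_exc, delay, ns, half_taps=8):
    """Past excitation delayed by a fractional delay (e.g. 52.25 samples).

    For simplicity this sketch requires delay >= ns + half_taps, so every
    interpolation tap falls inside the stored past excitation.
    """
    if delay < ns + half_taps:
        raise ValueError("delay too short for this simplified sketch")
    base = len(past_exc) - delay
    return [interpolate(past_exc, base + n, half_taps) for n in range(ns)]
```

A fractional-delay search then evaluates C_d^2/G_d for vectors such as fractional_adaptive_vector(past, 60.25, 40) on a grid of fractional delays instead of integers only.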
The decimal-point delay can improve sound quality. However, because this approach optimizes the delay within each subframe independently, it does not effectively track changes in the delay value across a plurality of subframes (viz., the pitch path). In other words, the pitch path is not sufficiently smooth and occasionally exhibits large gaps. Gaps in a pitch path are known to cause discontinuities or waveform fluctuations in the encoded speech signal, which degrade speech quality.
To address these problems, the following method has been proposed. Candidate delay codes are determined for each subframe by open-loop processing, which matches the speech signal itself. A pitch path is then determined such that the delay value (viz., pitch) varies smoothly over the entire frame. This known technique is disclosed in a paper entitled "Techniques for Improving the Performance of CELP-Type Speech Coders" by Ira A. Gerson, et al., IEEE Journal on Selected Areas in Communications, Vol. 10, No. 5, June 1992, pages 858-865 (referred to as Paper 3).
Paper 3 discloses processes for smoothing a pitch path using distances or correlations determined at each subframe. More specifically, all the subframes of each frame are sequentially subjected to steps (a)-(d) below, a smoothly varying pitch path is determined at step (e), and the optimal delay codes are finally determined at step (f):
(a) A delay code of a first subframe is evaluated;
(b) For the evaluated delay code, a delayed speech vector x^d is produced by referring to an open-loop adaptive codebook that stores previous speech signals or codes weighted in terms of auditory perception;
(c) A cross-correlation value $\langle x, x^d \rangle$ and an auto-correlation value $\langle x^d, x^d \rangle$ are calculated using the auditory-perception-weighted signal or the speech signal x of the subframe being coded;
(d) Using the calculated correlation values, a distance $E = \langle x, x^d \rangle^2 / \langle x^d, x^d \rangle$ is produced; with the optimal gain, the error energy between the speech signal and the delayed speech vector equals $\langle x, x \rangle - E$, so a larger E indicates a smaller error;
(e) After all the subframes of one frame have been processed by steps (a)-(d), the pitch path is smoothed using the distances or correlations determined for each subframe; and
(f) Using the pitch path obtained in step (e), an optimal delay code of each subframe is determined by a conventional closed-loop codebook search.
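Steps (b) to (e) can be illustrated with a short sketch. Paper 3's exact smoothing rule is not reproduced here; instead the sketch assumes a dynamic-programming (Viterbi-style) search that, over the subframes of one frame, maximizes the sum of the per-subframe scores E minus a penalty proportional to the jump in delay between consecutive subframes. The penalty weight, delay candidates and function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def open_loop_scores(s, start, ns, delays):
    """Step (d): E = <x, x^d>^2 / <x^d, x^d> for each candidate delay d.

    x is the subframe s[start:start+ns]; x^d is the same span shifted back by
    d samples in the (weighted) speech buffer s, acting as the open-loop codebook.
    """
    if start < max(delays):
        raise ValueError("buffer must hold at least max(delays) past samples")
    x = s[start:start + ns]
    scores = {}
    for d in delays:
        xd = s[start - d:start - d + ns]
        energy = dot(xd, xd)
        scores[d] = dot(x, xd) ** 2 / energy if energy > 0.0 else 0.0
    return scores


def smooth_pitch_path(score_tables, delays, penalty):
    """Step (e), assumed form: choose one delay per subframe maximizing
    sum_k E_k(d_k) - penalty * sum_k |d_k - d_(k-1)| by dynamic programming."""
    best = {d: score_tables[0][d] for d in delays}
    back = []
    for table in score_tables[1:]:
        new_best, choice = {}, {}
        for d in delays:
            prev = max(delays, key=lambda p: best[p] - penalty * abs(d - p))
            new_best[d] = best[prev] - penalty * abs(d - prev) + table[d]
            choice[d] = prev
        best = new_best
        back.append(choice)
    d = max(delays, key=lambda q: best[q])
    path = [d]
    for choice in reversed(back):  # trace the chosen predecessors backwards
        d = choice[d]
        path.append(d)
    return path[::-1]


# Usage with hand-made score tables: the per-subframe maximum would jump to 80
# in the middle subframe, while the smoothed path stays near 40.
delays = [40, 41, 80]
tables = [
    {40: 1.0, 41: 0.9, 80: 0.0},
    {40: 0.8, 41: 0.95, 80: 1.0},
    {40: 1.0, 41: 0.9, 80: 0.0},
]
path = smooth_pitch_path(tables, delays, penalty=0.05)  # [40, 41, 40]
```

In practice the scores of each subframe would come from open_loop_scores, and the penalty must be chosen on the same scale as E.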
As a result, the delay value (pitch) represented by the estimated delay codes varies smoothly, which yields good speech quality.
The open-loop search disclosed in Paper 3 searches for an optimal delay code by matching previous and current speech signal vectors. However, extracting the pitch difference from the previous and current speech signal vectors in this way tends to produce large estimation errors, because the two vectors have different spectral components.
On the other hand, the closed-loop adaptive codebook search, such as that disclosed in Paper 1 or 2, estimates delay codes more accurately. However, it cannot estimate a pitch path, because it inevitably requires the previous excitation signals (viz., the encoding results of the previous subframes), which are not available until those subframes have been encoded.
What is desired, therefore, is an improved technique that can estimate a smoothly varying pitch path in long-term prediction, so as to achieve good speech quality at low bit rates.