1. Field of the Invention
This invention relates to a speech coding system for coding a speech signal with high quality at a low bit rate, specifically, at about 8 to 4.8 kb/s.
2. Description of the Prior Art
Various methods of coding a speech signal at a low bit rate of about 8 to 4.8 kb/s are already known. One example of such conventional coding methods is CELP (Code Excited Linear Prediction), which is disclosed, for example, in M. R. Schroeder and B. S. Atal, "CODE-EXCITED LINEAR PREDICTION (CELP): HIGH-QUALITY SPEECH AT VERY LOW BIT RATES", Proc. ICASSP, pp.937-940, 1985 (reference 1). According to this method, on the transmission side, a spectrum parameter representing a spectrum characteristic of a speech signal is extracted from the speech signal for each frame (e.g., 20 ms). Each frame is divided into subframes of, for example, 5 ms, and a pitch parameter representing a long-term correlation (pitch correlation) is extracted from a past excitation signal for each subframe. Then, long-term prediction (pitch prediction) of the speech signal of the subframe is performed using the pitch parameter. A noise signal is then selected from a codebook consisting of different predetermined noise signals prepared in advance, such that the error power between the speech signal and a signal synthesized using the selected noise signal is minimized, and an optimal gain is calculated at the same time. An index representing the selected noise signal and the gain are transmitted together with the spectrum parameter and the pitch parameter. Description of the construction and operation of the reception side is omitted herein.
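The codebook search described above can be sketched as follows. This is a minimal illustration and not the procedure of reference 1: the function names `synthesize` and `search_codebook`, the zero initial filter state, the first-order filter coefficient and the toy codebook are all assumptions made for the example, and perceptual weighting is omitted.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter with zero initial state (an assumption).

    y[n] = e[n] - sum_k lpc[k] * y[n - 1 - k]
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the codevector minimizing the error power."""
    target_energy = sum(t * t for t in target)
    best = None
    for index, code in enumerate(codebook):
        syn = synthesize(code, lpc)
        energy = sum(v * v for v in syn)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(target, syn))
        gain = corr / energy                          # optimal gain for this candidate
        error = target_energy - corr * corr / energy  # error power at that gain
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Toy data (hypothetical): three 4-sample codevectors and a first-order filter.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]]
lpc = [-0.5]
# The target is built from the third codevector scaled by 2, so the search
# should recover that index and a gain close to 2.
target = [2.0 * v for v in synthesize(codebook[2], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
```

For each candidate the gain has a closed form (correlation divided by energy), so only the index needs to be searched exhaustively.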
Various long-term prediction methods are also already known. One such conventional method uses an adaptive codebook: past excitation signals are displaced successively by one sample at a time, and the displacement (integer delay) that minimizes the squared error is found together with the gain corresponding to that delay. This long-term prediction method is disclosed, for example, in W. Kleijn et al., "An Efficient Stochastically Excited Linear Predictive Coding Algorithm for High Quality Low Bit Rate Transmission of Speech", Speech Communication, 7, pp.305-316, 1988 (reference 2). With this long-term prediction method, however, the pitch period of an actual speech signal is not necessarily an integer multiple of the sampling period. Particularly when the voice is high (the pitch period is short), as with a female speaker, an attempt to represent a pitch period of, for example, 20.5 samples as an integer value is likely to select a delay of 41 samples, which is twice the pitch period, and this significantly deteriorates the quality of the reconstructed speech. This is one of the causes of deterioration of the sound quality of female speech having a short pitch period.
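The integer-delay search and the pitch-doubling effect can be illustrated with a short sketch. It is a simplification made under stated assumptions, not the algorithm of reference 2: the candidate is compared with the target directly (no synthesis filter), the excitation is a pure sinusoid with a period of 20.5 samples, and the delay range of 20 to 60 samples, the subframe length of 40 samples and all function names are hypothetical.

```python
import math


def adaptive_codebook_vector(past, delay, length):
    """Excitation taken `delay` samples back; repeated periodically if delay < length."""
    vec = []
    for n in range(length):
        if n < delay:
            vec.append(past[len(past) - delay + n])
        else:
            vec.append(vec[n - delay])
    return vec


def search_integer_delay(target, past, min_delay, max_delay):
    """Return (delay, gain, error) of the integer delay minimizing the squared error."""
    target_energy = sum(t * t for t in target)
    best = None
    for delay in range(min_delay, max_delay + 1):
        cand = adaptive_codebook_vector(past, delay, len(target))
        energy = sum(c * c for c in cand)
        if energy == 0.0:
            continue
        corr = sum(t * c for t, c in zip(target, cand))
        gain = corr / energy
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (delay, gain, error)
    return best


PITCH = 20.5  # pitch period in samples, taken from the example in the text
past = [math.sin(2.0 * math.pi * n / PITCH) for n in range(-160, 0)]
target = [math.sin(2.0 * math.pi * n / PITCH) for n in range(0, 40)]
delay, gain, error = search_integer_delay(target, past, 20, 60)
```

With this input the search settles on a delay of 41 samples, twice the true pitch period, because neither 20 nor 21 samples aligns the past excitation with the target.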
In order to solve the problem, a method of representing a delay (pitch period) in a fractional value has been proposed and is disclosed, for example, in P. Kroon et al., "PITCH PREDICTORS WITH HIGH TEMPORAL RESOLUTION", Proc. ICASSP, pp.661-664, 1990 (reference 3). According to this method, a fractional delay is realized by oversampling or polyphase filtering the excitation signal, which improves the sound quality.
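A fractional delay obtained by interpolating the past excitation can be sketched as below. The Hamming-windowed sinc interpolator with 16 taps, the interpolation ratio of 4, the 10-sample subframe, the narrow search range of 20 to 22 samples and the function names are assumptions chosen for illustration; reference 3 should be consulted for the actual filters.

```python
import math


def windowed_sinc(u, half_taps):
    """Hamming-windowed sinc kernel evaluated at offset u (in samples)."""
    if abs(u) > half_taps:
        return 0.0
    sinc = 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)
    window = 0.54 + 0.46 * math.cos(math.pi * u / half_taps)
    return sinc * window


def fractional_delay_vector(past, delay_int, frac, ratio, length, half_taps=8):
    """Past excitation delayed by delay_int + frac/ratio samples."""
    total = delay_int * ratio + frac  # delay in units of 1/ratio sample
    vec = []
    for n in range(length):
        pos = (len(past) + n) * ratio - total  # read position, units of 1/ratio
        base, rem = divmod(pos, ratio)
        if base - half_taps + 1 < 0 or base + half_taps >= len(past):
            raise ValueError("interpolation window leaves the past buffer")
        acc = 0.0
        for k in range(base - half_taps + 1, base + half_taps + 1):
            u = (base - k) + rem / ratio  # distance from read position to sample k
            acc += past[k] * windowed_sinc(u, half_taps)
        vec.append(acc)
    return vec


def search_fractional_delay(target, past, min_delay, max_delay, ratio):
    """Return (delay_int, frac, gain, error) minimizing the squared error."""
    target_energy = sum(t * t for t in target)
    best = None
    for delay_int in range(min_delay, max_delay + 1):
        for frac in range(ratio):
            cand = fractional_delay_vector(past, delay_int, frac, ratio, len(target))
            energy = sum(c * c for c in cand)
            if energy == 0.0:
                continue
            corr = sum(t * c for t, c in zip(target, cand))
            gain = corr / energy
            error = target_energy - corr * corr / energy
            if best is None or error < best[3]:
                best = (delay_int, frac, gain, error)
    return best


PITCH = 20.5  # pitch period in samples, as in the example in the text
past = [math.sin(2.0 * math.pi * n / PITCH) for n in range(-160, 0)]
target = [math.sin(2.0 * math.pi * n / PITCH) for n in range(0, 10)]
delay_int, frac, gain, error = search_fractional_delay(target, past, 20, 22, 4)
```

At a resolution of a quarter sample the search can select 20 + 2/4 = 20.5 samples directly, so the true pitch period becomes representable without resorting to a multiple of it.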
The method by P. Kroon et al., however, has a disadvantage in that the amount of calculation increases significantly: when a delay is represented as a fractional value with, for example, an interpolation ratio of 4, the calculation amount for the fractional-delay search in the adaptive codebook becomes 4 times that for an integer delay.
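The factor of 4 follows from counting candidates, as the rough estimate below shows. The delay range of 20 to 147 samples, the 40-sample subframe and the cost model (one correlation and one energy computation per candidate, with the cost of the interpolation itself ignored) are assumptions made only for this arithmetic.

```python
def search_cost(min_delay, max_delay, ratio, subframe_len):
    """Rough multiply-accumulate count for an exhaustive delay search."""
    candidates = (max_delay - min_delay + 1) * ratio  # delays at 1/ratio resolution
    macs_per_candidate = 2 * subframe_len             # correlation + energy
    return candidates * macs_per_candidate


# Hypothetical figures: delays of 20..147 samples, 40-sample subframe.
integer_cost = search_cost(20, 147, 1, 40)
fractional_cost = search_cost(20, 147, 4, 40)
```

Under this model the number of candidates, and hence the search cost, scales linearly with the interpolation ratio.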