1. Field of the Invention
The present invention relates to a speech coding apparatus and speech decoding apparatus and, more particularly, to a speech coding apparatus for coding a speech signal at a low bit rate with high quality.
2. Description of the Prior Art
As a conventional method of coding a speech signal with high efficiency, CELP (Code Excited Linear Predictive Coding) is known, which is disclosed, for example, in M. Schroeder and B. Atal, “Code-excited linear prediction: High quality speech at low bit rates”, Proc. ICASSP, 1985, pp. 937–940 (reference 1) and Kleijn et al., “Improved speech quality and efficient vector quantization in SELP”, Proc. ICASSP, 1988, pp. 155–158 (reference 2).
In this CELP coding scheme, on the transmission side, spectrum parameters representing the spectrum characteristic of a speech signal are extracted from the speech signal for each frame (for example, 20 ms) using linear predictive coding (LPC) analysis. Each frame is divided into subframes (for example, 5 ms each). For each subframe, the parameters of an adaptive codebook (a delay parameter corresponding to the pitch period, and a gain parameter) are extracted from the past sound source signal, and the speech signal of the subframe is then pitch-predicted using the adaptive codebook.
With respect to the sound source signal obtained by the pitch prediction, an optimum sound source code vector is selected from a sound source codebook (vector quantization codebook) consisting of predetermined types of noise signals, and an optimum gain is calculated to quantize the sound source signal.
The sound source code vector is selected so as to minimize the error power between a signal synthesized from the selected noise signal and the residual signal. An index representing the selected code vector and its gain are then combined with the spectrum parameters and the adaptive codebook parameters and transmitted by a multiplexer section. A description of the operation of the reception side will be omitted.
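The search described above can be illustrated with a minimal sketch. It is not the implementation of references 1 and 2; the function names, the toy codebook of random vectors, the impulse response h, and the sizes N, K and B are all assumptions made for illustration. For each code vector the sketch filters the vector once, computes the optimum gain, and keeps the vector giving the smallest error power.

```python
import random

def filter_zero_state(x, h):
    """Convolve x with impulse response h, truncated to len(x) samples."""
    return [
        sum(h[k] * x[i - k] for k in range(min(i + 1, len(h))))
        for i in range(len(x))
    ]

def search_codebook(target, codebook, h):
    """Exhaustive search: return (index, gain, error_power) of the best vector."""
    target_energy = sum(t * t for t in target)
    best = None
    for index, vector in enumerate(codebook):
        synth = filter_zero_state(vector, h)     # one filtering per code vector
        corr = sum(t * s for t, s in zip(target, synth))
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue                             # skip an all-zero vector
        gain = corr / energy                     # optimum gain for this vector
        error = target_energy - corr * corr / energy
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

random.seed(0)
N, K, B = 40, 10, 6                              # toy sizes (assumed), kept small
h = [0.8 ** k for k in range(K)]                 # hypothetical impulse response
codebook = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(2 ** B)]

# Build a target that is exactly 0.5 times the synthesized form of vector 3,
# so the search is expected to recover index 3 with a gain close to 0.5.
target = [0.5 * s for s in filter_zero_state(codebook[3], h)]
best_index, best_gain, best_error = search_codebook(target, codebook, h)
print(best_index, round(best_gain, 6))
```

The loop body runs once per code vector, so the filtering cost grows in proportion to the codebook size.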
The conventional coding scheme described above is disadvantageous in that a large calculation amount is required to select an optimum sound source code vector from a sound source codebook.
This arises from the fact that, in the methods of references 1 and 2, selecting a sound source code vector requires one filtering or convolution calculation for each code vector, and this calculation is repeated as many times as there are code vectors stored in the codebook.
Assume that the number of bits of the codebook is B and the dimension of each code vector is N. If the filter or impulse response length used in the filtering or convolution calculation is K, the required calculation amount is N×K×2^B×8000/N per second. For example, if B=10, N=40 and K=10, 81,920,000 calculations are required per second. The conventional coding scheme is thus disadvantageous in that it requires a very large calculation amount.
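The figure of 81,920,000 can be checked with a short calculation. The sketch below assumes that the factor 8000 is the number of samples per second and that one codebook search is performed per N-sample vector, that is, 8000/N searches per second; the function and parameter names are illustrative only.

```python
def search_ops_per_second(bits, dim, filt_len, sample_rate=8000):
    """Operations per second for an exhaustive codebook search."""
    num_vectors = 2 ** bits                 # 2^B code vectors in the codebook
    ops_per_vector = dim * filt_len         # N x K operations to filter one vector
    searches_per_second = sample_rate // dim  # assumes dim divides sample_rate
    return ops_per_vector * num_vectors * searches_per_second

ops = search_ops_per_second(bits=10, dim=40, filt_len=10)
print(ops)  # 81920000
```

Each additional codebook bit doubles this count, since the number of code vectors is 2^B.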
Various methods which reduce the calculation amount required to search a sound source codebook have been proposed. One of the methods is an ACELP (Algebraic Code Excited Linear Prediction) method, which is disclosed, for example, in C. Laflamme et al., “16 kbps wideband speech coding technique based on algebraic CELP”, Proc. ICASSP, 1991, pp. 13–16 (reference 3).
According to the method disclosed in reference 3, a sound source signal is represented by a plurality of pulses, and the position of each pulse is represented by a predetermined number of bits and transmitted. Since the amplitude of each pulse is limited to +1.0 or −1.0, the calculation amount required to search for the pulses can be greatly reduced.
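A minimal sketch of this pulse representation follows. It does not reproduce the codebook structure of reference 3; the pulse positions, the impulse response h, and the count of 8 candidate positions per pulse are hypothetical values chosen for illustration. Because every pulse has amplitude +1.0 or −1.0, the synthesized signal is a signed sum of shifted impulse responses and needs only additions and subtractions.

```python
def pulse_vector(length, positions, signs):
    """Excitation vector with a pulse of amplitude +1.0 or -1.0 at each position."""
    v = [0.0] * length
    for p, s in zip(positions, signs):
        v[p] += s
    return v

def synthesize_pulses(length, positions, signs, h):
    """Filtered excitation as a signed sum of shifted copies of h (no multiplies)."""
    y = [0.0] * length
    for p, s in zip(positions, signs):
        for k, hk in enumerate(h):
            if p + k < length:
                y[p + k] += hk if s > 0 else -hk
    return y

def position_bits(num_candidates):
    """Bits needed to index one pulse position among num_candidates choices."""
    return (num_candidates - 1).bit_length()

h = [1.0, 0.5]                                   # hypothetical impulse response
excitation = pulse_vector(8, [0, 5], [+1.0, -1.0])
synth = synthesize_pulses(8, [0, 5], [+1.0, -1.0], h)
print(excitation)
print(synth)
print(position_bits(8))  # 3
```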
As described above, according to the method disclosed in reference 3, a great reduction in calculation amount can be attained.
This method, however, has the following problem. Although the sound quality is good at 8 kb/s or higher, at bit rates below 8 kb/s the sound quality of the background noise portion of the coded speech greatly deteriorates, especially when background noise is superimposed on the speech.
This problem arises for the following reason. Since the sound source is represented by a combination of a plurality of pulses, in a vowel interval of speech the pulses concentrate near the pitch pulse that marks the start point of each pitch period. Such a signal can therefore be expressed efficiently by a small number of pulses. A random signal such as background noise, however, requires pulses at random positions throughout the subframe, and hence cannot be properly expressed by a small number of pulses. As a consequence, when the bit rate decreases and the number of pulses decreases with it, the sound quality of background noise abruptly deteriorates.
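The reasoning above can be illustrated with a simplified numerical sketch. It rests on several assumptions not taken from the source: the synthesis filter is ignored, pulse positions are unconstrained, all pulses share one gain, and the two test signals are toy examples. Under these assumptions, for a vector c of M pulses of amplitude ±1 the optimum gain is ⟨x,c⟩/M, and the fraction of the signal energy captured is ⟨x,c⟩²/(M⟨x,x⟩), which is largest when the pulses sit on the M largest-magnitude samples.

```python
import random

def captured_energy_fraction(x, num_pulses):
    """Fraction of the energy of x captured by num_pulses +/-1 pulses, one gain."""
    total = sum(v * v for v in x)
    top = sorted((abs(v) for v in x), reverse=True)[:num_pulses]
    corr = sum(top)          # <x, c> when each pulse takes the sign of its sample
    return corr * corr / (num_pulses * total)

# Toy pitch-like signal: energy concentrated at two pitch pulses.
pitch_like = [0.0] * 40
pitch_like[0] = 1.0
pitch_like[20] = 1.0

# Toy background noise: energy spread over every sample.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(40)]

pitch_fraction = captured_energy_fraction(pitch_like, 2)
noise_fraction = captured_energy_fraction(noise, 2)
print(pitch_fraction, noise_fraction)
```

With two pulses the pitch-like signal is captured completely, whereas the noise signal retains only part of its energy, which is the behaviour the paragraph describes for low pulse counts.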