As a method of efficiently coding a voice signal, CELP (Code Excited Linear Prediction coding), described in “Code-excited linear prediction: High quality speech at very low bit rates” by M. Schroeder and B. Atal (Proc. ICASSP, pp. 937-940, 1985) (Reference 1), is known, for example. “Improved speech quality and efficient vector quantization in SELP” by Kleijn et al. (Proc. ICASSP, pp. 155-158, 1988) (Reference 2) is also known. In these prior arts, on the transmission side, a spectrum parameter representing the spectrum characteristics of a voice signal is extracted from the voice signal for every frame (for example, 20 ms) by using linear prediction (LPC) analysis. The frame is further divided into sub-frames (for example, 5 ms). Parameters of an adaptive code book (a delay parameter corresponding to a pitch period and a gain parameter) are extracted for every sub-frame on the basis of a past sound source signal, and pitch prediction of the voice signal of the sub-frame is performed by using the adaptive code book. For the sound source signal obtained by the pitch prediction, an appropriate sound source code vector is selected from a sound source code book (vector quantization code book) consisting of noise signals of predetermined types, and an appropriate gain is calculated, thereby quantizing the sound source signal. The sound source code vector is selected such that the error power between the signal synthesized from the selected noise signal and the residual signal is minimized. An index representing the type of the selected code vector, the gain, the spectrum parameter, and the parameters of the adaptive code book are combined by a multiplexer unit and transmitted.
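The code book search described above can be illustrated by the following minimal sketch. It is not taken from References 1 or 2. The function names, the Gaussian noise code book, the zero-state convolution used as the synthesis filter, and the closed-form optimal gain are all assumptions made for illustration.

```python
import random


def make_noise_codebook(bits, dim, seed=0):
    """Hypothetical noise code book: 2**bits Gaussian vectors of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2 ** bits)]


def synthesize(code_vector, impulse_response):
    """Zero-state convolution of a code vector with a filter impulse
    response, truncated to the sub-frame length."""
    n = len(code_vector)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(min(i + 1, len(impulse_response))):
            acc += impulse_response[k] * code_vector[i - k]
        out[i] = acc
    return out


def search_codebook(target, codebook, impulse_response):
    """Exhaustive search: filter every code vector, compute its optimal
    gain, and keep the index with the smallest error power."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, impulse_response)
        energy = sum(v * v for v in synth)
        if energy == 0.0:
            continue  # an all-zero synthesized vector cannot match anything
        corr = sum(t * v for t, v in zip(target, synth))
        gain = corr / energy                          # least-squares gain
        error = target_energy - corr * corr / energy  # error power at that gain
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain, best_error
```

Every code vector is filtered once per sub-frame in this sketch, which is the cost that the exhaustive search incurs.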
However, in the prior arts described above, an enormous amount of operation is required to select an appropriate sound source code vector from the sound source code book. This is because, in the methods of References 1 and 2, a filtering operation or a convolution operation must be performed on each code vector before a sound source code vector can be selected, and the operation is repeated as many times as there are code vectors stored in the code book. By way of example, assume that the number of bits of the code book is B and that the number of dimensions of the code book is N. In this case, when the filter length or impulse response length used in the filtering operation or the convolution operation is represented by K, an amount of operation of (N·K·2^B·8000)/N is required per second. For example, when B=10, N=40, and K=10, the operation must be repeated 81,920,000 times per second. As a result, an enormous amount of operation is disadvantageously required.
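The figure quoted above can be checked with the short calculation below. The function and parameter names are hypothetical. The sketch takes the code book size as 2^B entries, which is what reproduces the quoted figure, and it reads the constant 8000 as a sampling rate in Hz, which is an assumption consistent with a 5 ms sub-frame of 40 samples.

```python
def codebook_search_ops_per_second(bits, dim, filter_len, sample_rate=8000):
    """Operations per second for an exhaustive code book search.

    Each of the 2**bits code vectors costs dim * filter_len operations
    per sub-frame, and there are sample_rate / dim sub-frames per second
    (sample_rate = 8000 is an assumed sampling rate in Hz).
    """
    return dim * filter_len * (2 ** bits) * sample_rate // dim


# B = 10, N = 40, K = 10, as in the example above
print(codebook_search_ops_per_second(bits=10, dim=40, filter_len=10))  # 81920000
```

Because N cancels, the cost grows linearly with K and exponentially with B.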
As a method of reducing the amount of operation required to search a sound source code book, ACELP (Algebraic Code Excited Linear Prediction) has been proposed, for example. For this method, reference can be made to, for example, “16 kbps wideband speech coding technique based on algebraic CELP” by C. Laflamme et al. (Proc. ICASSP, pp. 13-16, 1991) (Reference 3). According to the method of Reference 3, a sound source signal is represented by a plurality of pulses, and the position of each pulse is represented by a predetermined number of bits and transmitted. Here, since the amplitude of each pulse is limited to +1.0 or −1.0, the amount of operation required to search for the pulses can be considerably reduced.
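The pulse representation can be sketched as follows. This is an illustration only, not the method of Reference 3. The function names are hypothetical, and so is the bit allocation, which assumes one set ("track") of allowed positions per pulse plus one sign bit per pulse.

```python
def pulse_excitation(positions, signs, dim):
    """Build a sub-frame excitation from pulses of amplitude +1.0 or -1.0."""
    excitation = [0.0] * dim
    for position, sign in zip(positions, signs):
        excitation[position] += 1.0 if sign > 0 else -1.0
    return excitation


def bits_per_subframe(track_sizes):
    """Bits to transmit the pulses: a position index within each track
    (assumed layout) plus one sign bit per pulse."""
    return sum((size - 1).bit_length() + 1 for size in track_sizes)


def pulse_correlation(reference, positions, signs):
    """Correlation between a reference signal and the pulse excitation.
    With amplitudes fixed at +/-1 this needs only additions and
    subtractions, no multiplications."""
    return sum(reference[p] if s > 0 else -reference[p]
               for p, s in zip(positions, signs))
```

For example, four pulses that each choose among eight allowed positions would need 4 × (3 + 1) = 16 bits per sub-frame under this assumed layout.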
However, although good sound quality can be obtained at a bit rate of 8 kb/s or more, when the bit rate is lower than this value and background noise is superposed on the voice, the number of pulses is not sufficient, and the sound quality of the background noise component of the coded voice is considerably degraded. More specifically, since the sound source signal is represented by a combination of a plurality of pulses, the pulses are concentrated around a pitch pulse, which is the start point of a pitch period, in a vowel section of the voice. For this reason, the sound source signal can be efficiently represented by a small number of pulses. However, since pulses must be raised at random positions for a random signal such as background noise, it is difficult to represent the background noise well with a small number of pulses. When the bit rate is reduced and the number of pulses is reduced accordingly, the sound quality for the background noise degrades sharply.
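The difference between the two kinds of signal can be made concrete with the toy comparison below. It is an illustration under stated assumptions, not a measurement. The signals are synthetic, the function name is hypothetical, and each pulse is allowed to take the exact amplitude of the sample it replaces, which is more favorable than amplitudes limited to ±1.0.

```python
import random


def energy_captured(signal, num_pulses):
    """Fraction of the signal energy retained when only the num_pulses
    largest-magnitude samples are kept (an optimistic bound for a
    pulse representation placed directly on the samples)."""
    total = sum(x * x for x in signal)
    largest = sorted((x * x for x in signal), reverse=True)[:num_pulses]
    return sum(largest) / total


# Toy "pitch pulse": energy concentrated around sample 20 of a 40-sample sub-frame
pitch_like = [0.5 ** abs(i - 20) for i in range(40)]

# Toy "background noise": Gaussian samples with a fixed seed
rng = random.Random(0)
noise_like = [rng.gauss(0.0, 1.0) for _ in range(40)]

print(energy_captured(pitch_like, 4))
print(energy_captured(noise_like, 4))
```

In this toy setting four pulses retain most of the energy of the concentrated signal, while the noise-like signal spreads its energy over many samples and loses far more.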
It is, therefore, an object of the present invention to perform voice coding with a relatively small amount of operation and, in particular, with little degradation of sound quality for background noise even when a low bit rate is set.