The present invention relates to a sound encoding apparatus and method of encoding a sound signal with high quality at a low bit rate, and a sound decoding apparatus and method of decoding, with high quality, a sound signal encoded by the sound encoding apparatus and method.
For example, CELP (Code Excited Linear Predictive Coding) is known as a system for efficiently encoding a sound signal. It is described in M. Schroeder and B. Atal, “Code-excited linear prediction: High quality speech at very low bit rates” (Proc. ICASSP, pp. 937-940, 1985) (to be referred to as reference 1 hereinafter) and Kleijn et al., “Improved speech quality and efficient vector quantization in SELP” (Proc. ICASSP, pp. 155-158, 1988) (to be referred to as reference 2 hereinafter).
In this CELP, on the transmitting side, spectral parameters representing the spectral characteristics of a sound signal are extracted by using LPC (Linear Predictive Coding) analysis for each frame (e.g., 20 ms) of the sound signal.
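As an illustration of the LPC analysis step, the following is a minimal sketch assuming the autocorrelation method with the Levinson-Durbin recursion, which is one common way of obtaining the spectral parameters. The references may use a different analysis method; the function names `autocorrelation` and `levinson_durbin` are hypothetical, and windowing and quantization are omitted.

```python
def autocorrelation(frame, order):
    """Autocorrelation of one frame for lags 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns (a, err) with a[0] == 1.0, where the predictor is
    x[n] ~ -(a[1]*x[n-1] + ... + a[order]*x[n-order])
    and err is the remaining prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop the recursion early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err
```

For a 20 ms frame, `levinson_durbin(autocorrelation(frame, order), order)` would be called once per frame; the prediction order is a design choice not stated in the text.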
Next, each frame is further divided into subframes (e.g., 5 ms). On the basis of a past sound source signal, parameters (a delay parameter and gain parameter corresponding to the pitch period) in an adaptive code book are extracted for each subframe, thereby performing pitch prediction for a sound signal of the subframe by the adaptive code book.
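The extraction of the delay and gain parameters can be sketched as follows. This is a simplified illustration only: it matches the delayed past sound source signal directly against a target vector and omits the synthesis and perceptual weighting filters that a practical encoder would apply. The function name `adaptive_codebook_search` and the delay search range are hypothetical.

```python
def adaptive_codebook_search(past_excitation, target, min_delay, max_delay):
    """Return (delay, gain) that best predicts `target` from the past excitation.

    For each candidate delay, the candidate vector is the past excitation
    shifted by that delay (repeated periodically when the delay is shorter
    than the subframe). The delay maximizing corr^2 / energy minimizes the
    squared prediction error once the optimum gain corr / energy is applied.
    """
    if max_delay > len(past_excitation):
        raise ValueError("past excitation is shorter than the maximum delay")
    n = len(target)
    best_delay, best_gain, best_score = None, 0.0, 0.0
    for delay in range(min_delay, max_delay + 1):
        start = len(past_excitation) - delay
        cand = [past_excitation[start + (i % delay)] for i in range(n)]
        corr = sum(t * c for t, c in zip(target, cand))
        energy = sum(c * c for c in cand)
        if energy == 0.0 or corr <= 0.0:
            continue  # no usable pitch prediction at this delay
        score = corr * corr / energy
        if score > best_score:
            best_delay, best_gain, best_score = delay, corr / energy, score
    return best_delay, best_gain
```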
With respect to the sound source signal obtained by the pitch prediction, an optimum sound source code vector is selected from a sound source code book (vector quantization code book) containing predetermined types of noise signals, and an optimum gain is calculated, thereby quantizing the sound source signal. In the sound source code vector selection, the sound source code vector that minimizes the error power between a signal synthesized from the selected noise signal and a residual signal is selected.
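The selection described above can be sketched as an exhaustive search. This is a minimal illustration under stated assumptions: the synthesis is modeled as a convolution with a truncated impulse response `h`, the gain is left unquantized, and the names `convolve_truncated` and `search_codebook` are hypothetical. Minimizing the error power for the optimum gain is equivalent to maximizing the squared correlation divided by the energy of the synthesized vector.

```python
def convolve_truncated(code, h):
    """Convolve a code vector with impulse response h, truncated to len(code)."""
    return [sum(h[k] * code[i - k] for k in range(min(i + 1, len(h))))
            for i in range(len(code))]


def search_codebook(codebook, h, target):
    """Return (index, gain) of the code vector minimizing the error power.

    For each code vector c, the synthesized signal is s = h * c. With the
    optimum gain g = <target, s> / <s, s>, the error power equals
    <target, target> - <target, s>^2 / <s, s>, so the best vector is the
    one with the largest <target, s>^2 / <s, s>.
    """
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, code in enumerate(codebook):
        synth = convolve_truncated(code, h)  # one filtering per code vector
        corr = sum(t * s for t, s in zip(target, synth))
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain
```

Note that the loop filters every code vector once, which is the source of the large operation count in this kind of search.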
After that, an index and gain indicating the type of the selected sound source code vector, the spectral parameters, and the parameters of the adaptive code book are multiplexed by a multiplexer and transmitted.
When an optimum sound source code vector is selected from the sound source code book in the conventional sound signal encoding system as described above, a filtering or convolution operation must be performed once for each code vector. Since this operation is repeated as many times as there are code vectors stored in the code book, a large amount of calculation is necessary. For example, if the number of bits of the sound source code book is B and the number of dimensions is N, letting K be the filter or impulse response length in the filtering or convolution operation, N×K×2^B×8000/N operations are necessary per second. As an example, if B=10, N=40, and K=10, an extremely large amount of calculation, namely 81,920,000 operations per second, is necessary.
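The arithmetic of this example can be checked with a short calculation. The sketch below reads the code book size as 2 to the power B and assumes that the constant 8000 is the number of samples per second (an 8 kHz sampling rate), so that 8000/N is the number of subframes per second; the function name is hypothetical.

```python
def codebook_search_ops_per_second(bits, dim, response_len, sample_rate=8000):
    """Operation count N x K x 2^B x (sample_rate / N) for an exhaustive search.

    bits         -- B, number of bits of the code book (2**B code vectors)
    dim          -- N, dimension of each code vector (samples per subframe)
    response_len -- K, filter or impulse response length
    sample_rate  -- samples per second (assumed to be 8000)
    """
    ops_per_vector = dim * response_len   # one filtering of one code vector
    num_vectors = 2 ** bits               # code vectors searched per subframe
    # Multiply before dividing so the result stays an exact integer.
    return ops_per_vector * num_vectors * sample_rate // dim


print(codebook_search_ops_per_second(bits=10, dim=40, response_len=10))
```

With B=10, N=40, and K=10 this evaluates to 10 × 1024 × 8000 = 81,920,000, matching the figure given above.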
Various methods have therefore been proposed to reduce the amount of calculation required to search for a sound source code vector in the sound source code book. The ACELP (Algebraic Code Excited Linear Prediction) system described in C. Laflamme et al., “16 kbps wideband speech coding technique based on algebraic CELP” (Proc. ICASSP, pp. 13-16, 1991) (to be referred to as reference 3 hereinafter) is one of these methods.
In this ACELP system, a sound source signal is represented by a plurality of pulses, and the position of each pulse is represented by a predetermined number of bits and transmitted. Since the amplitude of each pulse is limited to +1.0 or −1.0, the amount of calculation for pulse search can be greatly reduced.
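The pulse representation can be illustrated with a minimal sketch. It assumes one sign bit per pulse in addition to the position bits, which is a common arrangement but is not stated in the text; the function names `build_pulse_excitation` and `pulse_bits` are hypothetical.

```python
def build_pulse_excitation(subframe_len, pulses):
    """Build a sound source vector from (position, sign) pairs.

    Each pulse has amplitude +1.0 or -1.0; all other samples are zero.
    """
    excitation = [0.0] * subframe_len
    for position, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitude is limited to +1.0 or -1.0")
        if not 0 <= position < subframe_len:
            raise ValueError("pulse position lies outside the subframe")
        excitation[position] += float(sign)
    return excitation


def pulse_bits(num_pulses, position_bits):
    """Bits per subframe, assuming one sign bit per pulse (an assumption)."""
    return num_pulses * (position_bits + 1)
```

For instance, `build_pulse_excitation(8, [(1, 1), (5, -1)])` yields a vector that is zero except for +1.0 at sample 1 and −1.0 at sample 5. Because the bit budget grows with the number of pulses, fewer pulses can be afforded per subframe as the bit rate falls.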
In the conventional sound signal encoding systems as described above, high sound quality can be obtained for a sound signal having an encoding bit rate of 8 kb/s or more. However, if the encoding bit rate is less than 8 kb/s, the number of pulses per subframe becomes insufficient. Since this makes it difficult to express a sound source signal with satisfactory accuracy, the quality of the encoded sound deteriorates.