Conventional analog speech processing systems are being replaced by digital signal processing systems. In digital speech processing systems, analog speech signals are sampled, and the samples are then encoded with a number of bits that depends on the desired signal quality. For toll-quality speech communication without special processing, the bit rate required to represent the speech signal is 64 Kbit/s, which may be too high for some low-rate speech communication systems.
Numerous efforts have been made to reduce the data rate required to encode speech while still obtaining high-quality decoded speech at the receiving end of the system. The code-excited linear predictive (CELP) coding technique, introduced in the article, "Code-Excited Linear Prediction: High-Quality Speech at Very Low Rates," by M. R. Schroeder and B. S. Atal, Proc. ICASSP-85, pages 937-940, 1985, has proven to be the most effective speech coding algorithm for rates between 4 Kbit/s and 16 Kbit/s.
CELP coding is a frame-based algorithm that stores sampled input speech signals in a block of samples called a "frame" and processes this frame of data using analysis-by-synthesis search procedures to extract the parameters of the fixed codebook, the adaptive codebook, and the linear predictive coding (LPC) filter.
The CELP synthesizer produces synthesized speech by feeding the excitation sources from the fixed codebook and the adaptive codebook to the LPC formant filter. The parameters of the formant filter are calculated through linear predictive analysis, which rests on the idea that any speech sample (over the finite interval of a frame) can be approximated as a linear combination of past known speech samples. A unique set of predictor coefficients (LPC prediction coefficients) for the input speech can thus be determined by minimizing the sum of the squared differences between the input speech samples and the linearly predicted speech samples. The parameters (codebook index and codebook gain) of the fixed codebook and the adaptive codebook are selected by minimizing the perceptually weighted mean squared error between the input speech samples and the synthesized LPC filter output samples.
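The least-squares prediction described above can be illustrated with a minimal sketch. It assumes the autocorrelation method solved by the Levinson-Durbin recursion, which is one common way to perform this minimization and is not necessarily the procedure used by any of the coders cited here. The function names `autocorrelation` and `levinson_durbin` are hypothetical, and windowing and bandwidth expansion are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of frame[n] * frame[n - lag]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (a, err), where the predicted sample is
        s_hat[n] = a[0]*s[n-1] + a[1]*s[n-2] + ... + a[order-1]*s[n-order]
    and err is the remaining squared prediction error.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        # Reflection coefficient for prediction order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Illustrative frame: a decaying exponential, where each sample is
    # 0.9 times the previous one.
    frame = [0.9 ** n for n in range(200)]
    coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
    print(coeffs, residual_energy)
```

For the illustrative frame, the first coefficient comes out close to 0.9 and the second close to zero, since a single past sample already predicts the signal almost exactly.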
Once the speech parameters of the fixed codebook, the adaptive codebook, and the LPC filter are calculated, these parameters are quantized and encoded by the encoder for transmission to the receiver. The decoder in the receiver regenerates the speech parameters for the CELP synthesizer, which produces the synthesized speech.
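The analysis-by-synthesis selection of a codebook index and gain described earlier can be sketched as follows. This is a simplified illustration under stated assumptions: perceptual weighting is omitted (plain squared error is used instead), the filter starts from a zero state, only a single fixed codebook is searched, and the names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, lpc):
    """Pass an excitation through the all-pole LPC synthesis filter.

    out[n] = excitation[n] + lpc[0]*out[n-1] + lpc[1]*out[n-2] + ...
    The filter memory is assumed to be zero at the start of the frame.
    """
    out = []
    for n, sample in enumerate(excitation):
        acc = sample
        for k, coeff in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += coeff * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain) minimizing the squared error ||target - gain * y||^2.

    For a codevector with filtered output y, the optimal gain is
    <target, y> / <y, y>, and the remaining error is smallest when
    <target, y>^2 / <y, y> is largest.
    """
    best_index, best_gain, best_score = None, 0.0, -1.0
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero codevector cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


if __name__ == "__main__":
    lpc = [0.5]                                   # illustrative one-tap filter
    codebook = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [1.0, -1.0, 0.0, 0.0]]
    target = synthesize([0.0, 2.0, 0.0, 0.0], lpc)  # codevector 1 scaled by 2
    print(search_codebook(target, codebook, lpc))
```

In this toy case the target was built from the second codevector scaled by two, so the search recovers that index and gain. An exhaustive search of this kind filters every codevector in every frame, which is the computational cost that several of the patents discussed below aim to reduce.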
The first speech coding standard based on the CELP algorithm was the U.S. Federal Standard FS1016, operating at 4.8 Kbit/s. In 1992, the CCITT (now ITU-T) adopted the low-delay CELP (LD-CELP) algorithm known as G.728. Many researchers have improved the voice quality of the CELP coder over the past several years. In particular, excitation codebooks have been extensively studied and developed for the CELP coder.
A particular CELP algorithm called vector sum excited linear prediction (VSELP) was developed for the North American TDMA digital cellular standard known as IS-54 and is described in the article, "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 Kbit/s," by I. R. Gerson and M. Jansiuk, Proc. ICASSP-90, pages 461-464, 1990. The excitation codevectors for VSELP are derived from two random codebooks to classify the characteristics of the LPC residual signals. More recently, an excitation codevector generated from an algebraic codebook was used for the ITU-T 8 Kbit/s speech coding standard, as described in "Draft Recommendation G.729: Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP)," ITU-T, COM 15-152, 1995. The addition of pitch synchronous innovation (PSI), described in the article, "Design of a pitch synchronous innovation CELP coder for mobile communications," by Meno et al., IEEE J. Sel. Areas Commun., vol. 13, pages 31-41, January 1995, improves the perceptual voice quality. Nevertheless, the voice quality of CELP coders operating between 4 Kbit/s and 16 Kbit/s is still not transparent, that is, not of toll quality.
Mixed excitation has been applied to the CELP speech coder by Taniguchi et al. in the article, "Principal Axis Extracting Vector Excitation Coding: High Quality Speech at 8 KB/S," Proc. ICASSP-90, pages 241-244, 1990. Implied pulse codevectors that depend on the selected baseline codevectors are introduced to improve the codec performance, and some improvements in both subjective and objective measurements are reported. The aforementioned models attempt to enhance the performance of the CELP coder by improving the pitch harmonic structure of the synthesized speech. These models depend on the selected baseline codevector, which may not be suitable for some female speech whose residual signal is purely white. More recently, mixed excitations from a baseline codebook and an implied codebook were applied to the CELP model to improve the pitch harmonic structure by Kwon et al. in the article, "A High Quality BI-CELP Speech Coder at 8 Kbit/s and Below," Proc. ICASSP-97, pages 759-762, 1997, which demonstrated the effectiveness of the BI-CELP model. Because of the characteristics of speech itself and of the CELP speech coding model, producing high-quality synthesized speech requires a codebook that can characterize the LPC residual spectra of a random noise source, of an energy-concentrated pulse source, and of mixtures of the two.
In addition to the above referenced techniques, various United States Patents address CELP techniques. U.S. Pat. No. 5,526,464, issued to Marmelstein, is directed to reducing the codebook search complexity for CELP. This is accomplished through use of multiple band-passed residual signals with corresponding codebooks, where the codebook size increases as frequency decreases.
U.S. Pat. No. 5,140,638, issued to Moulsley, is directed to a system which uses one-dimensional codebooks as compared to the usual two-dimensional codebooks. This technique is used in order to reduce computational complexity within the CELP.
U.S. Pat. No. 5,265,190, issued to Yip et al., is directed to a reduced computational complexity method for CELP. In particular, the convolution and correlation operations used to poll the adaptive codebook vectors in a recursive calculation loop, in order to select the optimal excitation vector from the adaptive codebook, are separated in a particular way.
U.S. Pat. No. 5,519,806, issued to Nakamura, is directed to a system for searching a codebook in which an excitation source is synthesized through linear coupling of at least two basis vectors. This technique reduces the computational complexity of computing cross-correlations.
U.S. Pat. No. 5,485,581, issued to Miyano et al., is directed to a method to reduce computational complexity by correcting an autocorrelation of a synthesis signal synthesized from a codevector of the excitation codebook and the linear predictive parameter using an autocorrelation of a synthesis signal synthesized from a codevector of the adaptive codebook and the linear predictive parameter and a cross-correlation between the synthesis signal of the codevector of the adaptive codebook and the synthesis signal of the codevector of the excitation codebook. The method subsequently searches the gain codebook using the corrected autocorrelation and a cross-correlation between a signal obtained by subtraction of the synthesis signal of the codevector of the adaptive codebook from the input speech signal and the synthesis signal of the codevector of the excitation codebook.
U.S. Pat. No. 5,371,853, issued to Kao et al., is directed to a method for CELP speech encoding with an organized, non-overlapping, algebraic codebook containing a predetermined number of vectors, uniformly distributed over a multi-dimensional sphere to generate a remaining speech residual. Short term speech information, long term speech information, and remaining speech residuals are combined to form a reproduction of the input speech.
U.S. Pat. No. 5,444,816, issued to Adoul et al., is directed to a method to improve the excitation codebook and search procedures of CELP. This is accomplished through use of a sparse algebraic code generator associated with a filter having a time-varying transfer function.
None of the prior art maintains satisfactory or toll-quality speech using a digital coding at low data rates with reduced computational complexity.