1. Field
The present invention relates to communication systems, and more particularly, to speech processing within communication systems.
2. Background
The field of wireless communications has many applications including, e.g., cordless telephones, paging, wireless local loops, personal digital assistants (PDAs), Internet telephony, and satellite communication systems. A particularly important application is cellular telephone systems for remote subscribers. As used herein, the term “cellular” system encompasses systems using either cellular or personal communications services (PCS) frequencies. Various over-the-air interfaces have been developed for such cellular telephone systems including, e.g., frequency division multiple access (FDMA), time division multiple access (TDMA), and code division multiple access (CDMA). In connection therewith, various domestic and international standards have been established including, e.g., Advanced Mobile Phone Service (AMPS), Global System for Mobile (GSM), and Interim Standard 95 (IS-95). IS-95 and its derivatives, IS-95A, IS-95B, ANSI J-STD-008 (often referred to collectively herein as IS-95), and proposed high-data-rate systems are promulgated by the Telecommunication Industry Association (TIA) and other well known standards bodies.
Cellular telephone systems configured in accordance with the use of the IS-95 standard employ CDMA signal processing techniques to provide highly efficient and robust cellular telephone service. Exemplary cellular telephone systems configured substantially in accordance with the use of the IS-95 standard are described in U.S. Pat. Nos. 5,103,459 and 4,901,307, which are assigned to the assignee of the present invention and incorporated by reference herein. An exemplary system utilizing CDMA techniques is the cdma2000 ITU-R Radio Transmission Technology (RTT) Candidate Submission (referred to herein as cdma2000), issued by the TIA. The standard for cdma2000 is given in the draft versions of IS-2000 and has been approved by the TIA. Another CDMA standard is the W-CDMA standard, as embodied in 3rd Generation Partnership Project “3GPP”, Document Nos. 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214.
The telecommunication standards cited above are examples of only some of the various communications systems that can be implemented. With the proliferation of digital communication systems, the demand for efficient frequency usage is constant. One method for increasing the efficiency of a system is to transmit compressed signals. Devices that employ techniques to compress speech by extracting parameters that relate to a model of human speech generation are called speech coders. A speech coder divides the incoming speech signal into blocks of time, or analysis frames. Speech coders typically comprise an encoder and a decoder. The encoder analyzes the incoming speech frame to extract certain relevant parameters, and then quantizes the parameters into binary representation, i.e., to a set of bits or a binary data packet, that is placed in an output frame. The output frames are transmitted over the communication channel in transmission channel packets to a receiver and a decoder. The decoder processes the output frames, de-quantizes them to produce the parameters, and resynthesizes the speech frames using the de-quantized parameters.
The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech. The digital compression is achieved by representing the input speech frame with a set of parameters and employing quantization to represent the parameters with a set of bits. If the input speech frame has a number of bits Ni and the data packet produced by the speech coder has a number of bits No, then the compression factor achieved by the speech coder is Cr=Ni/No. The challenge is to retain high voice quality of the decoded speech while achieving the target compression factor. The performance of a speech coder depends on how well the speech model, or the combination of the analysis and synthesis process described above, performs, and how well the parameter quantization process is performed at the target bit rate of No bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
Of the various classes of speech coder, the Code Excited Linear Predictive Coding (CELP), Stochastic Coding, or Vector Excited Speech Coding coders are of one class. An example of a coder of this particular class is described in Interim Standard 127 (IS-127), entitled, “Enhanced Variable Rate Coder” (EVRC). Another example of a coder of this particular class is described in pending draft proposal “Selectable Mode Vocoder Service Option for Wideband Spread Spectrum Communication Systems,” Document No. 3GPP2 C.P9001. The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies inherent in speech. In a CELP coder, redundancies are removed by means of a short-term formant (or LPC) filter. Once these redundancies are removed, the resulting residual signal can be modeled as white Gaussian noise, or a white periodic signal, which also must be coded. Hence, through the use of speech analysis, followed by the appropriate coding, transmission, and re-synthesis at the receiver, a significant reduction in the data rate can be achieved.
The coding parameters for a given frame of speech are determined by first determining the coefficients of a linear prediction coding (LPC) filter. The appropriate choice of coefficients will remove the short-term redundancies of the speech signal in the frame. Long-term periodic redundancies in the speech signal are removed by determining the pitch lag, L, and pitch gain, gp, of the signal. The combination of possible pitch lag values and pitch gain values is stored as vectors in an adaptive codebook. An excitation signal is then chosen from among a number of waveforms stored in an excitation waveform codebook. When the appropriate excitation signal is excited by a given pitch lag and pitch gain and is then input into the LPC filter, a close approximation to the original speech signal can be produced.
In general, the excitation waveform codebook can be stochastic or generated. A stochastic codebook is one where all the possible excitation waveforms are already generated and stored in memory. Selecting an excitation waveform encompasses a search and compare through the codebook of the stored waveforms for the “best” one. A generated codebook is one where each possible excitation waveform is generated and then compared to a performance criterion. The generated codebook can be more efficient than the stochastic codebook when the excitation waveform is sparse.
“Sparse” is a term of art indicating that only a few number of pulses are used to generate the excitation signal, rather than many. In a sparse codebook, excitation signals generally comprise a few pulses at designated positions in a “track.” The Algebraic CELP (ACELP) codebook is a sparse codebook that is used to reduce the complexity of codebook searches and to reduce the number of bits required to quantize the pulse positions. The actual structure of algebraic codebooks is well known in the art and is described in the paper “Fast CELP coding based on Algebraic Codes” by J. P. Adoul, et al., Proceedings of ICASSP Apr. 6-9, 1987. The use of algebraic codes is further disclosed in U.S. Pat. No. 5,444,816, entitled “Dynamic Codebook for Efficient Speech Coding Based on Algebraic Codes”, the disclosure of which is incorporated by reference.
Since a compressed speech transmission can be performed by transmitting LPC filter coefficients, an identification of the adaptive codebook vector, and an identification of the fixed codebook excitation vector, the use of a sparse codebook for the excitation vectors allows for the reallocation of saved bits to other payloads. For example, the allocated bits in an output frame for the excitation vectors can be reduced and the speech coder can then use the freed bits to reduce the granularity of the LPC coefficient quantizer.
However, even with the use of sparse codebooks, there is an ever-present need to reduce the number of bits required to convey the excitation signal information while still maintaining a high perceptual quality to the synthesized speech signal.