1. Field
The present invention relates generally to communication systems, and more particularly, to speech processing within communication systems.
2. Background
The field of wireless communications has many applications including, e.g., cordless telephones, paging, wireless local loops, personal digital assistants (PDAs), Internet telephony, and satellite communication systems. A particularly important application is cellular telephone systems for mobile subscribers. As used herein, the term "cellular" system encompasses both cellular and personal communications services (PCS) frequencies. Various over-the-air interfaces have been developed for such cellular telephone systems including, e.g., frequency division multiple access (FDMA), time division multiple access (TDMA), and code division multiple access (CDMA). In connection therewith, various domestic and international standards have been established including, e.g., Advanced Mobile Phone Service (AMPS), Global System for Mobile Communications (GSM), and Interim Standard 95 (IS-95). In particular, IS-95 and its derivatives IS-95A, IS-95B, and ANSI J-STD-008 (often referred to collectively herein as IS-95), as well as proposed high-data-rate systems, are promulgated by the Telecommunication Industry Association (TIA) and other well-known standards bodies.
Cellular telephone systems configured in accordance with the IS-95 standard employ CDMA signal processing techniques to provide highly efficient and robust cellular telephone service. Exemplary cellular telephone systems configured substantially in accordance with the IS-95 standard are described in U.S. Pat. Nos. 5,103,459 and 4,901,307, which are assigned to the assignee of the present invention and incorporated by reference herein. An exemplary system utilizing CDMA techniques is the cdma2000 ITU-R Radio Transmission Technology (RTT) Candidate Submission (referred to herein as cdma2000), issued by the TIA. The standard for cdma2000 is given in the draft versions of IS-2000 and has been approved by the TIA. The cdma2000 proposal is compatible with IS-95 systems in many ways. Another CDMA standard is the W-CDMA standard, as embodied in 3rd Generation Partnership Project ("3GPP") Document Nos. 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214.
With the proliferation of digital communication systems, the demand for efficient frequency usage is constant. One method for increasing the efficiency of a system is to transmit compressed signals. In a regular landline telephone system, a bit rate of 64 kilobits per second (kbps) is used to reproduce the quality of an analog voice signal in a digital transmission. However, by using compression techniques that exploit the redundancies of a voice signal, the amount of information transmitted over the air can be reduced while still maintaining high quality.
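The 64 kbps figure follows from standard narrowband PCM (for example, ITU-T G.711): 8000 samples per second at 8 bits per sample. The short sketch below reproduces that arithmetic and computes a compression ratio; the function names and the 8 kbps coded rate in the usage lines are illustrative assumptions, not values taken from this document.

```python
# Narrowband PCM parameters (standard telephony values, e.g. ITU-T G.711).
SAMPLE_RATE_HZ = 8000      # samples per second
BITS_PER_SAMPLE = 8        # companded PCM word length


def pcm_bit_rate(sample_rate_hz: int = SAMPLE_RATE_HZ,
                 bits_per_sample: int = BITS_PER_SAMPLE) -> int:
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(coded_rate_bps: float) -> float:
    """How many times smaller a coded stream is than 64 kbps PCM."""
    return pcm_bit_rate() / coded_rate_bps


if __name__ == "__main__":
    print(pcm_bit_rate())            # 64000
    print(compression_ratio(8000))   # 8.0 for a hypothetical 8 kbps coder
```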
Typically, conversion of an analog voice signal to a digital signal is performed by an encoder, and conversion of the digital signal back to a voice signal is performed by a decoder. In an exemplary CDMA system, a vocoder comprising both an encoding portion and a decoding portion is located within remote stations and base stations. An exemplary vocoder is described in U.S. Pat. No. 5,414,796, entitled "Variable Rate Vocoder," assigned to the assignee of the present invention and incorporated by reference herein. In a vocoder, the encoding portion extracts parameters that relate to a model of human speech generation. The decoding portion re-synthesizes the speech using the parameters received over a transmission channel. Because the speech signal is time-varying, the model parameters must change continually to track it. The speech is therefore divided into blocks of time, or analysis frames, during which the parameters are calculated, and the parameters are updated for each new frame. As used herein, the word "decoder" refers to any device or any portion of a device that can be used to convert digital signals that have been received over a transmission medium into acoustic signals. The word "encoder" refers to any device or any portion of a device that can be used to convert acoustic signals into digital signals. Hence, the embodiments described herein can be implemented with vocoders of CDMA systems, or alternatively, with encoders and decoders of non-CDMA systems.
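The division of speech into analysis frames can be sketched as follows. The 8 kHz sampling rate and 20 ms frame length are assumptions typical of narrowband vocoders, not values given here, and the function name `split_into_frames` is hypothetical. A trailing partial frame is zero-padded so that every frame has the same length.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float],
                      sample_rate_hz: int = 8000,
                      frame_ms: int = 20) -> List[List[float]]:
    """Split a speech signal into fixed-length, non-overlapping analysis frames.

    Model parameters (LPC coefficients, pitch, excitation) would be computed
    once per frame and replaced when the next frame is analyzed.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 160 samples at 8 kHz / 20 ms
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame.extend([0.0] * (frame_len - len(frame)))  # zero-pad the last frame
        frames.append(frame)
    return frames


if __name__ == "__main__":
    signal = [0.1] * 400                    # 50 ms of dummy audio at 8 kHz
    frames = split_into_frames(signal)
    print(len(frames), len(frames[0]))      # 3 160
```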
Among the various classes of speech coders, Code Excited Linear Predictive (CELP) coders, stochastic coders, and vector excited speech coders form one class. An example of a coding algorithm of this class is described in Interim Standard 127 (IS-127), entitled "Enhanced Variable Rate Coder" (EVRC). Another example of a coder of this class is described in the pending draft proposal "Selectable Mode Vocoder Service Option for Wideband Spread Spectrum Communication Systems," Document No. 3GPP2 C.P9001. The function of the vocoder is to compress the digitized speech signal into a low-bit-rate signal by removing the natural redundancies inherent in speech. In a CELP coder, short-term redundancies are removed by means of a short-term formant (or LPC) filter. Once these redundancies are removed, the resulting residual signal can be modeled as white Gaussian noise or a white periodic signal, which must also be coded. Hence, through speech analysis followed by appropriate coding, transmission, and re-synthesis at the receiver, a significant reduction in the data rate can be achieved.
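The short-term (formant) analysis described above can be illustrated with the textbook autocorrelation method and the Levinson-Durbin recursion. This is a minimal sketch, not the procedure of IS-127 or any particular coder: it omits windowing, lag windowing, and bandwidth expansion, assumes a non-silent frame (r[0] > 0), and uses hypothetical function names.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] x[n-k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the normal equations for predictor coefficients a[1..order].

    Prediction model: x_hat[n] = sum_k a[k] * x[n-k]. Assumes r[0] > 0.
    Returns (a[1..order], final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc_residual(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Inverse (short-term) filter A(z): e[n] = x[n] - sum_k a[k] x[n-k]."""
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    x = [0.9 ** n for n in range(200)]        # decaying, perfectly predictable signal
    a, _ = levinson_durbin(autocorrelation(x, 1), 1)
    e = lpc_residual(x, a)
    print(round(a[0], 6))                     # 0.9
    print(max(abs(v) for v in e[1:]) < 1e-6)  # True: the redundancy is removed
```

Only the first residual sample survives in the usage example, because the predictor has no history at n = 0; everything after it is predicted almost exactly.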
The coding parameters for a given frame of speech are determined by first computing the coefficients of a linear prediction coding (LPC) filter. An appropriate choice of coefficients removes the short-term redundancies of the speech signal in the frame. Long-term periodic redundancies in the speech signal are removed by determining the pitch lag, L, and pitch gain, g_p, of the signal. The combinations of possible pitch lag and pitch gain values are stored as vectors in an adaptive codebook. An excitation signal is then chosen from among a number of waveforms stored in a fixed excitation waveform codebook. When the appropriate excitation signal is combined with the contribution determined by a given pitch lag and pitch gain, and the result is input into the LPC filter, a close approximation to the original speech signal can be produced. Thus, compressed speech can be transmitted by sending the LPC filter coefficients, an identification of the adaptive codebook vector, and an identification of the fixed codebook excitation vector.
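The long-term prediction step can be sketched as an open-loop search over candidate lags: for each lag L, the optimal gain is g(L) = sum x[n] x[n-L] / sum x[n-L]^2, and the lag that removes the most periodic energy is kept. This is a simplified illustration under stated assumptions: practical coders typically refine the lag in closed loop against the adaptive codebook and may use fractional lags, the function name is hypothetical, and the lag range of 20 to 147 samples in the usage lines is assumed rather than taken from this document.

```python
from typing import Sequence, Tuple


def estimate_pitch(residual: Sequence[float], min_lag: int,
                   max_lag: int) -> Tuple[int, float]:
    """Open-loop pitch search on a residual signal. Returns (L, g_p).

    With the optimal gain g(L), the energy removed from the residual is
    (sum x[n] x[n-L])^2 / sum x[n-L]^2, so that ratio is maximized.
    """
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(residual[n] * residual[n - lag] for n in range(lag, len(residual)))
        energy = sum(residual[n - lag] ** 2 for n in range(lag, len(residual)))
        if energy <= 0.0:
            continue                          # no usable history at this lag
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain


if __name__ == "__main__":
    # Idealized residual: one pulse every 40 samples.
    res = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
    print(estimate_pitch(res, 20, 147))       # (40, 1.0)
```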
An effective excitation codebook structure is the algebraic codebook. The structure of algebraic codebooks is well known in the art and is described in the paper "Fast CELP Coding Based on Algebraic Codes" by J. P. Adoul et al., Proceedings of ICASSP, Apr. 6-9, 1987. The use of algebraic codes is further disclosed in U.S. Pat. No. 5,444,816, entitled "Dynamic Codebook for Efficient Speech Coding Based on Algebraic Codes," the disclosure of which is incorporated by reference herein.
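One common way to realize an algebraic codebook, used for example in ACELP coders, is to place signed unit pulses on interleaved position tracks, so that any code vector can be generated from a handful of (position, sign) pairs rather than read from a stored table. The sketch below assumes a 40-sample subframe, 5 tracks, and one pulse per track; these parameters and the helper names are illustrative and are not taken from the Adoul paper or the cited patent.

```python
from typing import List, Sequence, Tuple

SUBFRAME = 40   # samples per subframe (assumed)
TRACKS = 5      # interleaved tracks (assumed)


def track_positions(track: int, subframe: int = SUBFRAME,
                    tracks: int = TRACKS) -> List[int]:
    """Positions available to one track: track, track+T, track+2T, ..."""
    return list(range(track, subframe, tracks))


def build_code_vector(pulses: Sequence[Tuple[int, int]],
                      subframe: int = SUBFRAME) -> List[int]:
    """Build a sparse code vector from (position, sign) pairs.

    The vector is generated on demand from its pulse description, so the
    codebook itself never has to be stored.
    """
    c = [0] * subframe
    for pos, sign in pulses:
        c[pos] += sign
    return c


def codebook_size(subframe: int = SUBFRAME, tracks: int = TRACKS) -> int:
    """Number of distinct vectors with one signed pulse per track."""
    size = 1
    for t in range(tracks):
        size *= 2 * len(track_positions(t, subframe, tracks))  # positions x signs
    return size


if __name__ == "__main__":
    print(track_positions(2))                  # [2, 7, 12, 17, 22, 27, 32, 37]
    print(codebook_size())                     # 1048576, i.e. 2**20
    c = build_code_vector([(0, 1), (6, -1), (12, 1), (18, 1), (24, -1)])
    print([i for i, v in enumerate(c) if v])   # [0, 6, 12, 18, 24]
```

With these assumed parameters each track needs 3 position bits and 1 sign bit, so a 20-bit index addresses over a million vectors without any table storage.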
Due to the intensive computational and storage requirements of implementing codebook searches for optimal excitation vectors, there is a constant need to reduce the storage requirements involved in conducting a codebook search.
Novel methods and apparatus for implementing a fast code vector search in coders are presented. In one aspect, a method is presented for reducing the memory requirements needed to conduct a search for a vector in a codebook.
In another aspect, an apparatus for selecting an optimal pulse vector from a pulse vector codebook is presented, wherein the optimal pulse vector is used by a linear prediction coder to encode a residual waveform. The apparatus comprises: an impulse response generator for generating an impulse response vector; a cross-correlation element configured to determine a cross-correlation vector relating the impulse response vector to a plurality of target signal samples from a filter, wherein the cross-correlation vector is used to determine a plurality of pulse positions selected such that indexing the cross-correlation vector at the plurality of pulse positions yields a predetermined number of high cross-correlation values; a pulse codebook generator configured to receive an indication signal indicative of the plurality of pulse positions from the cross-correlation element, and to output a plurality of pulse vectors in response to the indication signal, wherein the plurality of pulse vectors is a subset of the pulse vector codebook; and an energy computation element for determining an autocorrelation sub-matrix based upon the subset of the pulse vector codebook, wherein the autocorrelation sub-matrix and the cross-correlation vector are used to select the optimal pulse vector from the codebook.
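The impulse response generator and cross-correlation element described above can be sketched as follows. The sketch assumes the impulse response is that of the LPC synthesis filter 1/A(z) with A(z) = 1 - sum_k a[k] z^-k (any perceptual weighting is omitted), and it ranks positions by |d[n]| on the assumption that pulse signs are later chosen to match the sign of d[n]. The function names are hypothetical.

```python
from typing import List, Sequence


def impulse_response(a: Sequence[float], length: int) -> List[float]:
    """First `length` samples of the impulse response of 1/A(z),
    where A(z) = 1 - sum_k a[k] z^-k (perceptual weighting omitted)."""
    h: List[float] = []
    for n in range(length):
        val = 1.0 if n == 0 else 0.0
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                val += a[k - 1] * h[n - k]
        h.append(val)
    return h


def cross_correlation(target: Sequence[float], h: Sequence[float]) -> List[float]:
    """d[n] = sum_{i>=n} x[i] h[i-n]: correlation of the target with the
    impulse response shifted to start at position n."""
    N = len(target)
    return [sum(target[i] * h[i - n] for i in range(n, N)) for n in range(N)]


def top_positions(d: Sequence[float], count: int) -> List[int]:
    """Positions of the `count` largest |d[n]| values, in ascending order."""
    ranked = sorted(range(len(d)), key=lambda n: abs(d[n]), reverse=True)
    return sorted(ranked[:count])


if __name__ == "__main__":
    h = impulse_response([0.5], 4)             # [1.0, 0.5, 0.25, 0.125]
    x = [0.0, 0.0, 1.0, 0.5]                   # response to a pulse at position 2
    d = cross_correlation(x, h)
    print(d)                                   # [0.3125, 0.625, 1.25, 0.5]
    print(top_positions(d, 2))                 # [1, 2]
```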
In another aspect, an apparatus for reducing the memory requirements of a codebook search is presented. The apparatus comprises: an impulse response generator for generating an impulse response signal; a cross-correlation element configured to determine a cross-correlation vector relating the impulse response signal to a target signal; a selection element configured to receive the cross-correlation vector, to use the cross-correlation vector to identify an optimal set of pulse positions, and to generate an indication signal that carries the identification of the optimal set of pulse positions; a pulse codebook generator that is configured to receive the indication signal from the selection element and to generate a plurality of pulse vectors, wherein the plurality of pulse vectors are generated based upon the identification of the optimal set of pulse positions carried by the indication signal; and an energy computation element for determining an autocorrelation sub-matrix based on the plurality of pulse vectors, wherein the autocorrelation sub-matrix is used instead of an autocorrelation matrix, thereby decreasing the memory requirement of the codebook search.
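The memory saving from using an autocorrelation sub-matrix can be made concrete. The sketch assumes the usual CELP definition phi(i, j) = sum_{n>=max(i,j)} h[n-i] h[n-j] for an impulse response h over an N-sample frame, and counts the entries stored for a symmetric matrix. The function names, the 40-sample frame, and the choice of P = 10 positions in the usage lines are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def autocorr_entry(h: Sequence[float], i: int, j: int, N: int) -> float:
    """phi(i, j) = sum_{n=max(i,j)}^{N-1} h[n-i] h[n-j]."""
    return sum(h[n - i] * h[n - j] for n in range(max(i, j), N))


def autocorr_submatrix(h: Sequence[float], positions: Sequence[int],
                       N: int) -> List[List[float]]:
    """Entries of phi only for the selected pulse positions (P x P)."""
    return [[autocorr_entry(h, i, j, N) for j in positions] for i in positions]


def stored_entries(N: int, P: int) -> Tuple[int, int]:
    """Symmetric storage: full N x N matrix vs. P x P sub-matrix."""
    return N * (N + 1) // 2, P * (P + 1) // 2


if __name__ == "__main__":
    h = [1.0, 0.5, 0.25, 0.125]
    print(autocorr_submatrix(h, [1, 2], 4))    # [[1.3125, 0.625], [0.625, 1.25]]
    print(stored_entries(40, 10))              # (820, 55)
```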
In another aspect, a method for selecting an optimal pulse vector from a codebook is presented. The method comprises: determining a cross-correlation vector between a target signal and an impulse response, wherein each component in the cross-correlation vector corresponds to a position in an analysis frame; determining a plurality of P positions that correspond to the P largest components of the cross-correlation vector; selecting a plurality of pulse vectors from the codebook to form a subcodebook, wherein each of the plurality of pulse vectors corresponds to at least one of the plurality of P positions; determining an autocorrelation matrix based on the plurality of pulse vectors in the subcodebook; and selecting the optimal pulse vector from the plurality of pulse vectors in the subcodebook.
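The steps of this method can be sketched end to end. The sketch assumes that the subcodebook consists of every K-pulse vector whose pulses lie on the P selected positions, with each pulse sign fixed to the sign of d at that position, and it uses the standard CELP selection criterion (d^T c)^2 / (c^T Phi c). These choices, and the function names, are illustrative assumptions rather than the exact procedure claimed.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def phi(h: Sequence[float], i: int, j: int, N: int) -> float:
    """Autocorrelation of the impulse response: sum_{n>=max(i,j)} h[n-i] h[n-j]."""
    return sum(h[n - i] * h[n - j] for n in range(max(i, j), N))


def search_subcodebook(target: Sequence[float], h: Sequence[float],
                       P: int, K: int) -> Tuple[Tuple[int, ...], List[int]]:
    """Select the best K-pulse vector whose pulses lie on the P positions
    with the largest |d[n]|. Returns (positions, signs)."""
    N = len(target)
    # Step 1: cross-correlation d[n] between the target and the shifted impulse response.
    d = [sum(target[i] * h[i - n] for i in range(n, N)) for n in range(N)]
    # Step 2: the P positions with the largest |d[n]|.
    cand = sorted(sorted(range(N), key=lambda n: abs(d[n]), reverse=True)[:P])
    # Step 3: signs follow d; only the P x P autocorrelation sub-matrix is built.
    sign = [1 if d[n] >= 0 else -1 for n in cand]
    sub = [[phi(h, i, j, N) for j in cand] for i in cand]
    # Step 4: maximize (d^T c)^2 / (c^T Phi c) over the subcodebook.
    best, best_score = None, float("-inf")
    for combo in combinations(range(P), K):                 # indices into cand
        num = sum(sign[a] * d[cand[a]] for a in combo)      # d^T c
        den = sum(sign[a] * sign[b] * sub[a][b] for a in combo for b in combo)
        if den > 0 and num * num / den > best_score:
            best, best_score = combo, num * num / den
    return tuple(cand[a] for a in best), [sign[a] for a in best]


if __name__ == "__main__":
    N = 8
    h = [0.5 ** n for n in range(N)]   # impulse response of 1/(1 - 0.5 z^-1)
    # Target produced by a +1 pulse at position 1 and a -1 pulse at position 5.
    x = [(h[n - 1] if n >= 1 else 0.0) - (h[n - 5] if n >= 5 else 0.0)
         for n in range(N)]
    print(search_subcodebook(x, h, P=4, K=2))   # ((1, 5), [1, -1])
```

Because the target in the usage lines is exactly the filtered version of a two-pulse vector, the search recovers that vector, and the full N x N autocorrelation matrix is never formed.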
In another aspect, a method for reducing the computational complexity of a codebook search is presented. The method comprises: determining an energy value matrix using a partial set of autocorrelation values; storing the energy value matrix; using the energy value matrix and a cross-correlation value from a plurality of cross-correlation values to determine a criterion value for each vector in a plurality of vectors, wherein each cross-correlation value describes a relationship between a target signal and a respective vector in the codebook; and selecting a vector as optimal if the vector has the highest criterion value.
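A minimal sketch of the reduced-complexity search, for the special case of two-pulse vectors, is given below. It assumes that the "partial set of autocorrelation values" is the P x P block of phi for the candidate positions, that pulse signs are fixed in advance, and that the criterion is (d^T c)^2 divided by the stored energy. The names `energy_matrix` and `best_pair` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def energy_matrix(phi_sub: Sequence[Sequence[float]],
                  signs: Sequence[int]) -> List[List[float]]:
    """Energies c^T Phi c of the two-pulse vectors built from P candidates.

    phi_sub holds only the autocorrelation values among the candidates;
    E[a][b] (a != b) is the energy of the vector with signed pulses at
    candidates a and b. Diagonal entries hold single-pulse energies.
    """
    P = len(phi_sub)
    E = [[0.0] * P for _ in range(P)]
    for a in range(P):
        for b in range(P):
            if a == b:
                E[a][b] = phi_sub[a][a]
            else:
                E[a][b] = (phi_sub[a][a] + phi_sub[b][b]
                           + 2 * signs[a] * signs[b] * phi_sub[a][b])
    return E


def best_pair(E: Sequence[Sequence[float]], d_cand: Sequence[float],
              signs: Sequence[int]) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the candidate pair with the highest criterion (d^T c)^2 / E."""
    best, best_val = None, float("-inf")
    P = len(E)
    for a in range(P):
        for b in range(a + 1, P):
            if E[a][b] <= 0.0:
                continue
            corr = signs[a] * d_cand[a] + signs[b] * d_cand[b]   # d^T c
            val = corr * corr / E[a][b]
            if val > best_val:
                best, best_val = (a, b), val
    return best, best_val


if __name__ == "__main__":
    phi_sub = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    d_cand = [1.0, 0.8, 1.0]
    signs = [1, 1, 1]
    E = energy_matrix(phi_sub, signs)     # computed once and stored
    print(best_pair(E, d_cand, signs))    # ((0, 2), 2.0)
```

Storing E means each candidate energy is computed once; evaluating a candidate afterwards needs only a few arithmetic operations.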