1. Field
The disclosed embodiments relate generally to wireless communications, and more specifically to the field of signal processing.
2. Background
Transmission of voice by digital techniques has become widespread, particularly in long distance and digital radio telephone applications. This, in turn, has created interest in determining the least amount of information that can be sent over a channel while maintaining the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of sixty-four kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and re-synthesis at the receiver, a significant reduction in the data rate can be achieved.
Devices that employ techniques to compress speech by extracting parameters that relate to a model of human speech generation are called speech coders. A speech coder divides the incoming speech signal into blocks of time, or analysis frames. Hereinafter, the terms “frame” and “packet” are interchangeable. Speech coders typically comprise an encoder and a decoder, or a codec. The encoder analyzes the incoming speech frame to extract certain relevant gain and spectral parameters, and then quantizes the parameters into binary representation, i.e., to a set of bits or a binary data packet. The data packets are transmitted over the communication channel to a receiver and a decoder. The decoder processes the data packets, de-quantizes them to produce the parameters, and then re-synthesizes the frames using the de-quantized parameters.
The function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech. The digital compression is achieved by representing the input speech frame with a set of parameters and employing quantization to represent the parameters with a set of bits. If the input speech frame has a number of bits Ni and the data packet produced by the speech coder has a number of bits No, the compression factor achieved by the speech coder is Cr=Ni/No. The challenge is to retain high voice quality of the decoded speech while achieving the target compression factor. The performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis process described above, performs, and (2) how well the parameter quantization process is performed at the target bit rate of No bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
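The compression factor Cr = Ni/No defined above can be illustrated with a short worked example. This is only a sketch under stated assumptions: the 8 kHz sampling rate and 8 bits per sample are chosen to be consistent with the sixty-four kbps figure given earlier, the 20 ms frame length matches the frame interval mentioned for CDMA systems, and the 160-bit packet size is a hypothetical value picked for illustration, not a figure from any particular coder.

```python
def compression_factor(bits_in: int, bits_out: int) -> float:
    """Return Cr = Ni / No for one frame."""
    if bits_in <= 0 or bits_out <= 0:
        raise ValueError("bit counts must be positive")
    return bits_in / bits_out


# Assumed input format: 8000 samples/s * 8 bits/sample = 64 kbps.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
FRAME_MS = 20  # frame length in milliseconds

# Ni: bits in one uncompressed input frame.
bits_in = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * FRAME_MS // 1000  # 1280 bits

# No: bits in the coded packet (hypothetical value for illustration).
bits_out = 160

print(compression_factor(bits_in, bits_out))  # 8.0
```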
Speech coders may be implemented as time-domain coders, which attempt to capture the time-domain speech waveform by employing high time-resolution processing to encode small segments of speech (typically 5 millisecond (ms) sub-frames) at a time. For each sub-frame, a high-precision representative from a codebook space is found by means of various search algorithms known in the art. Alternatively, speech coders may be implemented as frequency-domain coders, which attempt to capture the short-term speech spectrum of the input speech frame with a set of parameters (analysis) and employ a corresponding synthesis process to recreate the speech waveform from the spectral parameters. The parameter quantizer preserves the parameters by representing them with stored representations of code vectors in accordance with known quantization techniques described in A. Gersho and R. M. Gray, Vector Quantization and Signal Compression (1992).
Different types of speech within a given transmission system may be coded using different implementations of speech coders, and different transmission systems may implement coding of given speech types differently. Typically, voiced and unvoiced speech segments are captured at high bit rates, and background noise and silence segments are represented with modes working at a significantly lower rate. Speech coders used in CDMA digital cellular systems employ variable bit-rate (VBR) technology, in which one of four data rates is selected every 20 ms, depending on the speech activity and the local characteristics of the speech signal. The data rates include full rate, half rate, quarter rate, and eighth rate. Typically, transient speech segments are coded at full rate. Voiced speech segments are coded at half rate, while silence and background noise (inactive speech) are coded at eighth rate, in which, conventionally, only the spectral parameters and the energy contour of the signal are quantized at the lower bit rate.
For coding at lower bit rates, various methods of spectral, or frequency-domain, coding of speech have been developed, in which the speech signal is analyzed as a time-varying evolution of spectra. See, e.g., R. J. McAulay and T. F. Quatieri, Sinusoidal Coding, in Speech Coding and Synthesis ch. 4 (W. B. Kleijn and K. K. Paliwal eds., 1995). In spectral coders, the objective is to model, or predict, the short-term speech spectrum of each input frame of speech with a set of spectral parameters, rather than to precisely mimic the time-varying speech waveform. The spectral parameters are then encoded and an output frame of speech is created with the decoded parameters. The resulting synthesized speech does not match the original input speech waveform, but offers similar perceived quality. Examples of frequency-domain coders that are well known in the art include multiband excitation coders (MBEs), sinusoidal transform coders (STCs), and harmonic coders (HCs). Such frequency-domain coders offer a high-quality parametric model having a compact set of parameters that can be accurately quantized with the low number of bits available at low bit rates.
The process of encoding speech involves representing the speech signal using a set of parameters such as pitch, signal power gain, spectral envelope, amplitude, and phase spectra, which are then coded for transmission. The parameters are coded for transmission by quantizing each parameter and converting the quantized parameter values into bit-streams. A parameter is quantized by looking for the closest approximating value of the parameter from a predetermined finite set of codebook values. Codebook entries may be either scalar or vector values. The indices of the codebook entries most closely approximating the parameter values are packetized for transmission. At a receiver, a decoder employs a simple lookup technique using the transmitted indices to recover the speech parameters from an identical codebook in order to synthesize the original speech signal.
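The codebook quantization and decoder lookup described above can be sketched as follows. This is a minimal illustration, not the search used by any particular coder: the function names, the squared-error distance measure, and the four-entry gain codebook are all assumptions introduced for the example. A scalar codebook is treated here as a vector codebook with one-element entries.

```python
from typing import Sequence

Vector = Sequence[float]


def quantize(value: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `value`.

    Closeness is measured by squared Euclidean distance (an assumption;
    the text only requires the "closest approximating value").
    """
    def squared_error(entry: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(value, entry))

    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def dequantize(index: int, codebook: Sequence[Vector]) -> Vector:
    """Decoder side: a simple lookup in an identical codebook."""
    return codebook[index]


# Hypothetical 2-bit scalar codebook for a gain parameter.
GAIN_CODEBOOK = [(0.0,), (0.5,), (1.0,), (2.0,)]

index = quantize((0.8,), GAIN_CODEBOOK)   # 1.0 is the nearest entry
recovered = dequantize(index, GAIN_CODEBOOK)
print(index, recovered)  # 2 (1.0,)
```

Only the index is transmitted, so the encoder and decoder must hold the same codebook in the same order.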
The speech encoding process may produce a binary packet for transmission containing any possible permutation of codebook indices, including a packet containing all ones. In existing CDMA systems, packets containing all ones are reserved for null traffic channel data. Null traffic channel data is generated at the physical layer when no signaling message is being transmitted. Null traffic channel data serves to maintain the connectivity between a user terminal and a base station. A user terminal may comprise a cellular telephone for mobile subscribers, a cordless telephone, a paging device, a wireless local loop device, a personal digital assistant (PDA), an Internet telephony device, a component of a satellite communication system, or any other component device of a communications system. As defined in EIA/TIA/IS-95, null traffic channel data is equivalent to an eighth-rate packet with all bits set to one. Packets containing null traffic channel data are typically declared as erasures by speech decoders. Speech encoders must not allow a permutation of codebook indices representing quantized speech parameters to generate an illegal packet containing all ones, which is reserved for null traffic channel data. If an eighth-rate packet happens to be all ones after quantization, the encoder generally modifies the packet by re-computing a new packet. The re-computation procedure is repeated until a packet that is not all ones is generated. Modification, or re-computation, of a packet usually results in a sub-optimally encoded packet. Any sub-optimally encoded packet reduces the coding efficiency of the system. Thus, there is a need for avoiding re-computation by reducing the probability that illegal packets containing all ones, or any other undesirable permutation, will be generated during the process of encoding speech.
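The all-ones condition described above can be illustrated with a small sketch of how codebook indices might be packed into a packet and checked. The field widths, function names, and the 6-bit packet size are hypothetical and chosen only to keep the example short; they do not reflect the actual eighth-rate packet layout of any standard.

```python
from typing import Sequence


def pack_indices(indices: Sequence[int], widths: Sequence[int]) -> int:
    """Concatenate codebook indices into one integer, first index in the high bits."""
    packet = 0
    for index, width in zip(indices, widths):
        if not 0 <= index < (1 << width):
            raise ValueError(f"index {index} does not fit in {width} bits")
        packet = (packet << width) | index
    return packet


def is_all_ones(packet: int, total_bits: int) -> bool:
    """True if every bit of the packet is set, i.e. the reserved pattern."""
    return packet == (1 << total_bits) - 1


# Hypothetical layout: a 4-bit spectral index followed by a 2-bit gain index.
WIDTHS = (4, 2)
TOTAL_BITS = sum(WIDTHS)

legal = pack_indices((15, 2), WIDTHS)    # 0b111110
illegal = pack_indices((15, 3), WIDTHS)  # 0b111111, collides with null data

print(is_all_ones(legal, TOTAL_BITS), is_all_ones(illegal, TOTAL_BITS))  # False True
```

The packet is all ones only when every parameter selects the last entry of its codebook, which is why the ordering of codebook entries affects how often the collision occurs.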
Embodiments disclosed herein address the above-stated needs by reducing the likelihood of producing an illegal null traffic channel data packet containing all ones, or any other undesirable permutation, while encoding a signal. Accordingly, in one aspect, a method for determining a bit stream representation of signal parameters quantized for encoded transmission includes analyzing a history of the frequency of codebook values selected for quantizing the signal parameters, and reordering the codebook entries to manipulate the contents of the bit stream. In another aspect, a speech coder for encoding speech includes a frequency history generator for creating a statistical history of the frequency at which each codebook entry in a codebook for a given parameter is selected during parameter quantization while encoding a speech signal, and a codebook reorderer for reordering the codebook to manipulate the probability of producing a predetermined packet format while encoding a speech signal.
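The two elements named above, a frequency history generator and a codebook reorderer, can be sketched roughly as follows. This is an illustrative assumption rather than the disclosed implementation: the function names are hypothetical, and the reordering policy shown (most frequently selected entries first, so that the least frequently selected entry receives the highest, all-ones index) is one plausible way to lower the probability of an all-ones packet.

```python
from collections import Counter
from typing import List, Sequence, TypeVar

Entry = TypeVar("Entry")


def selection_history(selected: Sequence[int], codebook_size: int) -> List[int]:
    """Count how often each codebook index was selected during quantization."""
    counts = Counter(selected)
    return [counts.get(i, 0) for i in range(codebook_size)]


def reorder_codebook(codebook: Sequence[Entry], history: Sequence[int]) -> List[Entry]:
    """Place the most frequently selected entries first.

    The least frequently selected entry ends up at the last index, which is
    the all-ones index when the codebook size is a power of two.
    Ties keep their original relative order (sorted() is stable).
    """
    order = sorted(range(len(codebook)), key=lambda i: history[i], reverse=True)
    return [codebook[i] for i in order]


# Hypothetical 2-bit codebook and a record of indices chosen on past frames.
codebook = ["a", "b", "c", "d"]
past_selections = [3, 3, 3, 0, 0, 1, 3, 2, 2]

history = selection_history(past_selections, len(codebook))
print(history)                              # [2, 1, 2, 4]
print(reorder_codebook(codebook, history))  # ['d', 'a', 'c', 'b']
```

In this example the entry "d", which previously occupied the all-ones index 3 and was selected most often, moves to index 0, while the rarely selected "b" takes index 3. The decoder would need the same reordered codebook for its lookup to recover the correct parameters.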