1. Technical Field
The present invention relates generally to speech encoding and decoding in voice communication systems; and, more particularly, it relates to various techniques used with code-excited linear prediction coding to obtain high quality speech reproduction through a limited bit rate communication channel.
2. Related Art
Signal modeling and parameter estimation play significant roles in communicating voice information under limited bandwidth constraints. To model basic speech sounds, speech signals are sampled as a discrete waveform to be digitally processed. In one type of signal coding technique called LPC (linear predictive coding), the signal value at any particular time index is modeled as a linear function of previous values. A subsequent sample is thus linearly predictable from earlier values. As a result, efficient signal representations can be determined by estimating and applying certain prediction parameters to represent the signal.
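The linear prediction idea can be illustrated with a minimal sketch. The function name `lpc_predict`, the 8 kHz sampling rate, and the 200 Hz test tone are assumptions chosen for illustration and do not come from the present application. The example relies on the fact that a pure sinusoid satisfies s[n] = 2cos(w)s[n-1] - s[n-2] exactly, so a two-coefficient predictor leaves an essentially zero residual.

```python
import math

def lpc_predict(history, coeffs):
    """Predict the next sample as a weighted sum of the most recent samples.

    coeffs[0] weights the newest sample, coeffs[1] the one before it, and so on.
    """
    recent = history[-len(coeffs):][::-1]  # newest sample first
    return sum(a * s for a, s in zip(coeffs, recent))

# Assumed test signal: a 200 Hz tone sampled at 8 kHz (illustrative values only).
w = 2 * math.pi * 200 / 8000
signal = [math.sin(w * n) for n in range(50)]

# For a pure sinusoid, s[n] = 2*cos(w)*s[n-1] - s[n-2] holds exactly.
coeffs = [2 * math.cos(w), -1.0]

# The residual is what remains after removing the predictable part.
residual = [signal[n] - lpc_predict(signal[:n], coeffs) for n in range(2, 50)]
max_error = max(abs(e) for e in residual)
```

Real speech is not a single sinusoid, so a practical encoder would estimate a higher-order set of coefficients for each frame and would code the non-zero residual separately.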
Applying LPC techniques, a conventional source encoder operates on speech signals to extract modeling and parameter information for communication to a conventional source decoder via a communication channel. Once received, the decoder attempts to reconstruct a counterpart signal for playback that sounds to a human ear like the original speech.
A certain amount of communication channel bandwidth is required to communicate the modeling and parameter information to the decoder. In some embodiments, for example where the channel bandwidth is shared and real-time reconstruction is necessary, a reduction in the required bandwidth proves beneficial. However, using conventional modeling techniques, the quality requirements for the reproduced speech prevent the bandwidth from being reduced below certain levels.
In conventional coding systems employing long term preprocessing, a modified residual is produced as a new reference for current excitation. The goal is to produce a modified residual that matches a coded pitch contour (or delay contour) better than the original residual does, so that the LTP (long term prediction) gain is higher. Conventional systems attempt this by individually shifting the pitch pulses to match the pitch contour, which requires reliable endpoint detection of each segment to be shifted in order to maintain signal continuity. Using such an open loop approach with pulse shifting results in quality problems in speech reproduction.
Additionally, in using such and other conventional approaches, the amount of pitch lag information that must be transmitted is relatively large in view of the limitations often placed on the channel bit rate. For example, 8 bits might be required to encode pitch lag for a first subframe (of 5 ms duration) followed perhaps by 5 bits for pitch lag changes in a second subframe, resulting in a relatively large amount of bandwidth allocation, e.g., 1.3 kbps (kilobits per second), just for the pitch lag information.
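The 1.3 kbps figure can be checked with a short calculation. The sketch below assumes that the second subframe also lasts 5 ms and that the 8-bit/5-bit pattern repeats every 10 ms; the function name `pitch_lag_bit_rate` is hypothetical.

```python
def pitch_lag_bit_rate(bits_per_subframe, subframe_ms):
    """Return the bit rate in bits per second for a repeating group of subframes.

    Assumes every subframe in the group has the same duration (subframe_ms).
    """
    total_bits = sum(bits_per_subframe)
    total_ms = len(bits_per_subframe) * subframe_ms
    return total_bits * 1000 / total_ms

# 8 bits for the first subframe plus 5 bits for the second: 13 bits per 10 ms.
rate_bps = pitch_lag_bit_rate([8, 5], 5)
rate_kbps = rate_bps / 1000
```

Under these assumptions the pitch lag alone consumes 1300 bits per second, which matches the 1.3 kbps cited above.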
Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings.
Various aspects of the present invention can be found in an embodiment of a speech encoder that uses long term preprocessing of a speech signal wherein the speech signal has a previous pitch lag and a current pitch lag. Therein, the speech encoder comprises an adaptive codebook and an encoder processing circuit coupled to the adaptive codebook. Using estimates of the previous pitch lag and the current pitch lag, the encoder processing circuit generates a pitch lag contour. The encoder processing circuit continuously warps the speech signal to the pitch lag contour.
Many variations and further aspects of such a speech encoder are possible. For example, the speech signal may comprise either a weighted speech signal or a residual signal. The pitch lag contour may comprise a linear segment bounded by the estimates of the previous pitch lag and the current pitch lag, and continuous warping may involve warping the speech signal from a first time region to a second time region. Additionally, for example, the encoder processing circuit may search for a best local delay using linear time weighting, and/or perform the estimation of the current pitch lag.
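A pitch lag contour formed as a linear segment between the previous and current pitch lag estimates might be sketched as follows. The function names `linear_lag_contour` and `contour_target`, the per-sample interpolation, and the use of linear interpolation for fractional delays are all assumptions made for illustration; the sketch shows only the contour and the target signal it implies, not the warping procedure itself.

```python
def linear_lag_contour(prev_lag, cur_lag, frame_len):
    """Interpolate the lag linearly across a frame, reaching cur_lag at the
    last sample of the frame."""
    return [prev_lag + (cur_lag - prev_lag) * (n + 1) / frame_len
            for n in range(frame_len)]

def contour_target(past, contour):
    """Extend `past` by reading it back at the delay given by `contour`.

    Each lag must lie between 1 and the current buffer length. Fractional
    read positions are resolved by linear interpolation (an assumed choice).
    """
    buf = list(past)
    start = len(buf)
    for lag in contour:
        pos = len(buf) - lag              # possibly fractional read position
        i = int(pos)                      # integer part (pos is non-negative)
        frac = pos - i
        nxt = buf[i + 1] if i + 1 < len(buf) else buf[i]
        buf.append((1 - frac) * buf[i] + frac * nxt)
    return buf[start:]

# A lag rising from 40 to 44 samples over a 4-sample frame.
contour = linear_lag_contour(40, 44, 4)

# With a constant lag of 4, the target simply repeats the last 4 samples.
periodic = contour_target([0.0, 1.0, 2.0, 3.0], linear_lag_contour(4, 4, 8))
```

In this picture, continuous warping would adjust the speech signal so that it lines up with such a target, rather than shifting individual pitch pulses.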
Further aspects of the present invention may be found in an alternate embodiment of a speech encoder that uses long term preprocessing of a speech signal having a pitch lag. As before, the speech encoder comprises an adaptive codebook and an encoder processing circuit coupled thereto. The encoder processing circuit estimates the pitch lag, and, based on such estimate, applies continuous warping of the speech signal.
Other variations and further aspects such as those mentioned previously also apply to this embodiment. For example, the speech signal might comprise a weighted speech signal or a residual signal. The encoder processing circuit may search for a best local delay using linear time weighting, or conduct continuous warping by translating the speech signal from a first time region to a second time region.
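One way a search for a best local delay with linear time weighting could look is sketched below. The shape of the weighting (a ramp that grows linearly toward the end of the segment), the normalized correlation criterion, the restriction to integer shifts, and the name `best_local_delay` are assumptions for illustration only and are not taken from the present application.

```python
import math

def best_local_delay(signal, target, start, max_shift):
    """Return the integer shift in [-max_shift, max_shift] whose segment of
    `signal` best matches `target` under a linearly increasing time weight."""
    n = len(target)
    # Assumed weighting: later samples in the segment count more.
    weights = [(k + 1) / n for k in range(n)]
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        begin = start + shift
        if begin < 0 or begin + n > len(signal):
            continue                      # segment would fall outside the signal
        seg = signal[begin:begin + n]
        num = sum(w * s * t for w, s, t in zip(weights, seg, target))
        energy = sum(w * s * s for w, s in zip(weights, seg))
        den = math.sqrt(energy) or 1.0    # avoid dividing by zero on silence
        score = num / den
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift

# A single pulse in silence; the target is the same pulse seen 2 samples later.
signal = [0.0] * 10 + [1.0, 3.0, 1.0] + [0.0] * 10
target = signal[8:16]
shift = best_local_delay(signal, target, start=6, max_shift=3)
```

Here the search recovers a shift of 2 samples, the offset at which the segment coincides with the target. A practical encoder would likely also consider fractional shifts.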