1. Field of the Invention
The present invention relates to novel techniques for modeling, quantization and error concealment of the components of a prototype waveform (PW) representation of the speech prediction residual signal, and more particularly to a means of characterizing the degree of periodicity of the signal and its use in the efficient representation of the spectral magnitudes and phases of the slowly evolving waveform (SEW) and rapidly evolving waveform (REW) components. Encoding of other components of the PW representation, such as the PW gain vector, the SEW magnitude and phase, and the REW gain, magnitude shape vector and phase, is also discussed for completeness, but these are the subjects of separate inventions. These techniques are applicable to low bit rate speech coders operating in the range of 2-4 kbit/s. This invention pertains to the computation of a voicing measure that quantifies the degree of signal periodicity, and to its subsequent use in quantizing the SEW spectral magnitude and in modeling the SEW and REW phase spectra.
2. Background and Description of Related Art
The present invention describes techniques for efficient encoding of the speech signal applicable to speech coders typically operating at bit rates in the range of 2-4 kbit/s. In particular, such techniques are applicable to a representation of the speech prediction error (residual) signal known as the prototype waveform (PW) representation, see, e.g., W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993. The prototype waveforms are a sequence of complex Fourier transforms, evaluated at the pitch harmonic frequencies, of pitch-period-wide segments of the residual taken at a series of points along the time axis. Thus, the PW sequence contains information about the spectral characteristics of the residual signal as well as the temporal evolution of these characteristics. High speech quality can be achieved at low coding rates by efficiently quantizing the important aspects of the PW sequence. In PW based coders, the PW is separated into a shape component and a level component by computing the RMS (or gain) value of the PW and normalizing the PW to unity RMS value. The normalized PW is decomposed into a slowly evolving waveform (SEW), which contains the periodic component of the residual, and a rapidly evolving waveform (REW), which contains the aperiodic component of the residual. As the pitch frequency varies, the dimensions of the PW, SEW and REW vectors also vary, typically in the range 11-61.
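The gain-shape normalization and the SEW/REW split described above can be illustrated with a short Python sketch. It assumes that all PW vectors in the sequence have already been brought to a common dimension, and it uses a centred moving average along time as a hypothetical stand-in for the coder's actual SEW extraction (low pass) filter; the function names are illustrative only.

```python
import math


def pw_gain_and_shape(pw):
    """Split one complex PW vector into its RMS gain and a unity-RMS shape."""
    rms = math.sqrt(sum(abs(c) ** 2 for c in pw) / len(pw))
    if rms == 0.0:
        return 0.0, list(pw)
    return rms, [c / rms for c in pw]


def sew_rew_split(pw_sequence, taps=5):
    """Split a time sequence of normalized PW vectors into SEW and REW.

    Each harmonic track is smoothed along time with a centred moving average
    (a hypothetical stand-in for the SEW extraction filter); the REW is what
    remains. All vectors are assumed to have the same dimension.
    """
    n = len(pw_sequence)
    half = taps // 2
    sew, rew = [], []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        smoothed = [sum(pw_sequence[s][k] for s in range(lo, hi)) / (hi - lo)
                    for k in range(len(pw_sequence[t]))]
        sew.append(smoothed)
        rew.append([p - s for p, s in zip(pw_sequence[t], smoothed)])
    return sew, rew
```

A PW sequence that does not change over time is entirely "slowly evolving", so its REW is zero; any frame-to-frame fluctuation shows up in the REW instead.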
This invention also proposes novel error concealment techniques for mitigating the effects of frame erasure or packet loss between the speech encoder and the speech decoder due to a degraded transmission medium.
W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993; and J. Haagen and W. B. Kleijn, "Waveform Interpolation", in Modern Methods of Speech Processing, Edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995, describe the prototype waveform interpolation (PWI) modeling approach. However, the quantization of the PWI model is not specified in detail. The proposed invention pertains to the quantization of the various components of the PWI model. The quantization approaches proposed in this invention are novel and are not in any way based on or derived from the quantization approaches described in the prior art in W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993; and J. Haagen and W. B. Kleijn, "Waveform Interpolation", in Modern Methods of Speech Processing, Edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995. Additionally, W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, and Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, describe certain quantization schemes for prototype waveform encoding.
In the prior art of W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995, and W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, the PW gain vector is not quantized using a VQ designed by explicit population of steady state and transient codewords. This can result in poor performance during voicing onsets and other transitory events. The variable dimensionality of the SEW and REW vectors is addressed by using fixed order analytical function approximations for the REW magnitude shape and by deriving the SEW magnitude approximately from the REW magnitude. The coefficients of the analytical function that provides the best fit to the vector are used to represent the vector for quantization. This approach suffers from three disadvantages: (i) a modeling error is added to the quantization error, leading to a loss of performance, (ii) the accuracy of the analytical function approximation for reasonable orders (5-10) deteriorates with increasing frequency, and (iii) if spectrally weighted distortion metrics are used during VQ, the complexity of these methods becomes formidable. In the prior art of W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; and Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, only a predetermined low frequency sub-band (e.g., the 0-800 Hz band) of the SEW magnitude is encoded. This substantially reduces the dimension of the SEW vector, thereby permitting direct VQ. At the receiver, the remaining upper band is estimated using the REW magnitude spectrum.
This method suffers from the disadvantage that if a significant amount of signal energy exists in the upper band, it is reproduced poorly, leading to poor speech quality. This condition can occur for a number of speech sounds, especially for unvoiced speech.
A number of prior techniques for encoding phase are in use in PWI based voice coders, e.g., W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993; W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996; J. Haagen and W. B. Kleijn, "Waveform Interpolation", in Modern Methods of Speech Processing, Edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995; Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997. In this prior art, the SEW phase vector is either a random phase (for unvoiced sounds) or the phase of a fixed pitch cycle waveform (for voiced sounds). This binary characterization of the SEW phase is too simplistic. It may work for a narrow range of speakers and for clean speech signals, but it becomes unsatisfactory as the range of speakers increases and for speech corrupted by background noise. Noisy speech requires varying degrees of randomness in the SEW phase.
In the prior art of W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993; W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, the REW quantization does not employ a normalization of the REW magnitude vectors whereby the level and shape information are separated. Instead, the REW magnitude vectors are quantized directly. The separation of level and shape, as proposed in this invention, is advantageous, since it allows more accurate quantization of the time varying REW level, which is of primary importance. Secondly, in the prior art cited above, REW magnitude quantization is based upon the use of analytical functions to overcome the problem of variable dimensionality. As mentioned earlier, this approach suffers from three disadvantages: (i) a modeling error is added to the quantization error, leading to a loss of performance, (ii) the accuracy of the analytical function approximation for reasonable orders (5-10) deteriorates with increasing frequency, and (iii) if spectrally weighted distortion metrics are used during VQ, the complexity of these methods becomes formidable.
In the prior art of W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995, and W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, REW phase is obtained at the receiver using random phase models. Use of a random phase for REW results in reconstructed speech that is excessively rough, because a random phase is not consistent with the SEW-REW separation model employed at the encoder. Consequently, the random phase model results in a REW component that does not conform to certain basic characteristics of the REW at the encoder. For example, the random phase based REW can have a significant amount of energy below 25 Hz, which is not possible for the REW at the encoder. Further, the correlation between SEW and REW due to the overlapping separation filters cannot be directly created when a random phase model is employed.
None of the prior art related to PW speech coders addresses the issue of error concealment for the PW model parameters.
This invention proposes novel techniques for the modeling, quantization and error concealment of the components of a PW based voice coder, i.e., the PW gain vector and the variable dimension SEW and REW complex vectors. The prototype waveform (PW) gain is vector quantized using a vector quantizer (VQ) whose codebook is explicitly populated with representative steady state and transient vectors of PW gain. This approach is effective in tracking the abrupt variations in speech levels during onsets and other non-stationary events, while maintaining the accuracy of the speech level during stationary conditions. In the case of a frame erasure, errors in the PW gain parameter are concealed by estimating the PW gain from the PW gains of the two preceding error-free frames and gradually decaying this estimate over the duration of the current frame.
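A minimal sketch of the PW gain concealment step, assuming the two preceding error-free frames are summarized by their frame-average gains. The averaging rule, the cap at the most recent gain, the subframe count and the decay factor are all hypothetical choices made for illustration; the text above does not specify them.

```python
def conceal_pw_gain(prev_gains, n_subframes=8, decay=0.9):
    """Hypothetical PW gain concealment for one erased frame.

    prev_gains: (older, recent) frame-average PW gains of the two preceding
    error-free frames. The estimate is their mean, capped at the more recent
    gain so that a rising onset is not extrapolated upward, and it is then
    decayed geometrically across the subframes of the erased frame.
    n_subframes, decay and the capping rule are illustrative assumptions.
    """
    older, recent = prev_gains
    estimate = min(0.5 * (older + recent), recent)
    return [estimate * decay ** (i + 1) for i in range(n_subframes)]
```

The geometric decay makes a run of consecutive erasures fade toward silence rather than sustain a possibly wrong level.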
The rapidly evolving waveform (REW) and slowly evolving waveform (SEW) component vectors are converted to magnitude-phase formats for quantization. The variable dimension SEW magnitude vector is quantized using a hierarchical approach. A fixed dimension SEW mean vector is computed by a sub-band averaging of SEW magnitude spectrum. A SEW deviation vector is computed by subtracting the SEW mean from the SEW magnitude vector. The variable dimension SEW deviation vector is reduced to a fixed dimension subvector of size 10, based on a dynamic frequency selection approach. The SEW deviation subvector and SEW mean vector are vector quantized using a switched predictive VQ. At the decoder, the SEW deviation subvector and the SEW mean vector are combined to construct a full dimension SEW magnitude vector. This hierarchical approach to SEW magnitude quantization emphasizes the accurate representation of the average SEW magnitude level, which is perceptually important. Additionally, the average level gets refined at frequencies that are perceptually significant. In case of a frame erasure, errors in the SEW magnitude are concealed by estimating it using the preceding error-free SEW mean vector.
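The hierarchical SEW mean/deviation decomposition can be sketched as follows. The five band edges are hypothetical, harmonic k is assumed to lie at k times the pitch frequency with an 8 kHz sampling rate (so the top band ends at 4 kHz), and quantization is omitted so that the decoder-side reconstruction is exact.

```python
# Hypothetical sub-band edges in Hz for the 5-dimensional SEW mean vector.
BAND_EDGES_HZ = [0.0, 400.0, 1000.0, 2000.0, 3000.0, 4000.0]
N_BANDS = len(BAND_EDGES_HZ) - 1


def band_of(freq_hz):
    """Index of the sub-band containing freq_hz (the top edge is inclusive)."""
    for b in range(N_BANDS):
        if freq_hz < BAND_EDGES_HZ[b + 1]:
            return b
    return N_BANDS - 1


def sew_mean_and_deviation(sew_mag, pitch_hz):
    """Split a variable-dimension SEW magnitude vector (harmonic k at
    k * pitch_hz) into a fixed 5-dim SEW mean vector and a deviation vector."""
    bands = [band_of(k * pitch_hz) for k in range(len(sew_mag))]
    mean = []
    for b in range(N_BANDS):
        members = [m for m, bb in zip(sew_mag, bands) if bb == b]
        mean.append(sum(members) / len(members) if members else 0.0)
    deviation = [m - mean[bb] for m, bb in zip(sew_mag, bands)]
    return mean, deviation, bands


def reconstruct_sew(mean, deviation, bands):
    """Decoder-side inverse: add each harmonic's band mean back."""
    return [mean[bb] + d for d, bb in zip(deviation, bands)]
```

Because each deviation element is measured relative to its own band mean, the deviations within a band sum to zero, and adding the band mean back recovers the magnitude vector.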
SEW phase information is represented implicitly using a measure of the degree of periodicity of the residual signal. This voicing measure is computed using a weighted root mean square (RMS) value of the SEW, a measure of the variance of SEW and the peak value of the normalized autocorrelation function of the residual signal and is quantized using 3 bits. At the decoder, the SEW phase is computed by a weighted combination of the previous SEW phase vector, a random phase perturbation and a fixed phase vector obtained from a voiced pitch pulse. The relative weights for these components are determined by the quantized voicing measure and the ratio of SEW and REW RMS values. The decoded SEW magnitude and SEW phase are combined to produce a complex SEW vector. The SEW component is passed through a low pass filter to reduce excessive variations and to be consistent with the SEW extraction process at the encoder. The SEW magnitude is preserved after the filtering operation. In case of a frame erasure, the voicing measure is estimated using a voice activity detector (VAD) output and the RMS value of the decoded SEW magnitude vector.
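The sketch below shows one way the three ingredients named above could be combined into a voicing measure and quantized with 3 bits. The weights, the clipping to [0, 1], the convention that 0 denotes a strongly periodic frame, and the 20-120 sample lag range (roughly 67-400 Hz at 8 kHz) are assumptions. Because the PW is normalized to unity RMS, a SEW RMS close to 1 is taken here to indicate strong periodicity.

```python
import math


def peak_normalized_autocorr(x, min_lag=20, max_lag=120):
    """Largest normalized autocorrelation of residual x over candidate pitch lags."""
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(x) - 1) + 1):
        a, b = x[lag:], x[:-lag]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:
            best = max(best, num / den)
    return best


def _clip01(v):
    return min(1.0, max(0.0, v))


def voicing_measure(sew_rms, sew_variance, r_peak, weights=(0.4, 0.3, 0.3)):
    """Hypothetical voicing measure in [0, 1]; 0 means strongly periodic.

    High SEW RMS, low SEW variability and a high autocorrelation peak all
    push the measure toward 0. The weights are illustrative only.
    """
    w_rms, w_var, w_corr = weights
    v = (w_rms * (1.0 - _clip01(sew_rms))
         + w_var * _clip01(sew_variance)
         + w_corr * (1.0 - _clip01(r_peak)))
    return _clip01(v)


def quantize_voicing(v, bits=3):
    """Uniform scalar quantizer on [0, 1]; returns (index, reconstruction level)."""
    levels = 1 << bits
    index = min(levels - 1, int(_clip01(v) * levels))
    return index, (index + 0.5) / levels
```

With 3 bits the measure takes one of 8 levels, which is then the only explicit information the decoder receives about SEW phase.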
The REW magnitude vector sequence is normalized to unity RMS value, resulting in a REW magnitude shape vector and a REW gain vector. The normalized REW magnitude vectors are modeled by a multi-band sub-band model which converts the variable dimension REW magnitude shape vectors to a fixed dimension, e.g., to five dimensional REW sub-band vectors in the described embodiment. The sub-band vectors are averaged over time, resulting in a single average REW sub-band vector for each frame. At the decoder, the full-dimension REW magnitude shape vector is obtained from the REW sub-band vector by a piecewise-constant interpolation.
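A sketch of the REW magnitude gain-shape normalization, the sub-band model and the piecewise-constant reconstruction. It assumes five hypothetical sub-bands, a single pitch frequency for the whole frame and an 8 kHz sampling rate.

```python
import math

# Hypothetical sub-band edges in Hz (five bands, as in the embodiment above).
REW_BAND_EDGES_HZ = [0.0, 500.0, 1000.0, 2000.0, 3000.0, 4000.0]
N_BANDS = len(REW_BAND_EDGES_HZ) - 1


def rew_band(freq_hz):
    """Sub-band index of a harmonic frequency (the top edge is inclusive)."""
    for b in range(N_BANDS):
        if freq_hz < REW_BAND_EDGES_HZ[b + 1]:
            return b
    return N_BANDS - 1


def normalize_rew(rew_mag):
    """Gain-shape split: RMS gain and unity-RMS REW magnitude shape."""
    gain = math.sqrt(sum(m * m for m in rew_mag) / len(rew_mag))
    if gain == 0.0:
        return 0.0, list(rew_mag)
    return gain, [m / gain for m in rew_mag]


def rew_subband_vector(shapes, pitch_hz):
    """Average each shape within the sub-bands, then average over the frame."""
    acc = [0.0] * N_BANDS
    for shape in shapes:
        sums, counts = [0.0] * N_BANDS, [0] * N_BANDS
        for k, m in enumerate(shape):
            b = rew_band(k * pitch_hz)
            sums[b] += m
            counts[b] += 1
        for b in range(N_BANDS):
            if counts[b]:
                acc[b] += sums[b] / counts[b]
    return [a / len(shapes) for a in acc]


def rew_shape_from_subbands(subband, n_harmonics, pitch_hz):
    """Decoder: piecewise-constant expansion back to full dimension."""
    return [subband[rew_band(k * pitch_hz)] for k in range(n_harmonics)]
```

The fixed-length sub-band vector is what gets quantized; the decoder can expand it for whatever number of harmonics the current pitch implies.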
The REW gain vector is estimated using the quantized SEW mean vector. The resulting estimation error has a smaller variance and is efficiently vector quantized. A 5-bit vector quantization is used to encode the estimation error. In case of a frame erasure, the estimate provided by the SEW mean is used as the REW magnitude.
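The REW gain estimation and 5-bit error VQ might look as follows. The estimator is an assumption: it relies on the PW being normalized to unity RMS and on SEW and REW being roughly uncorrelated, so that the squared REW RMS is approximately one minus the squared SEW RMS. The codebook used in the test is a toy example, not a trained codebook.

```python
import math


def estimate_rew_gain(sew_mean_q):
    """Hypothetical REW gain estimate from the quantized SEW mean vector.

    Assumes unity-RMS PW and uncorrelated SEW/REW: REW_rms**2 ~= 1 - SEW_rms**2.
    """
    sew_rms = math.sqrt(sum(m * m for m in sew_mean_q) / len(sew_mean_q))
    return math.sqrt(max(0.0, 1.0 - sew_rms * sew_rms))


def vq_search(target, codebook):
    """Exhaustive nearest-neighbour search in squared error; returns the index."""
    return min(range(len(codebook)),
               key=lambda i: sum((t - c) ** 2 for t, c in zip(target, codebook[i])))


def encode_rew_gain(rew_gain_vec, sew_mean_q, codebook):
    """Quantize the estimation error with a 32-entry (5-bit) codebook."""
    if len(codebook) != 32:
        raise ValueError("a 5-bit codebook must have 32 entries")
    est = estimate_rew_gain(sew_mean_q)
    return vq_search([g - est for g in rew_gain_vec], codebook)


def decode_rew_gain(index, sew_mean_q, codebook):
    """Add the decoded estimation error back to the same SEW-based estimate."""
    est = estimate_rew_gain(sew_mean_q)
    return [est + e for e in codebook[index]]
```

Since the decoder forms the same estimate from the quantized SEW mean, only the error index is transmitted; during a frame erasure the estimate alone would be used.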
The REW phase vector is regenerated at the decoder based on the received REW gain vector and the voicing measure, which determines a weighted mixture of the SEW component and random noise that is passed through a high pass filter to generate the REW component. The weighting is adjusted so as to achieve the desired degree of correlation between the REW and SEW components. The high pass filter poles are adjusted based on the voicing measure to control the REW component characteristics. At the output of the filter, the magnitude of the REW component is scaled to match the received REW magnitude vector.
In addition to the error concealment techniques for the PW parameters, this invention also proposes error concealment and recovery techniques for the speech line spectral frequency (LSF) parameters and the pitch period parameter. In the case of a frame error, the LSFs are constructed using the previous error-free LSF vector. During the error recovery process, the LSFs are forced to change smoothly. In the case of the pitch period, frame errors are concealed by repeating the preceding error-free pitch period value. Further, during error recovery, the pitch contour is forced to conform to certain smoothness conditions.
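A minimal sketch of the LSF and pitch concealment and recovery rules. The LSF smoothing factor and the maximum pitch-period ratio used to enforce a smooth pitch contour are hypothetical parameters, not values from the text.

```python
def conceal_lsf(prev_lsf):
    """Erased frame: reuse the last error-free LSF vector."""
    return list(prev_lsf)


def smooth_lsf_recovery(concealed_lsf, received_lsf, alpha=0.5):
    """First good frame after an erasure: move only part of the way toward the
    received LSFs (alpha is a hypothetical smoothing factor in (0, 1])."""
    return [c + alpha * (r - c) for c, r in zip(concealed_lsf, received_lsf)]


def conceal_pitch(prev_pitch):
    """Erased frame: repeat the preceding error-free pitch period."""
    return prev_pitch


def smooth_pitch_recovery(prev_pitch, received_pitch, max_ratio=1.2):
    """Recovery: limit the jump in pitch period (hypothetical smoothness rule)."""
    lo, hi = prev_pitch / max_ratio, prev_pitch * max_ratio
    return min(hi, max(lo, received_pitch))
```

Because the recovered LSF vector is a convex combination of two ordered LSF vectors, it stays ordered, which keeps the corresponding LPC synthesis filter stable.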
The invention uses a PW gain VQ design that explicitly populates a partitioned codebook using representative steady state and transient vectors of PW gain; e.g., 75% of the codebook is allocated to representing steady state vectors and the remaining 25% to representing transient vectors. This approach allows better tracking of the variations of the residual power levels. This is particularly important at speech onsets, during which the speech power levels can change by several orders of magnitude within a 20 ms frame. During steady state frames, on the other hand, the variation in speech power level is significantly smaller. In other approaches, see, e.g., W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993; W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996; J. Haagen and W. B. Kleijn, "Waveform Interpolation", in Modern Methods of Speech Processing, Edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995; Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, speech gain vectors are not quantized using such a partitioned VQ approach. Consequently, their codebooks tend to be dominated by steady state vectors, which may lead to poor reproduction of speech levels during onsets.
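The partitioned codebook design can be sketched by training the steady state and transient partitions separately and concatenating them with the 75%/25% split mentioned above. Plain k-means is used here as a swapped-in training method for illustration, and the default codebook size of 64 is an assumption. A full search over the combined codebook then picks whichever partition best fits the current frame.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(v, codebook):
    """Full search: index of the codeword closest to v in squared error."""
    return min(range(len(codebook)), key=lambda i: _sq_dist(v, codebook[i]))


def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd iterations starting from k randomly sampled training vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, k)]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in data:
            cells[nearest_index(v, centroids)].append(v)
        for i, members in enumerate(cells):
            if members:  # an empty cell keeps its previous centroid
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def partitioned_gain_codebook(steady, transient, size=64, steady_share=0.75):
    """Train the steady state and transient partitions separately, then concatenate.

    Indices [0, n_steady) hold steady state codewords; the rest are transient.
    """
    n_steady = int(round(size * steady_share))
    return kmeans(steady, n_steady) + kmeans(transient, size - n_steady)
```

Reserving a fixed share for transients guarantees that onset shapes are represented even though they are rare in training data, which a single jointly trained codebook would not ensure.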
The SEW vector determines the characteristics of the voiced segments of speech, and hence is perceptually important. It is quantized in magnitude-phase form. It is important to maintain the correct average level (across frequency) of the SEW magnitude vector; the variation about this average is of secondary importance. Motivated by this consideration, the present invention uses a hierarchical approach, representing the SEW magnitude vector as the sum of a SEW mean vector and a SEW deviation vector. The SEW mean vector is obtained by a sub-band averaging process, resulting in a 5-dimensional vector. The SEW deviation vector is the difference between the SEW magnitude vector and the SEW mean vector. The SEW mean vector is quantized more precisely and better protected against channel errors than the SEW deviation vector. This hierarchical decomposition into a mean component and a deviation component has the important advantage that the average SEW levels can be better preserved, which is very important in achieving a high perceived quality of speech, especially during voiced segments. Prior techniques, see, e.g., W. B. Kleijn and J. Haagen, "Waveform Interpolation for Coding and Synthesis", in Speech Coding and Synthesis, Edited by W. B. Kleijn, K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, 386-399, 1993; W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996; J. Haagen and W. B. Kleijn, "Waveform Interpolation", in Modern Methods of Speech Processing, Edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995; Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, have employed non-hierarchical approaches and are likely to result in lower performance and less robustness to channel errors.
The dimension of the REW and SEW vectors is a variable that depends upon the pitch frequency and typically varies in the range 11-61. Existing VQ techniques, such as direct VQ, split VQ and multi-stage VQ, are not well suited for variable dimension vectors. Adaptations of these techniques to variable dimensions are neither practical from an implementation viewpoint nor satisfactory from a performance viewpoint. They are not practical because the worst case high dimensionality results in a high computational cost and a high storage cost. This usually leads to simplifications such as structured VQ, which result in a loss of performance, making such solutions unsatisfactory for encoding speech at bit rates in the range 2-4 kbit/s.
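The 11-61 range follows from simple arithmetic if one assumes an 8 kHz sampling rate and pitch periods of 20-120 samples (about 400 Hz down to about 67 Hz): a pitch period of P samples places harmonics at k*8000/P Hz, and floor(P/2)+1 of them, counting DC, lie in 0-4000 Hz. The storage helper shows, for an assumed 10-bit full-search codebook, how the worst case dimension drives the storage cost.

```python
def pw_dimension(pitch_period_samples):
    """Harmonics k * fs / P lying in [0, fs/2], DC included: floor(P / 2) + 1."""
    return pitch_period_samples // 2 + 1


def full_search_storage(bits, dimension):
    """Number of stored values for a full-search VQ codebook."""
    return (1 << bits) * dimension
```

For example, pw_dimension(20) is 11 and pw_dimension(120) is 61, and a 10-bit codebook sized for the worst case dimension of 61 would already hold 62,464 values.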
In a prior technique to address the variable dimensionality problem, W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, analytical functions of a fixed order are used to approximate the variable dimension vectors. The coefficients of the analytical function that provides the best fit to the vectors are used to represent the vectors for quantization. The analytical function approximation is applied to the REW magnitude. The SEW magnitude is derived approximately from the REW magnitude in the 800 Hz-4000 Hz band and is explicitly coded only in the 0-800 Hz band. This approach suffers from three disadvantages: (i) a modeling error is added to the quantization error, leading to a loss of performance, (ii) the accuracy of the analytical function approximation for reasonable orders (5-10) deteriorates with increasing frequency, and (iii) if spectrally weighted distortion metrics are used during VQ, the complexity of these methods becomes formidable.
This invention proposes a novel solution to this problem that has a reasonable computation and storage cost and at the same time provides a high level of performance. In this approach, the variable dimension SEW vector is decomposed in a hierarchical manner into two fixed dimension vectors, as the sum of a SEW mean vector and a SEW deviation vector. The SEW mean vector is obtained by a 5-band sub-band averaging and is represented by a 5-dimensional vector. The SEW deviation vector is reduced to a SEW deviation sub-vector of fixed dimension 10 by selecting the 10 elements that are considered most important for speech quality. The set of selected frequencies varies with the spectral characteristics of speech, but the selection is made in such a way that it need not be transmitted explicitly: in the absence of channel errors, the decoder can map the SEW deviation sub-vector elements to the correct frequencies. The unselected elements of the SEW deviation vector are not encoded. The full-dimension SEW magnitude vector is reconstructed at the decoder by adding the quantized SEW mean and SEW deviation components.
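One way to make the choice of the 10 deviation elements implicit is to derive it only from information the decoder already has. In this sketch the per-harmonic weights are assumed to come from such decoder-side data (for example, a quantized spectral envelope); the weighting itself and the tie-breaking rule are hypothetical.

```python
def select_harmonics(weights, n_select=10):
    """Pick the n_select harmonics with the largest decoder-known weight.

    Ties go to the lower harmonic; the result is in ascending harmonic order,
    so encoder and decoder enumerate the sub-vector elements identically.
    """
    order = sorted(range(len(weights)), key=lambda k: (-weights[k], k))
    return sorted(order[:n_select])


def extract_subvector(deviation, selected):
    """Encoder: gather the selected SEW deviation elements."""
    return [deviation[k] for k in selected]


def expand_subvector(subvector, selected, n_harmonics):
    """Decoder: scatter decoded values back; unselected deviations are zero."""
    full = [0.0] * n_harmonics
    for k, v in zip(selected, subvector):
        full[k] = v
    return full
```

As long as both sides compute identical weights, which holds in the absence of channel errors, no side information about the selected frequencies is needed.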
During voiced segments, the SEW magnitude vector exhibits a certain degree of interframe correlation. To exploit this property, the SEW mean vector is quantized using a switched predictive VQ, and the SEW deviation sub-vector is quantized using a switched predictive gain-shape quantizer. The predictor modes for the SEW mean vector and the SEW deviation sub-vector are switched jointly so as to minimize a spectrally weighted distortion between the reconstructed and the original SEW magnitude vectors. At the decoder, the SEW deviation sub-vector and the SEW mean vector are combined to produce the full dimension SEW magnitude vector.
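A simplified sketch of joint predictor-mode switching: each component (for instance, the SEW mean vector and the SEW deviation sub-vector) is predicted from its previous quantized value using the coefficient of the candidate mode, the prediction residual is quantized, and a single mode is kept for all components. The two predictor coefficients are hypothetical, the gain-shape structure of the deviation quantizer is omitted, and distortion is measured per component rather than on the full reconstructed SEW magnitude vector.

```python
PREDICTOR_COEFFS = (0.0, 0.7)  # hypothetical: mode 0 memoryless, mode 1 predictive


def _wdist(a, b, w):
    """Weighted squared-error distortion."""
    return sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w))


def _quantize(target, prev_q, coef, codebook, w):
    """Predict from the previous quantized vector and quantize the residual."""
    pred = [coef * p for p in prev_q]
    resid = [t - p for t, p in zip(target, pred)]
    idx = min(range(len(codebook)), key=lambda i: _wdist(resid, codebook[i], w))
    recon = [p + c for p, c in zip(pred, codebook[idx])]
    return idx, recon


def switched_predictive_vq(components, codebooks):
    """components: list of (target, prev_quantized, weights) tuples.
    codebooks[mode][c]: residual codebook for component c in that mode.
    Returns (mode, indices, reconstructions) minimizing total distortion."""
    best = None
    for mode, coef in enumerate(PREDICTOR_COEFFS):
        indices, recons, total = [], [], 0.0
        for c, (target, prev_q, w) in enumerate(components):
            idx, recon = _quantize(target, prev_q, coef, codebooks[mode][c], w)
            indices.append(idx)
            recons.append(recon)
            total += _wdist(target, recon, w)
        if best is None or total < best[0]:
            best = (total, mode, indices, recons)
    return best[1], best[2], best[3]
```

Switching both components with one mode bit keeps the side information small, while the memoryless mode lets the quantizer recover quickly after onsets or channel errors.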
Direct encoding of the SEW phase vector leads to unsatisfactory results when a small number of bits are employed. The present invention overcomes this problem by implicitly representing SEW phase using a measure of periodicity called the voicing measure. The voicing measure is computed using a weighted RMS value of the SEW, a measure of variability of SEW and the peak value of the normalized autocorrelation of the residual signal. The voicing measure is also useful in REW phase modeling. The voicing measure is quantized using 3 bits. At the decoder, the SEW phase is computed by a weighted combination of the previous SEW phase vector, a random phase perturbation and a fixed phase vector which corresponds to a voiced pitch pulse. The relative weights for these components are determined by the quantized voicing measure. The decoded SEW magnitude and SEW phase are combined to produce the complex SEW vector. The SEW component is filtered using a low pass filter to suppress excessively rapid variations that can appear due to the random component in SEW phase. The strength of the proposed technique is that it can realize various degrees of voicing in a frequency dependent manner. This results in more natural sounding speech with the right balance of periodicity and roughness both under quiet and noisy ambient conditions.
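The SEW phase model above can be sketched as follows, with the voicing measure assumed to lie in [0, 1] (0 = strongly periodic). The fixed pitch-pulse phase is taken to be zero phase, the previous and fixed phases are mixed as unit phasors, and the random perturbation range grows with the harmonic index to give frequency-dependent voicing; all of these mappings are illustrative assumptions.

```python
import cmath
import math
import random


def synthesize_sew_phase(prev_phase, fixed_phase, voicing, rng):
    """Hypothetical SEW phase model (voicing in [0, 1], 0 = strongly periodic).

    Each harmonic's phase is the angle of a weighted sum of the previous-frame
    phasor and the fixed pitch-pulse phasor, plus a random perturbation whose
    range grows with both the voicing measure and the harmonic index.
    """
    n = len(prev_phase)
    w_prev, w_fixed = voicing, 1.0 - voicing
    phase = []
    for k in range(n):
        mix = (w_prev * cmath.exp(1j * prev_phase[k])
               + w_fixed * cmath.exp(1j * fixed_phase[k]))
        spread = math.pi * voicing * (k / (n - 1) if n > 1 else 1.0)
        phase.append(cmath.phase(mix) + spread * rng.uniform(-1.0, 1.0))
    return phase


def sew_vector(magnitude, phase):
    """Combine the decoded SEW magnitude and synthesized phase into complex SEW."""
    return [m * cmath.exp(1j * p) for m, p in zip(magnitude, phase)]
```

For a fully periodic frame the sketch reproduces the pitch-pulse phase exactly, and the SEW magnitude is untouched by the phase model in every case; the subsequent low pass filtering of the SEW is not shown.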
The REW magnitude vector sequence is normalized to unity RMS value, resulting in a REW magnitude shape vector and a REW gain vector. This separates the more important REW level information from the relatively less important REW shape information. Encoding of the REW gain vector serves to track the level of the REW magnitude vector as it varies across the frame. This is important to maintain the correct level of roughness as well as evolution bandwidth (temporal variation) of the random component in the reconstructed speech. The REW gain vector can be closely estimated using the encoded SEW mean vector. Consequently, REW gain is efficiently encoded by quantizing the REW gain estimation error with a small number of bits.
Prior techniques (W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996; Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997) did not employ a gain-shape decomposition of the REW magnitude or an estimation of the REW gain vector using SEW level information. The separation of level and shape is advantageous, since it allows more accurate quantization of the time varying REW level, which is of primary importance. Estimation using the SEW level further improves quantization accuracy. In prior techniques, the entire REW magnitude was modeled using analytical functions, an approach with the serious shortcomings mentioned earlier.
The normalized REW magnitude vectors are variable dimension vectors. To convert them to a fixed dimension representation, they are modeled by a 6-band sub-band model, resulting in 6-dimensional REW sub-band vectors. The REW sub-band vectors are averaged across the frame to obtain a single average REW sub-band vector for each frame, which is then vector quantized. At the decoder, the full-dimension REW magnitude shape vector is obtained from the REW sub-band vector by a piecewise-constant construction. Prior REW magnitude quantization is based upon the use of analytical functions to overcome the problem of variable dimensionality (W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996), an approach that suffers from the disadvantages discussed earlier.
The REW phase vector is not explicitly encoded. At the receiver, the complex REW vector is derived using the received REW gain vector, received voicing measure and the received SEW vector. The complex REW component is derived by filtering a weighted sum of the complex SEW component and a white noise signal through a high pass filter. The weighting of SEW and white noise is dependent on the average REW gain value for that frame. The high pass filter is a single-zero, two-pole filter, whose real zero is adjusted based on SEW and REW levels. The complex pole frequency is fixed at 25 Hz (assuming a 50 Hz SEW sampling rate). The pole radius varies from 0.2 to 0.60, depending on the decoded voicing measure. As the periodicity of the frame increases (as indicated by a lower voicing measure), the pole moves closer to the unit circle. At the same time, at the filter input, the weight of the SEW component increases relative to that of the white noise component. This has the effect of creating a REW component having more correlation with SEW and with more of its energy at lower frequencies. At the same time, the presence of the zero at 0.9 ensures that the REW energy diminishes below 25 Hz. The overall result is to create a REW component that has its energy distributed in a manner roughly consistent with the REW extraction process at the encoder and with the relative levels of REW and SEW components.
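The REW regeneration can be sketched for a single harmonic track sampled at the 50 Hz SEW rate. With the pole frequency at 25 Hz, half of that rate, the conjugate pole pair collapses onto a double real pole at z = -r, while the zero at z = 0.9 attenuates the slowest variations. The linear mapping from voicing measure to pole radius, the SEW/noise weights (tied here to the voicing measure for simplicity, whereas the text also relates them to the average REW gain) and the RMS rescaling are assumptions.

```python
import math
import random


def rew_filter_coeffs(voicing, fs=50.0, f_pole=25.0, zero=0.9):
    """Coefficients (b, a) of H(z) = (1 - zero z^-1) / (1 + a1 z^-1 + a2 z^-2).

    The conjugate pole pair sits at f_pole with radius r, mapped linearly from
    0.6 (voicing = 0, periodic) down to 0.2 (voicing = 1); the mapping is an
    assumption. With f_pole = fs / 2 the pair becomes a double pole at z = -r.
    """
    v = min(1.0, max(0.0, voicing))
    r = 0.6 - 0.4 * v
    theta = 2.0 * math.pi * f_pole / fs
    return (1.0, -zero), (1.0, -2.0 * r * math.cos(theta), r * r)


def filter_sequence(x, b, a):
    """Direct-form I filter for a complex sequence; a[0] is assumed to be 1."""
    y = []
    for n, xn in enumerate(x):
        acc = b[0] * xn
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc -= a[2] * y[n - 2]
        y.append(acc)
    return y


def synthesize_rew(sew_track, voicing, rew_rms, rng):
    """Generate one harmonic's REW track from its SEW track.

    Mixes SEW and complex white noise with hypothetical weights (1 - voicing,
    voicing), high pass filters the mix, and rescales it to RMS rew_rms.
    """
    noise = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in sew_track]
    x = [(1.0 - voicing) * s + voicing * e for s, e in zip(sew_track, noise)]
    b, a = rew_filter_coeffs(voicing)
    y = filter_sequence(x, b, a)
    rms = math.sqrt(sum(abs(v) ** 2 for v in y) / len(y))
    return [v * (rew_rms / rms) for v in y] if rms > 0.0 else y
```

Moving the poles toward the unit circle for periodic frames concentrates the REW energy near the top of the evolution band, while the SEW term in the mix supplies the SEW-REW correlation that a purely random phase model cannot create.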
In prior implementations of PWI coding (W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, "A Low Complexity Waveform Interpolation Coder", IEEE International Conference on Acoustics, Speech and Signal Processing, 1996; Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", IEEE International Conference on Acoustics, Speech and Signal Processing, 1997), REW phase was obtained at the receiver using random phase models. Use of a random phase for REW results in reconstructed speech that is excessively rough, because a random phase is not consistent with the SEW-REW separation model employed at the encoder. Consequently, the random phase model results in a REW component that does not conform to certain basic characteristics of the REW at the encoder. For example, the random phase based REW is likely to have a significant amount of energy below 25 Hz, whereas the REW at the encoder does not. Further, the correlation between SEW and REW due to the overlapping separation filters cannot be directly created when a random phase model is employed.