A digital speech communication or storage system typically uses a speech encoder to produce a parsimonious representation of the speech signal. A corresponding decoder is used to generate an approximation to the speech signal from that representation. The combination of the encoder and decoder is known in the art as a speech codec. As will be apparent to a person skilled in the art, many segments of speech signals contain quasiperiodic waveforms. Accordingly, consecutive cycles of quasiperiodic waveforms can be considered, and processed, by a speech codec as data vectors that evolve slowly over time.
An important element of a speech codec is the way it exploits correlation between consecutive cycles of quasiperiodic waveforms. Frequently, correlation is exploited by transmitting a single cycle of the waveform, or of a filtered version of the waveform, only once every 20–30 ms, so that a portion of the data is missing in the received signal. In a typical decoder the missing data is determined by interpolating between samples of the transmitted cycles.
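The decoder-side interpolation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two transmitted cycles have already been time-normalized to the same length and aligned, and that the interpolation is linear; the text above does not specify the interpolation rule. The function name `interpolate_cycles` and its parameters are hypothetical.

```python
def interpolate_cycles(cycle_a, cycle_b, num_missing):
    """Linearly interpolate `num_missing` cycles between two transmitted cycles.

    Assumption: both cycles are lists of samples of equal length that have
    already been aligned, so sample j of one cycle corresponds to sample j
    of the other.
    """
    if len(cycle_a) != len(cycle_b):
        raise ValueError("cycles must have the same length")
    if num_missing < 0:
        raise ValueError("num_missing must be non-negative")

    interpolated = []
    for k in range(1, num_missing + 1):
        # Weight moves from cycle_a towards cycle_b as k increases.
        w = k / (num_missing + 1)
        interpolated.append(
            [(1.0 - w) * a + w * b for a, b in zip(cycle_a, cycle_b)]
        )
    return interpolated


# Usage: reconstruct three missing cycles between two received cycles.
missing = interpolate_cycles([0.0, 2.0], [4.0, 6.0], 3)
```

Such a scheme can only reproduce cycles that lie close to the straight path between the two transmitted vectors, which is why it relies on the waveform evolving slowly.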
In general, the use of interpolation by a speech decoder to generate data between the transmitted cycles produces an adequate approximation to the speech signal only if the speech signal really is quasiperiodic, or, equivalently, if the vectors representing consecutive cycles of the waveform evolve sufficiently slowly. However, many segments of speech contain noisy signal components, and this results in comparatively rapid evolution of the waveform cycles. In order for waveform interpolation to be useful for such signals, it is necessary to extract a sufficiently quasiperiodic component from the noisy signal in the encoder. This extracted component may be encoded by transmitting only selected cycles and decoded by interpolation in the manner described above. The remaining noisy component may also be encoded using other appropriate techniques and combined with the quasiperiodic component in the decoder.
Linear low pass filtering, along the time dimension, of a sequence of vectors representing consecutive cycles of speech is well known in the speech coding literature. The difficulty with this approach is that, in order to get good separation of the slowly and rapidly evolving components, the frequency response of the low pass filter must have a sharp roll-off. This requires a long impulse response, which necessitates an undesirably large filter delay.
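The trade-off between roll-off and delay can be made concrete with a small sketch. It assumes a symmetric (linear phase) FIR filter designed by the Hamming-windowed sinc method, which is only one of many possible designs and is not taken from the literature referred to above. All function names are hypothetical, and `cutoff` is a normalized frequency expressed as a fraction of the rate at which cycle vectors arrive.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low pass taps with unity gain at DC.

    Assumptions: num_taps is odd and at least 3; 0 < cutoff < 0.5.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    if not 0.0 < cutoff < 0.5:
        raise ValueError("cutoff must lie strictly between 0 and 0.5")
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def filter_cycles(cycles, taps):
    """Filter each sample position across consecutive cycles.

    Output i is centred on cycles[i + delay_in_cycles(taps)], so the filter
    needs that many future cycles before it can produce an output.
    """
    width = len(cycles[0])
    out = []
    for i in range(len(cycles) - len(taps) + 1):
        span = cycles[i:i + len(taps)]
        out.append(
            [sum(t * c[j] for t, c in zip(taps, span)) for j in range(width)]
        )
    return out


def delay_in_cycles(taps):
    """Delay of a symmetric FIR filter, measured in cycle vectors."""
    return (len(taps) - 1) // 2
```

Under these assumptions, doubling the number of taps to sharpen the roll-off roughly doubles the number of future cycles that must be buffered before any output is available.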
A Kalman filter technique for estimating quasiperiodic signal components has been described by Gruber and Todtli (IEEE Trans. Signal Processing, Vol. 42, No. 3, March 1994, pp. 552–562). However, because this technique is based on a linear dynamic system model of a frequency domain representation of the signal, it is unnecessarily complex. It also assumes that the parameters of the dynamic system model (i.e. the noise energy and the harmonic signal gain) are known. In speech coding, these parameters are not known.
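The dependence of a Kalman filter on known model parameters can be seen in a deliberately simplified sketch. This is not the frequency domain model of Gruber and Todtli; it assumes a scalar random-walk state observed in additive noise, and the names `kalman_track`, `process_var` and `noise_var` are hypothetical. The point is only that both variances must be supplied by the caller.

```python
def kalman_track(observations, process_var, noise_var,
                 init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter for a random-walk state observed in noise.

    Assumed model (illustrative only):
        state[t] = state[t-1] + w,  w has variance process_var
        obs[t]   = state[t]   + v,  v has variance noise_var
    Both variances are treated as known inputs.
    """
    if process_var < 0 or noise_var <= 0:
        raise ValueError("process_var must be >= 0 and noise_var must be > 0")
    mean, var = init_mean, init_var
    estimates = []
    for z in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: the gain balances prediction uncertainty against noise.
        gain = var / (var + noise_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append(mean)
    return estimates


# Usage: the same observations, filtered under two assumed noise levels.
smooth = kalman_track([2.0] * 50, process_var=0.01, noise_var=100.0)
responsive = kalman_track([2.0] * 50, process_var=0.01, noise_var=1.0)
```

The two calls produce different estimates from identical data, which illustrates why a mismatch between the assumed and the true noise energy degrades the separation of the quasiperiodic component.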
A technique for determining the system parameters required in a Kalman filter using an Expectation Maximization algorithm has been described in a more general setting by Digalakis et al. (IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, October 1993, pp. 431–442). However, the technique is iterative, and in the absence of good initial estimates it may converge slowly. It may also produce a result that is not globally optimal. No prior art method is known for obtaining good initial estimates. Further, this method typically requires a significant amount of data over which the unknown parameters remain constant. In the context of speech coding, where the parameters change continuously, rapid estimation is essential, and therefore this method of applying the Expectation Maximization algorithm needs to be improved.
Stachurski (PhD Thesis, McGill University, Montreal, Canada, 1997) proposed a technique for estimating quasiperiodic signal components of a speech signal. This method involves minimizing a weighted combination of the estimated noise energy and a measure of the rate of change of the quasiperiodic component. The method is highly complex and does not allow the rate of evolution of the quasiperiodic component to be specified independently. Nor does it allow for an independently varying gain for the quasiperiodic component.
In this specification, including the claims, the terms "comprises", "comprising" or similar terms are intended to mean a non-exclusive inclusion, such that a method or apparatus that comprises a list of elements does not include those elements solely, but may well include other elements not listed.