In modern digital communication systems, speech coding devices and algorithms play a central role. They compress a speech signal so that it can be transmitted over a digital communication channel using a low number of information bits per unit of time. As a result, the bandwidth requirements for the speech channel are reduced, which, in turn, increases the capacity of, for example, a mobile telephone system.
In order to achieve higher capacity, speech coding algorithms that are able to encode speech with high quality at lower bit rates are needed. Recently, the demand for high quality and low bit rate has sometimes led to an increase of the frame length used in the speech coding algorithms. The frame contains the speech samples residing in the time interval that is currently being processed in order to calculate one set of speech parameters. The frame length is typically increased from 20 to 40 milliseconds.
As a consequence of the increase of the frame length, fast transitions of the speech signal cannot be tracked as accurately as before. For example, the linear spectral filter model that models the movements of the vocal tract is generally assumed to be constant during one frame when speech is analyzed. However, for 40 millisecond frames, this assumption may not hold, since the spectrum can change at a faster rate.
In many speech coders, the effect of the vocal tract is modeled by a linear filter that is obtained by a linear predictive coding (LPC) analysis algorithm. Linear predictive coding is disclosed in "Digital Processing of Speech Signals," L. R. Rabiner and R. W. Schafer, Prentice Hall, Chapter 8, 1978, which is incorporated herein by reference. The LPC analysis algorithms operate on a frame of digitized samples of the speech signal and produce a linear filter model describing the effect of the vocal tract on the speech signal. The parameters of the linear filter model are then quantized and transmitted to the decoder where they, together with other information, are used to reconstruct the speech signal. Most LPC analysis algorithms use a time invariant filter model in combination with a fast update of the filter parameters. The filter parameters are usually transmitted once per frame, and a frame is typically 20 milliseconds long. When the updating rate of the LPC parameters is reduced by increasing the LPC analysis frame length above 20 ms, the response of the decoder is slowed down and the reconstructed speech sounds less clear. The accuracy of the estimated filter parameters is also reduced because of the time variation of the spectrum. Furthermore, the other parts of the speech coder are negatively affected by the mismodeling of the spectral filter. Thus, conventional LPC analysis algorithms that are based on linear time invariant filter models have difficulty tracking formants in the speech when the analysis frame length is increased in order to reduce the bit rate of the speech coder. A further drawback occurs when very noisy speech is to be encoded. It may then be necessary to use long speech frames, which contain many speech samples, in order to obtain sufficient accuracy of the parameters of the speech model. With a time invariant speech model, this may not be possible because of the limited formant tracking capabilities described above.
These difficulties can be counteracted by making the linear filter model explicitly time variable.
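For orientation, the conventional time invariant, frame-based LPC analysis discussed above can be sketched as follows. This is only a minimal illustration using the autocorrelation method with the Levinson-Durbin recursion, which is one common way to carry out LPC analysis; the function names, the sign convention of the predictor, and the absence of windowing are assumptions of this sketch and are not taken from the cited references.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t - k] for t in range(k, n))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for one frame.

    Assumed convention: sample s[t] is predicted as
    sum over i = 1..order of a[i] * s[t - i].
    Returns (coefficients a[1..order], final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros); stop early
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err  # reflection coefficient for stage m
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_for_frame(frame, order):
    """One set of filter parameters per frame: constant over the whole frame."""
    return levinson_durbin(autocorrelation(frame, order), order)
```

Because a single coefficient set is returned for the whole frame, any spectral change inside the frame is averaged out, which is the limitation that grows as the frame is lengthened.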
Time variable spectral estimation algorithms can be constructed from various transform techniques, which are disclosed in "The Wigner Distribution-A Tool for Time-Frequency Signal Analysis," T. A. C. G. Claasen and W. F. G. Mecklenbrauker, Philips J. Res., Vol. 35, pp. 217-250, 276-300, 372-389, 1980, and "Orthonormal Bases of Compactly Supported Wavelets," I. Daubechies, Comm. Pure Appl. Math., Vol. 41, pp. 929-996, 1988, which are incorporated herein by reference. Those algorithms are, however, less suitable for speech coding since they do not possess the previously described linear filter structure. Thus, the algorithms are not directly interchangeable in existing speech coding schemes. Some time variability may also be obtained by using conventional time invariant algorithms in combination with so-called forgetting factors, or equivalently, exponential windowing, which are described in "Design of Adaptive Algorithms for the Tracking of Time-Varying Systems," A. Benveniste, Int. J. Adaptive Control Signal Processing, Vol. 1, no. 1, pp. 3-29, 1987, which is incorporated herein by reference.
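The idea of a forgetting factor can be illustrated with a recursively updated, exponentially weighted autocorrelation estimate. The sketch below is an assumption made for illustration, not the algorithm of the cited reference; the function name and the default value of `lam` are hypothetical.

```python
def exp_windowed_autocorr(samples, max_lag, lam=0.99):
    """Exponentially weighted autocorrelation estimate at the last sample.

    Each update scales the previous estimate by the forgetting factor
    `lam` (0 < lam <= 1) before adding the newest product, so a product
    that is j samples old carries the weight lam**j.
    """
    r = [0.0] * (max_lag + 1)
    for t, s in enumerate(samples):
        for k in range(max_lag + 1):
            past = samples[t - k] if t >= k else 0.0
            r[k] = lam * r[k] + s * past
    return r
```

With `lam` close to 1 the estimate has a long memory (roughly 1 / (1 - lam) samples) and behaves like a time invariant analysis; a smaller `lam` tracks changes faster at the cost of a noisier estimate. The underlying filter model itself remains time invariant.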
The known LPC analysis algorithms that are based upon explicitly time variant speech models use two or more parameters, i.e., bias and slope, to model one filter parameter in the lowest order time variable case. Such algorithms are described in "Time-dependent ARMA Modeling of Nonstationary Signals," Y. Grenier, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-31, no. 4, pp. 899-911, 1983, which is incorporated herein by reference. A drawback of this approach is that the model order is increased, which leads to increased computational complexity. The number of speech samples per free parameter decreases for fixed speech frame lengths, which means that estimation accuracy is reduced. Since interpolation between adjacent speech frames is not used, there is no coupling between the parameters in different speech frames. As a result, coding delays which extend beyond one speech frame cannot be utilized in order to improve the LPC parameters in the present speech frame. Furthermore, algorithms that do not utilize interpolation between adjacent frames have no control of the parameter variation across frame borders. The result can be transients that may reduce speech quality.
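The loss of samples per free parameter can be made concrete with a small calculation. The sketch below assumes, purely for illustration, an 8 kHz sampling rate and a tenth-order filter; neither value is stated above, and the function names are hypothetical.

```python
def coefficient_at(bias, slope, t):
    """Lowest order time variable case: one filter coefficient at sample index t."""
    return bias + slope * t


def samples_per_free_parameter(frame_ms, sample_rate_hz, filter_order,
                               params_per_coefficient):
    """Number of speech samples available per estimated parameter in one frame."""
    samples = frame_ms * sample_rate_hz // 1000
    free_parameters = filter_order * params_per_coefficient
    return samples / free_parameters


# Assumed example values: 20 ms frame, 8 kHz sampling, filter order 10.
time_invariant = samples_per_free_parameter(20, 8000, 10, 1)  # one value per coefficient
bias_and_slope = samples_per_free_parameter(20, 8000, 10, 2)  # bias and slope per coefficient
```

Under these assumed values the frame holds 160 samples, so the time invariant model has 16 samples per free parameter while the bias-and-slope model has only 8.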