In order to enable an efficient transmission of audio signals, e.g. speech, from a transmitting end to a receiving end, it is well known in the art to divide the speech at the transmitting end into a spectral envelope and an excitation signal. The spectral envelope and the excitation signal are then both quantised and transferred to the receiving end in corresponding bit streams.
A common technique for obtaining a representation of the short-term spectral envelope of speech is linear predictive coefficient (LPC) filtering. The resulting LPCs themselves, however, lack robustness to quantisation noise, which can result in filter instability problems. Therefore, it has been proposed, e.g. by F. Itakura in “Line spectrum representation of linear predictive coefficients of speech signals”, J. Acoust. Soc. Amer., vol. 57, p. S35, April 1975, to convert the LPCs for transmission into other, more suitable parameters, namely the line spectral frequency (LSF) parameters. These LSF parameters, which are also referred to as line spectral pairs, are robust to quantisation noise and also exhibit other attractive features.
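The conversion from LPCs to LSF parameters can be illustrated with a minimal sketch. It assumes the textbook construction in which the prediction polynomial A(z) is split into a symmetric and an antisymmetric polynomial whose roots lie on the unit circle; the function name lpc_to_lsf and the example coefficients are hypothetical and are not taken from the cited reference.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPCs [1, a1, ..., ap] of A(z) into LSFs in radians (0..pi).

    Illustrative sketch only: uses general polynomial root finding rather
    than the specialised search methods used in practical codecs.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])
    # Symmetric (sum) and antisymmetric (difference) polynomials of order p+1.
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Keep one angle per conjugate pair and drop the trivial roots at z = +1, -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

# Hypothetical second-order predictor with poles inside the unit circle.
lsf = lpc_to_lsf([1.0, -1.2, 0.5])
```

For a stable predictor, the returned values are strictly increasing and alternate between the roots of the two polynomials, which is the ordering property that makes the LSF parameters convenient to quantise.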
When extracting the LSF parameters from the linear prediction, sampling theory and decimation theory should be taken into account for the conversion of the signal from the time domain into the frequency domain.
The sampling theory states that if a time domain signal x_a(t) has a band limited Fourier transform X_a(Ω), such that X_a(Ω) = 0 for |Ω| ≥ 2πF, where F is a specific frequency, then this signal x_a(t) can be uniquely reconstructed from equally spaced samples x_a(nT), with −∞ < n < ∞ and with T being the spacing in time, if 1/T > 2F.
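The condition 1/T > 2F can be checked numerically. The following is a minimal sketch, assuming a sampling rate of 8 kHz and two test tones at 3 kHz and 5 kHz; these values and the helper name satisfies_sampling_condition are chosen for illustration only.

```python
import numpy as np

def satisfies_sampling_condition(sample_period, max_freq_hz):
    """Return True if 1/T > 2*F holds for spacing T and band limit F."""
    return 1.0 / sample_period > 2.0 * max_freq_hz

fs = 8000.0                      # assumed sampling rate in Hz, i.e. T = 1/fs
n = np.arange(64)                # sample indices
x_3k = np.cos(2 * np.pi * 3000.0 * n / fs)   # 3 kHz tone, below fs/2
x_5k = np.cos(2 * np.pi * 5000.0 * n / fs)   # 5 kHz tone, above fs/2

# The 5 kHz tone violates the condition and yields the same samples
# as the 3 kHz tone, so it cannot be uniquely reconstructed.
aliased = np.allclose(x_3k, x_5k)
```

In this sketch the two sample sequences are indistinguishable, which is the spectral overlapping that the condition 1/T > 2F is intended to rule out.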
Decimation theory, on the other hand, defines how a time-domain signal can be converted from a higher sampling rate to a lower sampling rate by dividing the current rate by a factor M, where M ≥ 1, without producing spectral overlapping.
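Decimation by a factor M can be sketched as low-pass filtering followed by discarding samples. The example below is one possible realisation, assuming an integer M and a Hamming-windowed sinc filter with 101 taps; the function name decimate and the filter design are illustrative assumptions, not a prescribed method.

```python
import numpy as np

def decimate(x, m, num_taps=101):
    """Reduce the sampling rate of x by an integer factor m >= 1.

    Sketch only: the signal is band limited to the new Nyquist band with a
    windowed-sinc FIR filter, then every m-th sample is kept.
    """
    x = np.asarray(x, dtype=float)
    if m < 1:
        raise ValueError("decimation factor must be >= 1")
    if m == 1:
        return x.copy()
    k = np.arange(num_taps) - (num_taps - 1) / 2
    cutoff = 0.5 / m                      # new Nyquist frequency, cycles/sample
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(num_taps)
    h /= h.sum()                          # unity gain at DC
    y = np.convolve(x, h, mode="same")    # symmetric filter, no net delay
    return y[::m]

n = np.arange(800)
low = np.cos(2 * np.pi * 0.02 * n)        # survives decimation by 4
high = np.cos(2 * np.pi * 0.40 * n)       # would overlap, so it is filtered out
y_low = decimate(low, 4)
y_high = decimate(high, 4)
```

Away from the edges of the signal, y_low follows the original low-frequency tone at the reduced rate, while y_high is close to zero because the filter removes the component that would otherwise overlap.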
In classic vocoders, LSF vectors comprising the values of the different LSF parameters are extracted from linear prediction coefficients estimated over windowed speech, typically using a window (such as a Hamming window) of 160 to 240 samples. The LSF vectors are extracted at a specific rate, for instance at time intervals of 20, 10 or even 5 ms. From the decimation perspective, this is similar to taking more frequently extracted LSF vectors, e.g. LSF vectors calculated for every speech sample by shifting the centre of the LPC analysis window one sample at a time, and decimating them to the required LSF vector rate, e.g. one of the rates mentioned above.
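The equivalence between frame-wise extraction and decimation of a per-sample parameter track can be illustrated as follows. This is a minimal sketch, assuming an 8 kHz sampling rate (so that 160 samples correspond to 20 ms), a 200-sample Hamming window, a 10th-order autocorrelation-method analysis and a random test signal; the names lpc and lpc_track are hypothetical. For brevity the sketch tracks the prediction coefficients only; each row could subsequently be converted into an LSF vector.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPCs [1, a1, ..., ap] via Levinson-Durbin."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1:len(frame) + order].copy()
    r[0] += 1e-12                          # guard against an all-zero frame
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                     # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_track(x, win_len=200, hop=160, order=10):
    """Analyse x with a Hamming window that advances hop samples per vector."""
    window = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    return np.array([lpc(x[s:s + win_len] * window, order) for s in starts])

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)              # stand-in for a speech signal

dense = lpc_track(x, hop=1)                # one vector per speech sample
sparse = lpc_track(x, hop=160)             # one vector per 20 ms at 8 kHz
same = np.allclose(dense[::160], sparse)   # frame-wise = decimated dense track
```

Keeping every 160th vector of the per-sample track reproduces the frame-wise result exactly, which is the sense in which conventional extraction at a fixed rate acts as a decimation of the densely sampled parameter track. Note that this plain subsampling applies no low-pass filtering to the track beforehand.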