A source-filter model of speech is illustrated schematically in FIG. 1a. As shown, speech can be modelled as comprising a signal from a source 102 passed through a time-varying filter 104. For “voiced” speech, the source signal represents the immediate vibration of the vocal cords, and the filter represents the acoustic effect of the vocal tract formed by the shape of the throat, mouth and tongue. For “unvoiced” speech, the vocal cords are not utilized and the source becomes more of a noisy signal. The effect of the filter is to alter the frequency profile of the source signal so as to emphasise or diminish certain frequencies. Instead of trying to directly represent an actual waveform, speech encoding works by representing the speech using parameters of a source-filter model.
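By way of illustration only, the source-filter model described above may be sketched as follows. The sketch assumes an idealised impulse train for the voiced source, white noise for the unvoiced source, and a single fixed two-pole resonance standing in for the time-varying filter 104 over one short segment. The function names, the 500 Hz centre frequency and the 100 Hz bandwidth are hypothetical choices made for the example and are not taken from the described system.

```python
import math
import random

SAMPLE_RATE_HZ = 16_000  # assumed sampling rate for this sketch


def voiced_source(num_samples, pitch_lag):
    """Idealised voiced source: one unit pulse every `pitch_lag` samples."""
    return [1.0 if n % pitch_lag == 0 else 0.0 for n in range(num_samples)]


def unvoiced_source(num_samples, seed=0):
    """Idealised unvoiced source: white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def resonator_coefficients(centre_hz, bandwidth_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Denominator coefficients [a1, a2] of a two-pole resonance."""
    radius = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    angle = 2.0 * math.pi * centre_hz / sample_rate_hz
    return [-2.0 * radius * math.cos(angle), radius * radius]


def all_pole_filter(source, coeffs):
    """Apply y[n] = x[n] - sum_k coeffs[k] * y[n-1-k]."""
    output = []
    for n, x in enumerate(source):
        y = x
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                y -= a * output[n - 1 - k]
        output.append(y)
    return output


# Hypothetical filter settings: one resonance near 500 Hz.
coeffs = resonator_coefficients(centre_hz=500.0, bandwidth_hz=100.0)
voiced = all_pole_filter(voiced_source(320, pitch_lag=80), coeffs)
unvoiced = all_pole_filter(unvoiced_source(320), coeffs)
```

In a fuller model the filter coefficients would be updated over time, and more than one resonance would typically be needed to approximate a vocal tract.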
As illustrated schematically in FIG. 1b, the encoded signal will be divided into a plurality of frames 106, with each frame comprising a plurality of subframes 108. For example, speech may be sampled at 16 kHz and processed in frames of 20 ms, with some of the processing done in subframes of 5 ms (four subframes per frame). Each frame comprises a flag 107 by which it is classed according to its respective type. Each frame is thus classed at least as either “voiced” or “unvoiced”, and unvoiced frames are encoded differently than voiced frames. Each subframe 108 then comprises a set of parameters of the source-filter model representative of the sound of the speech in that subframe.
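By way of illustration only, the framing arithmetic in the example above may be sketched as follows. The sketch assumes the 16 kHz sampling rate, 20 ms frames and 5 ms subframes given in the example; the function names are hypothetical, and the choice to drop a trailing partial frame is an assumption made for simplicity.

```python
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 20
SUBFRAME_MS = 5


def samples_per(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples covering `duration_ms` at the given sampling rate."""
    return sample_rate_hz * duration_ms // 1000


def split_into_frames(samples, frame_len, subframe_len):
    """Split samples into whole frames, each a list of subframes.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        subframes = [frame[i:i + subframe_len]
                     for i in range(0, frame_len, subframe_len)]
        frames.append(subframes)
    return frames


frame_len = samples_per(FRAME_MS)        # 320 samples per frame
subframe_len = samples_per(SUBFRAME_MS)  # 80 samples per subframe
frames = split_into_frames(list(range(1000)), frame_len, subframe_len)
```

With these values, each frame holds 320 samples and is divided into four subframes of 80 samples each.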
For voiced sounds (e.g. vowel sounds), the source signal has a degree of long-term periodicity corresponding to the perceived pitch of the voice. In that case, the source signal can be modelled as comprising a quasi-periodic signal with each period comprising a series of pulses of differing amplitudes. The source signal is said to be “quasi” periodic in that on a timescale of at least one subframe it can be taken to have a single, meaningful period which is approximately constant, but over many subframes or frames the period and form of the signal may change. The approximated period at any given point may be referred to as the pitch lag. An example of a modelled source signal 202 is shown schematically in FIG. 2a with a gradually varying period P1, P2, P3, etc., each comprising four pulses which may vary gradually in form and amplitude from one period to the next.
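By way of illustration only, one simple way of approximating a pitch lag is to search for the lag that maximises the autocorrelation of the source signal. The sketch below is not stated to be the method used in any particular codec; the function names, the synthetic test signal with a strictly constant period, and the search range of 32 to 160 samples are all assumptions made for the example.

```python
def make_periodic_source(num_samples, period, pulses):
    """Build a test signal repeating `pulses` (offset, amplitude) every period."""
    signal = [0.0] * num_samples
    for start in range(0, num_samples, period):
        for offset, amplitude in pulses:
            if start + offset < num_samples:
                signal[start + offset] = amplitude
    return signal


def estimate_pitch_lag(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[n] * signal[n - lag]
                    for n in range(lag, len(signal)))
        if score > best_score:  # strict comparison keeps the shortest lag on ties
            best_lag, best_score = lag, score
    return best_lag


# Four pulses of differing amplitudes per period, loosely following FIG. 2a.
pulses = [(0, 1.0), (20, 0.5), (40, 0.25), (60, 0.125)]
source = make_periodic_source(num_samples=320, period=80, pulses=pulses)
lag = estimate_pitch_lag(source, min_lag=32, max_lag=160)
```

For this synthetic signal the search recovers the period of 80 samples. A practical estimator would usually normalise the correlation and guard against selecting a multiple of the true period.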
According to many speech coding algorithms such as those using Linear Predictive Coding (LPC), a short-term filter is used to separate out the speech signal into two separate components: (i) a signal representative of the effect of the time-varying filter 104; and (ii) the remaining signal with the effect of the filter 104 removed, which is representative of the source signal. The signal representative of the effect of the filter 104 may be referred to as the spectral envelope signal, and typically comprises a series of sets of LPC parameters describing the spectral envelope at each stage. FIG. 2b shows a schematic example of a sequence of spectral envelopes 204₁, 204₂, 204₃, etc. varying over time. Once the varying spectral envelope is removed, the remaining signal representative of the source alone may be referred to as the LPC residual signal, as shown schematically in FIG. 2a.
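By way of illustration only, the separation into LPC parameters and an LPC residual may be sketched using the autocorrelation method with the Levinson-Durbin recursion, which is one common way of estimating a short-term predictor. The text above does not specify an estimation method, so this choice, the function names and the synthetic test signal are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order], x[n] ~ sum_k a[k] * x[n-k].

    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate input: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, coeffs):
    """Remove the short-term prediction: e[n] = x[n] - sum_k a[k] * x[n-k]."""
    residual = []
    for n, sample in enumerate(x):
        prediction = sum(c * x[n - 1 - k]
                         for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(sample - prediction)
    return residual


# Synthetic input: a single pulse passed through the filter 1 / (1 - 0.9 z^-1).
signal = [0.9 ** n for n in range(200)]
coeffs, error = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = lpc_residual(signal, coeffs)
```

For this synthetic input the estimated coefficients are close to 0.9 and 0, and the residual is close to the single pulse that excited the filter, which mirrors the separation into a spectral envelope component and a source component.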
The spectral envelope signal and the source signal are each encoded separately for transmission. In the illustrated example, each subframe 108 would contain: (i) a set of parameters representing the spectral envelope 204; and (ii) a set of parameters representing the pulses of the source signal 202.
More specifically, in the illustrated example each subframe 108 would comprise: (i) a quantised set of LPC parameters representing the spectral envelope; (ii)(a) a quantised long-term prediction (LTP) vector related to the correlation between pitch periods in the source signal; and (ii)(b) a quantised LTP residual signal representative of the source signal with the effects of both the inter-period correlation and the spectral envelope removed.
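By way of illustration only, the removal of inter-period correlation may be sketched with a single-tap long-term predictor, in which each sample is predicted from the sample one pitch lag earlier. This is a simplification: the LTP vector referred to above suggests several coefficients, whereas the sketch uses one gain. The function names, the least-squares gain estimate and the synthetic input are assumptions made for the example.

```python
def ltp_gain(signal, lag):
    """Least-squares gain g minimising sum (signal[n] - g * signal[n-lag])^2."""
    num = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
    den = sum(signal[n - lag] ** 2 for n in range(lag, len(signal)))
    return num / den if den > 0.0 else 0.0


def ltp_residual(signal, lag, gain):
    """Subtract the long-term prediction; samples without history pass through."""
    return [x - gain * signal[n - lag] if n >= lag else x
            for n, x in enumerate(signal)]


# Synthetic LPC residual: one pulse per 80-sample period, halving each period.
residual_in = [0.0] * 320
for period_index in range(4):
    residual_in[80 * period_index] = 0.5 ** period_index

gain = ltp_gain(residual_in, lag=80)
residual_out = ltp_residual(residual_in, lag=80, gain=gain)
```

For this synthetic input the estimated gain is 0.5, and only the first pulse survives in the output, so the remaining signal carries far less energy than the input.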
Temporal fluctuations of spectral envelopes can cause perceptual degradation and a loss in coding efficiency. One way to mitigate these negative effects is to shorten the frame size, or frame skip, of the spectral analysis, thereby lowering the fluctuations between successive spectra. Unfortunately, this approach leads to a considerably higher transmit bit rate, whereas it is desirable to reduce the transmit bit rate.
The coefficients generated by linear predictive coding are very sensitive to errors: a small error may distort the whole spectrum of the reconstructed signal, or may even result in the prediction filter becoming unstable. Therefore, direct transmission of LPC coefficients is often avoided, and the LPC coefficient information is further encoded to provide a more robust parameter set.
To avoid these problems, it is common to represent the LPC coefficients as Line Spectral Pairs (LSP), also known as Line Spectral Frequencies (LSF), which are more robust to small errors introduced during transmission.
Due to the nature of LSFs, it is possible to interpolate between values for adjacent frames. This interpolation results in a smoothing of the signal, thereby reducing the effect of the temporal fluctuations of the spectral envelopes. Interpolation is performed using a fixed interpolation factor, typically having a value of 0.5. Where the interpolation is taken fully into account in estimating which vector to transmit, a fixed interpolation factor may provide smoothing of the signal but may potentially lead to lower performance than without interpolation.
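By way of illustration only, interpolation between the LSF vectors of adjacent frames with a fixed factor may be sketched as follows. The function name, the direction in which the factor is applied (0 returns the previous frame's vector, 1 returns the current frame's vector) and the example values are assumptions made for the sketch.

```python
def interpolate_lsf(prev_lsf, curr_lsf, factor=0.5):
    """Linearly interpolate between two LSF vectors of equal length.

    factor = 0.0 returns prev_lsf; factor = 1.0 returns curr_lsf.
    """
    if len(prev_lsf) != len(curr_lsf):
        raise ValueError("LSF vectors must have the same length")
    if not 0.0 <= factor <= 1.0:
        raise ValueError("interpolation factor must lie in [0, 1]")
    return [p + factor * (c - p) for p, c in zip(prev_lsf, curr_lsf)]


def is_strictly_increasing(values):
    """True when each value exceeds the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical second-order LSF vectors in radians, for illustration only.
previous = [0.25, 0.75]
current = [0.5, 1.25]
smoothed = interpolate_lsf(previous, current, factor=0.5)
```

If both input vectors are strictly increasing, any interpolated vector with a factor between 0 and 1 is strictly increasing as well, which is one reason LSFs lend themselves to interpolation in a way that raw LPC coefficients do not.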
It is an aim of some embodiments of the present invention to address, or at least mitigate, some of the above-identified problems of the prior art.