For convenience of description, the invention will be described with reference to a speech codec (coder-decoder), but the invention is also applicable to the coding and re-synthesising of other types of analogue signals, for example video. Digital techniques for the coding of speech are growing in popularity for a number of reasons, notably flexibility, cost and robustness to noise. One such technique is Code Excited Linear Prediction (CELP), in which the incoming speech signal is sampled, segmented into frames and encoded by comparing each frame with sequences taken from a known codebook. The index number of the codebook sequence that provides the best match to each frame of the incoming speech is then stored or transmitted together with some gain and filter parameters. This type of coder belongs to the class of analysis by synthesis coders, so named because they synthesise a large number of candidate matches for the signal to be coded and then analyse the incoming signal by comparing it with each candidate. The corresponding decoder, or re-synthesiser, generally includes a synthesis section similar to that of the coder.
"Fast CELP coding based on algebraic codes" by J-P. Adoul, P. Mabilleau, M. Delprat and S. Morissette, read at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1987, pages 1957-1960 discloses a simple CELP speech coding system which is described briefly here.
The output of a source of original speech is fed to a sampling and segmentation means, which samples and quantises the speech at an appropriate sampling rate, such as 8 kHz, and segments it into frames with a length of, for example, 5 ms (40 samples at 8 kHz). The output of the segmentation means comprises sampled, segmented speech, which is fed to a non-inverting input of a summer and to a Linear Predictive Coder (LPC). The LPC derives a set of filter coefficients describing the short term redundancy in the incoming speech signal.
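By way of illustration only, the sampling-and-segmentation step and the LPC analysis can be sketched in Python. The function names are hypothetical, a trailing partial frame is simply dropped, no analysis window is applied, and the coefficients are obtained by the autocorrelation method with the Levinson-Durbin recursion, which is one common choice but is not specified in the text.

```python
SAMPLE_RATE_HZ = 8000  # example sampling rate from the text
FRAME_MS = 5           # example frame length from the text


def segment(samples, sample_rate=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Split sampled speech into consecutive, non-overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n = sample_rate * frame_ms // 1000  # 8000 * 5 / 1000 = 40 samples per frame
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of one frame (no window applied)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return [a_1, ..., a_p] so that x(n) is predicted by sum_i a_i x(n - i).

    With this convention the short term filter is 1/A(z), A(z) = 1 - sum_i a_i z^-i.
    """
    a = [0.0] * (order + 1)
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]
```

For a frame produced by `segment()`, `levinson_durbin(autocorrelation(frame, 10), 10)` would return ten predictor coefficients for the short term filter.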
A two-dimensional codebook contains K stochastic sequences of sampled white Gaussian noise, each of length N samples; the frames of sampled speech from the segmentation means also have a length of N samples. The codebook sequences are referred to as c^k(n), where k is the codebook index and n is the sample number within sequence k. The selected sequence c^k(n) is fed to a gain stage having a gain G, whose value is derived mathematically for each frame of sampled speech and each codebook sequence. The output of the gain stage is filtered successively by a long term filter and a short term filter.

The long term filter usually has only one tap and a relatively long delay, usually greater than the length of the frames of sampled speech. Its purpose is to impose some long term order upon the codebook sequence, and since the frequency of this long term order is in most cases the pitch of the speech being synthesised, this filter is also referred to as the pitch predictor. The transfer function of the long term filter is 1/B(z), and its coefficient may be derived by an adaptive loop or by analysis of the incoming speech signal.

The short term filter has much shorter delays but a much larger number of taps (typically 10 to 20) than the long term filter. Its purpose is to impose upon the codebook sequence the short term order that, in real speech, results from the speaker's vocal tract, so this filter is often referred to as the vocal tract filter. The transfer function of this filter is 1/A(z), and its coefficients are supplied by the LPC. The output of the short term filter is a synthetic speech signal, which is fed to an inverting input of the summer. The output of the summer is an error signal formed by the difference between the input speech frame and the filtered codebook sequence currently under test.
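A minimal Python sketch of the synthesis path just described follows: the gain stage, the long term filter 1/B(z), the short term filter 1/A(z), and the summer that forms the error signal. It assumes B(z) = 1 - b z^-T with a single tap b at delay T, and A(z) = 1 - sum a_i z^-i; these sign conventions and all function names are illustrative assumptions, not details taken from the text.

```python
def long_term_filter(x, b, delay, history=None):
    """One-tap pitch filter 1/B(z), B(z) = 1 - b*z^-delay (assumed form).

    `history` holds earlier outputs of this filter, most recent last; when the
    delay exceeds the frame length, every feedback term comes from this history.
    """
    past = list(history) if history else []
    buf = [0.0] * max(0, delay - len(past)) + past  # at least `delay` past outputs
    out = []
    for sample in x:
        y = sample + b * buf[-delay]   # buf[-delay] is y(n - delay)
        buf.append(y)
        out.append(y)
    return out


def short_term_filter(x, a, memory=None):
    """Vocal tract filter 1/A(z), A(z) = 1 - sum_i a[i-1] z^-i (assumed sign)."""
    mem = list(memory) if memory else [0.0] * len(a)  # mem[i] holds y(n - 1 - i)
    out = []
    for sample in x:
        y = sample + sum(ai * m for ai, m in zip(a, mem))
        mem = [y] + mem[:-1]
        out.append(y)
    return out


def synthesise(code, gain, b, delay, a):
    """Gain stage, then long term filter, then short term filter (zero initial state)."""
    scaled = [gain * c for c in code]
    return short_term_filter(long_term_filter(scaled, b, delay), a)


def error_signal(speech, synthetic):
    """Summer output: input speech minus synthetic speech, sample by sample."""
    return [s - y for s, y in zip(speech, synthetic)]
```

Keeping the two filters as separate functions mirrors the two cascaded stages in the description; the optional `history` and `memory` arguments would let a caller carry filter state from one frame to the next.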
The error signal is fed to a perceptual weighting filter, which weights the error signal according to the way the human ear perceives a speech signal, so that errors in parts of the frequency spectrum to which human hearing is less sensitive are de-emphasised by the coder relative to errors in parts to which it is more sensitive. The output of the perceptual weighting filter is fed to a Mean Square Error (MSE) calculating means, which squares the perceptually weighted error for each sample within a frame of speech and sums the squared errors over the length of the frame. All K sequences from the codebook are filtered and compared with the frame of incoming speech, and the MSE means keeps a record of the codebook sequence, and the corresponding gain, that result in the lowest mean square error for each complete frame of incoming speech. The index of the optimum codebook sequence c^k(n), the gain of the gain stage and the filter coefficients together represent a synthetic speech signal that can be reconstructed by a corresponding re-synthesis system. Where these parameters are to be transmitted, the short term filter coefficients are often encoded as Log Area Ratios (LARs) or Line Spectrum Frequencies (LSFs) to make them less sensitive to bit errors caused by noise or interference in the channel.
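The exhaustive search can be sketched as follows. For each codebook sequence the gain that minimises the weighted squared error is G = <x, y>/<y, y>, where x is the weighted input frame and y is the filtered and weighted codebook sequence. The weighting filter W(z) = A(z)/A(z/gamma) with gamma = 0.8 is a common choice assumed here, not one stated in the text, and all filters start from zero state. Under that zero-state assumption a long term filter whose delay exceeds the frame length has no effect within the frame, so it is omitted. Function names are hypothetical.

```python
def all_pole(x, a):
    """1/A(z) with A(z) = 1 - sum_i a[i-1] z^-i, zero initial state."""
    y = []
    for n, v in enumerate(x):
        y.append(v + sum(a[i] * y[n - 1 - i] for i in range(min(len(a), n))))
    return y


def all_zero(x, a):
    """A(z) applied as an FIR filter, zero initial state."""
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(min(len(a), n)))
            for n in range(len(x))]


def weight(x, a, gamma=0.8):
    """Perceptual weighting W(z) = A(z) / A(z/gamma) (assumed form)."""
    a_gamma = [ai * gamma ** (i + 1) for i, ai in enumerate(a)]  # A(z/gamma) coefficients
    return all_pole(all_zero(x, a), a_gamma)


def search(speech, codebook, a, gamma=0.8):
    """Return (index, gain, error) of the entry with the lowest summed weighted squared error."""
    target = weight(speech, a, gamma)
    best = (None, 0.0, float("inf"))
    for k, code in enumerate(codebook):
        y = weight(all_pole(code, a), a, gamma)   # filtered, then weighted
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                              # an all-zero sequence cannot match
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (k, gain, err)
    return best
```

Because the weighting filter is linear, weighting the error s - G*y is the same as subtracting the weighted candidate from the weighted target, which is why the sketch weights the two signals separately.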
In a corresponding decoder, or re-synthesis system, the optimum codebook sequence c^k(n) is selected from a codebook and fed to a gain stage, which is also supplied with the gain parameter. The output of this gain stage is fed to a long term inverse filter, 1/B(z), which is supplied with the appropriate coefficient, and the output of that filter is fed to a short term inverse filter, 1/A(z), which is supplied with the appropriate coefficients. The output of the short term filter is fed to an optional post filter, which may be included to reduce the effects of quantisation noise. The output of the post filter is fed, via an amplifier where necessary, to a loudspeaker to reproduce the synthetic speech.
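A frame-by-frame decoder along these lines might be sketched as below. The decoder is assumed here to carry the long term and short term filter memories from one frame to the next, so that the pitch contribution of earlier frames reaches the current one; the post filter is omitted. The class, its parameters and the filter forms 1/B(z) = 1/(1 - b z^-T) and 1/A(z) = 1/(1 - sum a_i z^-i) are hypothetical conventions chosen for illustration.

```python
class Decoder:
    """Hypothetical CELP re-synthesiser that keeps filter memory between frames."""

    def __init__(self, codebook, order, max_delay):
        self.codebook = codebook
        self.lt_history = [0.0] * max_delay  # past outputs of 1/B(z), most recent last
        self.st_memory = [0.0] * order       # st_memory[i] holds y(n - 1 - i) of 1/A(z)

    def decode_frame(self, index, gain, b, delay, a):
        """Re-synthesise one frame from a codebook index, gain and filter parameters.

        Requires 1 <= delay <= max_delay and len(a) == order.
        """
        out = []
        for c in self.codebook[index]:
            v = gain * c + b * self.lt_history[-delay]               # gain stage + 1/B(z)
            self.lt_history = self.lt_history[1:] + [v]
            y = v + sum(ai * m for ai, m in zip(a, self.st_memory))  # 1/A(z)
            self.st_memory = [y] + self.st_memory[:-1]
            out.append(y)
        return out
```

As a usage example, decoding an impulse frame and then a silent frame with b = 0.5 and a delay equal to the frame length reproduces the impulse, scaled by 0.5, at the start of the second frame, purely from the carried long term filter memory.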
One disadvantage of the CELP coding system described above is that the exhaustive search of all the sequences in the codebook, and the double filtering of every sequence before the error comparison, are very computationally intensive. A typical codebook contains 1024 sequences, each 40 samples long, so the basic CELP scheme described above is not economically feasible to implement in real time.
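A rough, illustrative count shows the scale of the problem. The cost model is an assumption made here, not a figure from the text: for every sample of every candidate sequence it charges one multiply-accumulate (MAC) per short term filter tap (10 taps assumed), one for the single-tap long term filter, and two for the correlation and energy sums used to find the optimum gain and error; the weighting filter is ignored.

```python
def search_macs_per_second(num_sequences=1024, seq_len=40, sample_rate=8000, lpc_order=10):
    """Approximate MACs per second for an exhaustive CELP codebook search.

    Hypothetical cost model per sample of each candidate sequence:
    lpc_order MACs (short term filter) + 1 (long term filter)
    + 2 (correlation and energy for the gain and error). Weighting is ignored.
    """
    frames_per_second = sample_rate // seq_len          # 8000 // 40 = 200 frames/s
    macs_per_sequence = seq_len * (lpc_order + 1 + 2)   # 40 * 13 = 520
    return frames_per_second * num_sequences * macs_per_sequence


print(search_macs_per_second())  # 106496000
```

Under these assumptions the search alone needs of the order of 10^8 MACs per second, before any cost for the weighting filter or for LPC analysis.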
One proposal to reduce the computational load of a CELP coding system is disclosed in United Kingdom Patent Specification 2 235 354A (PHB 33579) "Speech Coding System and a Method of Encoding Speech".
In that system a one-dimensional master codebook containing one long stochastic sequence is used, and sequences taken from it are fed to a short term filter to produce a two-dimensional filtered codebook. The sequences from the master codebook overlap by a fixed amount, so some of the filtering required for each sequence overlaps with that required for the previous and following sequences. This can reduce the complexity of the codebook sequence filtering considerably. With the maximum overlap between successive sequences of all but one sample, a master codebook of length (K+N-1) is required, where K is the number of sequences and N is the length of those sequences. An overlap of all but two samples gives better results and requires a master codebook of length (2K+N-2).
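The length formulas and the saving from overlap can be illustrated with a short sketch. The recursion below is one well-known way of exploiting the overlap when each sequence starts one sample after its predecessor and the short term filter is approximated by its impulse response truncated to N samples; it is offered as an illustration under those assumptions, not as the method of GB 2 235 354A, and the function names are hypothetical. Each filtered sequence after the first is obtained from its predecessor with N - 1 corrections plus one full N-term sum, 2N - 1 MACs instead of the N(N+1)/2 needed to filter it directly.

```python
def master_length(K, N, shift):
    """Length of a master codebook holding K sequences of N samples, each starting
    `shift` samples after the previous one."""
    return shift * (K - 1) + N


def overlapping_sequences(master, K, N, shift):
    """Extract the K overlapping sequences from the master codebook."""
    return [master[k * shift:k * shift + N] for k in range(K)]


def filter_direct(x, h):
    """Zero-state filtering of x by a truncated impulse response h (len(h) >= len(x))."""
    return [sum(h[i] * x[n - i] for i in range(n + 1)) for n in range(len(x))]


def filter_overlapping(master, K, N, h):
    """Filter all K shift-by-one sequences, reusing each result for the next one."""
    y = filter_direct(master[:N], h)
    out = [y]
    for k in range(K - 1):
        # y_{k+1}(n) = y_k(n + 1) - h(n + 1) * m(k) for n = 0 .. N-2
        nxt = [y[n + 1] - h[n + 1] * master[k] for n in range(N - 1)]
        # the final sample involves the newly exposed master sample and needs a full sum
        nxt.append(sum(h[i] * master[k + N - i] for i in range(N)))
        y = nxt
        out.append(y)
    return out


print(master_length(1024, 40, 1), master_length(1024, 40, 2))  # 1063 2086
```

With K = 1024 and N = 40 the two overlaps give master codebooks of 1063 and 2086 samples, matching (K+N-1) and (2K+N-2); the same recursion idea extends to a shift of two by removing two terms per step, which this sketch does not implement.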
Although using a one-dimensional codebook reduces the complexity of the filtering required by a CELP analogue signal coding system, the number of comparisons to be made between the filtered incoming speech and the filtered codebook sequences is still considerable.
It is an aim of the present invention to reduce the computation required for each incoming block of analogue signals in an analysis by synthesis analogue signal coding system.