In many audio applications it is desired to transfer or store an audio signal, for example a speech signal, in digital form. Rather than attempting to sample and subsequently reproduce a speech signal directly, a vocoder is often employed which constructs a synthetic speech signal containing the key features of the audio signal, the synthetic signal then being decoded for reproduction.
A coding algorithm that has been proposed for use with a vocoder uses a speech model called the Multi-Band Excitation (MBE) model, first proposed in the paper "Multi-Band Excitation Vocoder" by Griffin and Lim, IEEE Transactions on Acoustics, Speech and Signal Processing, Volume 36, No. 8, August 1988, page 1223. The MBE model divides the speech signal into a plurality of frames which are analyzed independently to produce a set of parameters modelling the speech signal at that frame, the parameters being subsequently encoded for transmission or storage. The speech signal in each frame is divided into a number of frequency bands, and for each frequency band a decision is made whether that portion of the spectrum is voiced or unvoiced. The band is then represented by periodic energy for a voiced decision or by noise-like energy for an unvoiced decision. The speech signal in each frame is characterised, using the model, by information comprising the fundamental frequency of the speech signal in the frame, the voiced/unvoiced decisions for the frequency bands and the corresponding amplitudes for the harmonics in each band. This information is then transformed and vector quantized to provide the encoder output. The output is decoded by reversing this procedure. A proposal for implementation of a vocoder using the multi-band excitation model may be found in the Inmarsat-M Voice Codec, Version 3, August 1991, SDM/M Mod. 1/Appendix 1 (Digital Voice Systems Inc.).
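By way of illustration only, the per-frame information described above may be pictured as a small record. The following is a minimal sketch and not the parameter format of any particular codec; the names MbeFrame, fundamental_hz, band_voiced, harmonic_amplitudes and excitation_kinds are hypothetical and do not appear in the cited references.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MbeFrame:
    """Hypothetical container for the parameters of one analysis frame."""

    fundamental_hz: float             # fundamental frequency of the frame
    band_voiced: List[bool]           # one voiced/unvoiced decision per band
    harmonic_amplitudes: List[float]  # one amplitude per harmonic

    def excitation_kinds(self) -> List[str]:
        """Map each band decision to the kind of energy that represents it."""
        return ["periodic" if voiced else "noise" for voiced in self.band_voiced]


# Example frame: two bands (first voiced, second unvoiced), three harmonics.
frame = MbeFrame(200.0, [True, False], [1.0, 0.5, 0.2])
kinds = frame.excitation_kinds()
```

The length of harmonic_amplitudes is deliberately not fixed in this sketch, since the number of harmonics differs from frame to frame.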
It is a problem for implementation of such a vocoder that the fundamental pitch period and the number of harmonics change from frame to frame, since these features are functions of the talker. For example, male speech generally has a lower fundamental frequency with more harmonic components, whereas female speech has a higher fundamental frequency with fewer harmonics. This causes a variable-dimension vector quantization problem. One proposed solution to the problem is to truncate the speech signal by selecting only a predetermined number of harmonics. However, such an approach causes unacceptable speech degradation, particularly when recognition of the speaker of the reconstructed speech signal is desired.
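The dependence of the vector dimension on the talker may be illustrated with a simple calculation. The sketch below assumes, purely for illustration, a 4000 Hz analysis bandwidth and takes the harmonic count as the number of whole multiples of the fundamental frequency that fit within that bandwidth; the function name harmonic_count and the bandwidth figure are assumptions and are not taken from the cited references.

```python
import math


def harmonic_count(f0_hz: float, bandwidth_hz: float = 4000.0) -> int:
    """Number of harmonics of f0_hz lying within bandwidth_hz (assumed model)."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return math.floor(bandwidth_hz / f0_hz)


# A lower-pitched talker yields a longer amplitude vector than a higher-pitched one.
low_pitch_dim = harmonic_count(100.0)
high_pitch_dim = harmonic_count(250.0)
```

Under these assumptions a fundamental of 100 Hz gives 40 harmonic amplitudes per frame while a fundamental of 250 Hz gives 16, so a quantizer designed for one dimension cannot be applied directly to the other.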
A proposal to alleviate this problem is the use of Non-Square Transform (NST) vector quantization, as proposed by Lupini and Cuperman in IEEE Signal Processing Letters, Volume 3, No. 1, January 1996, and by Cuperman, Lupini and Bhattacharya in the paper "Spectral Excitation Coding of Speech at 2.4 kb/s", Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, Volume 1. With this approach, the NST transforms the varying number of spectral harmonic amplitudes to a fixed number of transform coefficients, which are then vector-quantized.
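The general idea of mapping a variable number of amplitudes to a fixed number of coefficients may be sketched as a multiplication by a non-square matrix whose column count follows the input length. The sketch below is illustrative only: it builds the matrix from DCT-II basis rows as a stand-in and does not reproduce the transform matrices of the cited papers; the names nonsquare_matrix and to_fixed and the default dimension of 30 are assumptions.

```python
import math
from typing import List


def nonsquare_matrix(fixed_dim: int, var_dim: int) -> List[List[float]]:
    """Build a fixed_dim x var_dim matrix from DCT-II basis rows (stand-in only)."""
    rows = []
    for k in range(fixed_dim):
        scale = math.sqrt(1.0 / var_dim) if k == 0 else math.sqrt(2.0 / var_dim)
        rows.append(
            [scale * math.cos(math.pi * (n + 0.5) * k / var_dim) for n in range(var_dim)]
        )
    return rows


def to_fixed(amplitudes: List[float], fixed_dim: int = 30) -> List[float]:
    """Map any number of harmonic amplitudes to fixed_dim coefficients."""
    matrix = nonsquare_matrix(fixed_dim, len(amplitudes))
    return [sum(c * a for c, a in zip(row, amplitudes)) for row in matrix]


# Frames with 16 and 40 harmonics both yield vectors of 30 coefficients.
short_frame = to_fixed([1.0] * 16)
long_frame = to_fixed([1.0] * 40)
```

Each output coefficient in this sketch costs one multiply-accumulate per input amplitude, and a different matrix is needed for every possible input length.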
It is a disadvantage of this proposal, however, that very high computational complexity is involved in the Non-Square Transform operation. This is because the transformation of the varying-dimension vectors into fixed vectors of either 30 or 40 dimensions, as in this proposal, is highly computationally intensive and requires a large memory to store all the elements of the transform matrices. The recommended fixed-dimension vector requires a one-stage quantization, which is also computationally expensive. It is a further disadvantage of NST vector quantization that the technique introduces distortion in the speech signal which degrades the perceptual quality of reproduced speech when the size of the codebook of the vector quantizers is small.
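The memory requirement may be estimated by counting matrix elements. The sketch below assumes that one transform matrix is stored for each possible number of harmonics and that the number of harmonics ranges from 9 to 56; that range and the function name transform_storage are assumptions made for illustration and are not figures from the cited proposal.

```python
def transform_storage(fixed_dim: int, min_harmonics: int, max_harmonics: int) -> int:
    """Total elements when one fixed_dim x n matrix is stored per harmonic count n."""
    return sum(fixed_dim * n for n in range(min_harmonics, max_harmonics + 1))


# Element counts for the two fixed dimensions mentioned in the text.
elements_30 = transform_storage(30, 9, 56)
elements_40 = transform_storage(40, 9, 56)
```

Under these assumptions the matrices hold 46800 elements for a fixed dimension of 30 and 62400 elements for a fixed dimension of 40, before any codebook storage is counted.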
In some applications it is desired to encode the speech at a low bit rate, for example 2.4 kbps or less. A speech signal encoded in this way requires less memory to store the signal digitally, thus keeping down the cost of a device using that bit rate. However, the use of NST vector quantization, with the consequent requirements of high computational power and memory together with the problem of distortion, does not provide a feasible solution to the problem of low cost encoding and storage of speech at such low bit rates.
It is the object of the invention to provide a method of and an apparatus for speech coding which alleviate at least one of the disadvantages of the prior art.