1. Field of the Invention
This invention is concerned with speech coding, and more particularly with systems in which a speech signal is generated by feeding the output of an excitation source through a synthesis filter. The coding problem then becomes one of deriving, from the input speech, the necessary excitation and filter parameters. LPC (linear predictive coding) parameters for the filter can be derived using well-established techniques; the present invention is concerned with the excitation source.
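A minimal sketch of the synthesis side described above: an excitation sequence is passed through an all-pole LPC synthesis filter 1/A(z). The sign convention A(z) = 1 - sum_j alpha_j z^-j, the function name `lpc_synthesis`, and the zero initial filter memory are assumptions made for illustration only.

```python
from typing import List, Sequence


def lpc_synthesis(excitation: Sequence[float], alpha: Sequence[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z), with A(z) = 1 - sum_j alpha[j-1] z^-j.

    Assumes zero initial filter memory; a practical coder would carry the
    filter state over from the previous frame.
    """
    out: List[float] = []
    for i, u in enumerate(excitation):
        y = u
        # Add the prediction formed from previous output samples.
        for j, a in enumerate(alpha, start=1):
            if i - j >= 0:
                y += a * out[i - j]
        out.append(y)
    return out


# Hypothetical example: one unit pulse through a one-pole filter gives a
# decaying impulse response.
print(lpc_synthesis([1.0, 0.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25, 0.125]
```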
2. Description of Related Art
Systems in which a voiced/unvoiced decision on the input speech is used to switch between a noise source and a repetitive pulse source tend to give the speech output an unnatural quality. It has therefore been proposed to employ a single "multipulse" excitation source, in which a sequence of pulses is generated without any prior assumption as to the nature of the sequence. With this method, only a few pulses (say 6 in a 10 ms frame) are found to be sufficient for reasonable results. See B. S. Atal and J. R. Remde, "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates", Proc. IEEE ICASSP, Paris, pp. 614, 1982.
Coding methods of this type offer considerable potential for low-bit-rate transmission, e.g. at 9.6 to 4.8 kbit/s.
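For a sense of the bit budget, and assuming the 10 ms frame quoted earlier, the rates above correspond to the following number of bits per frame; these bits must carry both the LPC filter coefficients and the pulse positions and amplitudes:

$$9600\ \text{bit/s} \times 0.01\ \text{s} = 96\ \text{bits per frame}, \qquad 4800\ \text{bit/s} \times 0.01\ \text{s} = 48\ \text{bits per frame}.$$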
The coder proposed by Atal and Remde operates in a "trial and error feedback loop" mode in an attempt to define an optimum excitation sequence which, when used as the input to an LPC synthesis filter, minimizes a weighted error function over a frame of speech. However, the as yet unsolved problem of selecting an optimum excitation sequence is the main reason for the enormous complexity of the coder, which limits its real-time operation.
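To give a sense of why an exhaustive search for the optimum sequence is impractical, the sketch below counts the ways of placing the quoted 6 pulses in a 10 ms frame. The 8 kHz sampling rate is an assumption (it is not stated in the text); each position set would additionally need its own amplitude optimization and a synthesize-and-compare step.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed telephone-band sampling rate (not stated in the text)
FRAME_MS = 10          # frame length quoted for the multipulse coder
PULSES = 6             # pulses per frame quoted for the multipulse coder

n = SAMPLE_RATE_HZ * FRAME_MS // 1000   # samples per frame
position_sets = math.comb(n, PULSES)    # distinct ways to choose 6 of the 80 instants

print(n, position_sets)  # 80 300500200
```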
The excitation signal in multipulse LPC is approximated by a sequence of pulses located at non-uniformly spaced time instants. The task of the analysis-by-synthesis process is to determine the optimum positions and amplitudes of the excitation pulses.
In operation, the input speech signal is divided into frames of samples, and a conventional analysis is performed to determine the filter coefficients for each frame. A suitable multipulse excitation sequence must then be derived for each frame. The algorithm proposed by Atal and Remde forms a multipulse sequence which, when used to excite the LPC synthesis filter, minimizes (within the constraints imposed by the algorithm) a mean-squared weighted error derived from the difference between the synthesized and the original speech. This is illustrated schematically in FIG. 1. The positions and amplitudes of the excitation pulses are encoded and transmitted together with the digitized values of the LPC filter coefficients. At the receiver, given the decoded multipulse excitation and prediction coefficients, the speech signal is recovered at the output of the LPC synthesis filter.
In FIG. 1 it is assumed that a frame consists of $n$ speech samples, the input speech samples being $s_0, \ldots, s_{n-1}$ and the synthesized samples $s'_0, \ldots, s'_{n-1}$, which can be regarded as vectors $\mathbf{s}$ and $\mathbf{s}'$. The excitation consists of pulses of amplitude $a_m$ which are, it is assumed, permitted to occur at any of the $n$ possible time instants within the frame, but there are only a limited number of them (say $k$). Thus the excitation can be expressed as an $n$-dimensional vector $\mathbf{a}$ with components $a_0, \ldots, a_{n-1}$, only $k$ of which are non-zero. The objective is to find the $2k$ unknowns ($k$ amplitudes and $k$ pulse positions) which minimize the error:

$$e^2 = (\mathbf{s} - \mathbf{s}')^2 = \sum_{i=0}^{n-1} (s_i - s'_i)^2 \qquad (1)$$
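The quantities in equation (1) can be sketched as follows: build the sparse excitation vector from k (position, amplitude) pairs, obtain the synthesized frame by convolving it with the synthesis filter's impulse response, and compute the unweighted squared error. The function names and the impulse response values are hypothetical, and the sketch assumes the filter starts each frame from zero state (ignoring any ringing carried over from the previous frame).

```python
from typing import List, Sequence, Tuple


def excitation_vector(n: int, pulses: Sequence[Tuple[int, float]]) -> List[float]:
    """Build the n-dimensional excitation vector a from k (position, amplitude) pairs."""
    a = [0.0] * n
    for pos, amp in pulses:
        a[pos] += amp
    return a


def synthesize(a: Sequence[float], h: Sequence[float]) -> List[float]:
    """Synthesized frame s': a convolved with the filter impulse response h.

    Assumes zero filter memory at the start of the frame; output is truncated to len(a).
    """
    n = len(a)
    out = [0.0] * n
    for m, am in enumerate(a):
        if am == 0.0:
            continue  # only the k non-zero pulses contribute
        for i in range(m, min(n, m + len(h))):
            out[i] += am * h[i - m]
    return out


def squared_error(s: Sequence[float], s_hat: Sequence[float]) -> float:
    """Unweighted error e^2 of equation (1)."""
    return sum((x - y) ** 2 for x, y in zip(s, s_hat))


h = [1.0, 0.5, 0.25, 0.125]           # hypothetical impulse response
a = excitation_vector(4, [(1, 2.0)])  # one pulse of amplitude 2 at position 1
s_hat = synthesize(a, h)              # [0.0, 2.0, 1.0, 0.5]
print(squared_error([0.0, 2.0, 1.0, 1.0], s_hat))  # 0.25
```

Working through the impulse response rather than re-running the recursive filter makes the contribution of each pulse explicit, which is what the pulse-by-pulse search procedures rely on.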
In equation (1) the perceptual weighting is ignored; it serves simply to filter the error signal so that, in the final result, the residual error is concentrated in those parts of the speech band where it is least obtrusive.
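The text does not specify the weighting filter. A common choice in multipulse LPC coders, assumed here, is W(z) = A(z) / A(z/gamma), with gamma below 1 (the default of 0.8 is also an assumption). A minimal sketch applying it to an error signal, with the hypothetical name `perceptual_weighting` and zero initial state:

```python
from typing import List, Sequence


def perceptual_weighting(err: Sequence[float], alpha: Sequence[float],
                         gamma: float = 0.8) -> List[float]:
    """Filter an error signal with the assumed weighting W(z) = A(z) / A(z/gamma).

    A(z) = 1 - sum_j alpha[j-1] z^-j; zero initial filter state is assumed.
    """
    p = len(alpha)
    num = [1.0] + [-a for a in alpha]                                      # A(z)
    den = [1.0] + [-a * gamma ** j for j, a in enumerate(alpha, start=1)]  # A(z/gamma)
    out: List[float] = []
    for i in range(len(err)):
        # FIR part (numerator) minus the recursive part (denominator, den[0] == 1).
        y = sum(num[j] * err[i - j] for j in range(p + 1) if i - j >= 0)
        y -= sum(den[j] * out[i - j] for j in range(1, p + 1) if i - j >= 0)
        out.append(y)
    return out


print(perceptual_weighting([1.0, 0.0, 0.0], [0.5], gamma=1.0))  # [1.0, 0.0, 0.0]
```

With gamma = 1 the numerator and denominator cancel and no weighting is applied. With gamma < 1 the zeros of A(z) de-emphasize the error near the formant peaks, where the speech energy tends to mask it, while the bandwidth-expanded denominator keeps the shaping gentle.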
The amount of computation required to do this is enormous, and the procedure proposed by Atal and Remde was as follows:
(1) Find the amplitude and position of one pulse, alone, which gives a minimum error.
(2) Find the amplitude and position of a second pulse which, in combination with the first pulse, gives a minimum error; the positions and amplitudes of the pulse(s) previously found are fixed during this stage.
(3) Repeat for further pulses.

This procedure could be further refined by finally reoptimizing all the pulse amplitudes; alternatively, the amplitudes may be reoptimized prior to the derivation of each new pulse.

According to the present invention there is provided a method of speech coding comprising:
receiving speech samples;
processing the speech samples to derive parameters representing a synthesis filter response;
deriving, from the parameters and the speech samples, pulse position and amplitude information defining an excitation consisting, within each of successive time frames corresponding to a plurality of speech samples, of a pulse sequence containing a smaller plurality of pulses, the pulse amplitudes and positions being controlled so as to reduce an error signal obtained by comparing the speech samples with the response of the synthesis filter to the excitation;
wherein the pulse position and amplitude information is derived by:
(1) deriving an initial estimate of the positions and amplitudes of the pulses, and
(2) carrying out an iterative adjustment process in which individual pulses are selected and their positions and amplitudes reassessed.
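The Atal and Remde sequential search, followed by a simple pulse-by-pulse readjustment pass of the general kind the iterative adjustment step describes, can be sketched as below. This is an illustrative sketch under stated assumptions: it minimizes the unweighted error of equation (1), assumes zero filter memory, uses the closed-form optimum amplitude for a single pulse at a fixed position, and the readjustment rule (re-place each pulse in turn with all others held fixed) and all function names are hypothetical, not necessarily the adjustment process of the invention.

```python
from typing import List, Sequence, Tuple

Pulse = Tuple[int, float]  # (position, amplitude)


def shifted(h: Sequence[float], pos: int, n: int) -> List[float]:
    """Filter response to a unit pulse at `pos`, truncated to an n-sample frame."""
    v = [0.0] * n
    for i in range(pos, min(n, pos + len(h))):
        v[i] = h[i - pos]
    return v


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(x, y))


def residual(s: Sequence[float], pulses: Sequence[Pulse], h: Sequence[float]) -> List[float]:
    """Speech frame minus the synthesized contribution of the given pulses."""
    r = list(s)
    for pos, amp in pulses:
        for i, v in enumerate(shifted(h, pos, len(s))):
            r[i] -= amp * v
    return r


def best_single_pulse(target: Sequence[float], h: Sequence[float]) -> Pulse:
    """One pulse minimizing |target - g * h_pos|^2 over all positions.

    For a fixed position the optimum amplitude is g = <t, h_pos> / <h_pos, h_pos>,
    which lowers the error by <t, h_pos>^2 / <h_pos, h_pos>.
    """
    n = len(target)
    best: Pulse = (0, 0.0)
    best_drop = -1.0
    for pos in range(n):
        hp = shifted(h, pos, n)
        energy = dot(hp, hp)
        if energy == 0.0:
            continue
        c = dot(target, hp)
        if c * c / energy > best_drop:
            best_drop, best = c * c / energy, (pos, c / energy)
    return best


def sequential_search(s: Sequence[float], h: Sequence[float], k: int) -> List[Pulse]:
    """Atal-Remde style: add one pulse at a time, earlier pulses held fixed."""
    pulses: List[Pulse] = []
    for _ in range(k):
        pulses.append(best_single_pulse(residual(s, pulses, h), h))
    return pulses


def iterative_adjust(s: Sequence[float], h: Sequence[float],
                     pulses: Sequence[Pulse], passes: int = 2) -> List[Pulse]:
    """Hypothetical readjustment: re-place each pulse with all others held fixed.

    The current pulse is always a candidate, so in exact arithmetic the error
    cannot increase.
    """
    adjusted = list(pulses)
    for _ in range(passes):
        for i in range(len(adjusted)):
            others = adjusted[:i] + adjusted[i + 1:]
            adjusted[i] = best_single_pulse(residual(s, others, h), h)
    return adjusted


h = [1.0, 0.5]                                   # hypothetical impulse response
s = [0.0, 0.0, 1.0, 0.5, 0.0, -2.0, -1.0, 0.0]   # frame made by two pulses
print(sequential_search(s, h, 2))                # [(5, -2.0), (2, 1.0)]
```

Reoptimizing all amplitudes jointly, the refinement mentioned above, would replace the per-pulse amplitude with a small least-squares solve over the selected positions; it is omitted here to keep the sketch short.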