The invention relates to a digital speech coder comprising a transmitter and a receiver for transmitting segmented digital speech signals, the transmitter comprising:
a first LPC-analyser for generating, in response to the digital speech signal of each segment, first prediction parameters which characterize the envelope of the short-term spectrum of this digital speech signal,
a first adaptive inverse filter for generating, in response to the digital speech signal of each segment and the first prediction parameters, a speech band residual signal which corresponds to the prediction error of this segment,
a decimation filter for generating a baseband residual signal in response to the speech band residual signal, and
an encoding-and-multiplexing circuit for encoding the first prediction parameters and the waveform of the baseband residual signal and for transmitting the resultant code signals in time-division-multiplex,
and the receiver comprising:
a demultiplexing-and-decoding circuit for separating the transmitted code signals and for decoding the separated code signals into the first prediction parameters and the waveform of the baseband residual signal,
an interpolating excitation generator for generating, in response to the baseband residual signal, an excitation signal corresponding to the speech band residual signal, and
a first adaptive synthesis filter for forming a replica of the digital speech signal in response to the excitation signal and the first prediction parameters.
a second LPC analyser for generating, in response to the speech band residual signal of the first adaptive inverse filter, second prediction parameters which characterize the fine structure of the short-term spectrum of this speech band residual signal,
a second adaptive inverse filter for generating, in response to the speech band residual signal and the second prediction parameters, a modified speech band residual signal which is applied to the decimation filter;
the encoding-and-multiplexing circuit in the transmitter and the demultiplexing-and-decoding circuit in the receiver are arranged for processing both the first and the second prediction parameters; and
a second adaptive synthesis filter for forming, in response to the excitation signal of the interpolating excitation generator and the second prediction parameters, a modified excitation signal which is applied to the first adaptive synthesis filter.
Such a speech coder based on linear predictive coding (LPC) as a method of spectral analysis is known from the article by V. R. Viswanathan et al., "Design of a Robust Baseband LPC Coder for Speech Transmission over 9.6 Kbit/s Noisy Channels", IEEE Trans. Commun., Vol. COM-30, No. 4, April 1982, pages 663-673.
In this type of speech coder the digital speech signal is filtered with the aid of an inverse filter whose transfer function A(z) in z-transform notation is defined by $A(z) = 1 - P(z) = 1 - \sum_{i=1}^{p} a(i)\,z^{-i}$, where P(z) is the transfer function of a predictor based on the short-term spectral envelope of the speech signal, the filter coefficients a(i) with 1 ≤ i ≤ p are the LPC-parameters computed for each speech signal segment of, for example, 20 ms, and p is the LPC-order, which usually has a value between 8 and 16. The speech band residual signal at the output of this inverse filter A(z) generally has a flat spectral envelope, which becomes flatter as the LPC-order p increases. This speech band residual signal is used as an excitation signal for the (recursive) synthesis filter having the same filter coefficients a(i) and consequently a transfer function 1/A(z). As this synthesis filter 1/A(z) has a masking effect on the quantization noise of the speech band residual signal, it has been found that encoding the waveform of this residual signal with 3 bits per sample is adequate to obtain the same speech quality as in the case of a waveform encoding of the speech signal with the aid of a PCM coder standardized for telephony, in which the sampling rate is 8 kHz and an encoding with 8 bits per sample is used. The overall bit rate required for encoding the speech band residual signal and the LPC-parameters is, however, not significantly lower than in the case of a standardized PCM coder, as the speech band residual signal still has the same bandwidth as the speech band signal itself.
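The relation between the inverse filter A(z) and the recursive synthesis filter 1/A(z) can be illustrated by the following minimal sketch. It assumes the sign convention A(z) = 1 - Σ a(i) z^-i and zero initial filter state; the function names `inverse_filter` and `synthesis_filter` are hypothetical and serve only for illustration, and the computation of the LPC-parameters themselves is not shown.

```python
def inverse_filter(speech, a):
    """FIR filter A(z): e(n) = s(n) - sum_{i=1..p} a(i) * s(n-i).

    Assumed sign convention: A(z) = 1 - sum a(i) z^-i, zero initial state.
    """
    p = len(a)
    residual = []
    for n in range(len(speech)):
        prediction = sum(a[i - 1] * speech[n - i]
                         for i in range(1, p + 1) if n - i >= 0)
        residual.append(speech[n] - prediction)
    return residual


def synthesis_filter(excitation, a):
    """Recursive filter 1/A(z): s(n) = e(n) + sum_{i=1..p} a(i) * s(n-i)."""
    p = len(a)
    out = []
    for n in range(len(excitation)):
        prediction = sum(a[i - 1] * out[n - i]
                         for i in range(1, p + 1) if n - i >= 0)
        out.append(excitation[n] + prediction)
    return out
```

When the unquantized residual of a segment is applied to the synthesis filter with the same coefficients a(i), the original segment is recovered, which is why only the residual and the LPC-parameters need to be transmitted.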
The speech coder described in the above-mentioned article utilizes the generally flat shape of the spectral envelope of the speech band residual signal to reduce the required overall bit rate. To that end the speech band residual signal is applied to a digital low-pass filter, in which a reduction of the sampling rate (decimation or down-sampling) by a factor N of 2 to 8 is also effected. In order to re-obtain a satisfactory excitation signal for the synthesis filter 1/A(z), the missing high-frequency portion of the spectrum must be recovered from the available low-frequency portion, the baseband, and in addition the sampling rate must be increased (interpolation or up-sampling) to the original value. An excitation signal having the bandwidth of the actual speech signal is obtained in the prior art speech coder with the aid of a spectral folding method. With spectral folding the interpolation is merely the insertion of N-1 zero-value samples after every sample of the baseband residual signal, where N is the decimation factor. Consequently, the spectrum of the excitation signal consists of a low-frequency portion constituted by the preserved baseband and a high-frequency portion constituted by folding products of the baseband around the decimated sampling frequency and integral multiples thereof. This method has the advantage that a baseband residual signal having a flat spectral envelope results without fail in an excitation signal which also has a flat spectral envelope over the complete speech band. This property finds direct expression in the good speech quality thus obtained: the "hoarseness" which is typical of the well-known non-linear distortion methods for obtaining an excitation signal having the bandwidth of the actual speech signal is now absent.
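The zero-insertion step of spectral folding can be illustrated by the following minimal sketch. The function names `zero_insert` and `dft` and the sample values are hypothetical and chosen only for illustration; the low-pass decimation filter in the transmitter is not shown. The sketch checks numerically that the discrete Fourier transform of the zero-inserted signal is the transform of the baseband signal repeated N times, so that the band above the baseband is filled with images of the baseband.

```python
import cmath


def zero_insert(baseband, n_factor):
    """Insert n_factor - 1 zero-value samples after every baseband sample."""
    out = []
    for sample in baseband:
        out.append(sample)
        out.extend([0.0] * (n_factor - 1))
    return out


def dft(x):
    """Plain O(n^2) discrete Fourier transform, sufficient for a short example."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size)
                for n in range(size))
            for k in range(size)]


# Hypothetical baseband residual samples and decimation factor.
baseband = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5, 0.0, -0.25]
N = 3

excitation = zero_insert(baseband, N)
X = dft(baseband)    # spectrum at the decimated sampling rate
Y = dft(excitation)  # spectrum at the original sampling rate

# Y[k] equals X[k mod M], with M the number of baseband samples.
M = len(baseband)
max_deviation = max(abs(Y[k] - X[k % M]) for k in range(len(Y)))
```

Because the baseband samples are real-valued, each repeated image is conjugate-symmetric, which corresponds to the folding of the baseband around the decimated sampling frequency and its integral multiples.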
So spectral folding is a very simple method which, however, has an inherent problem: it produces audible "metallic" background sounds which are known in the literature as "tonal noises" and which become stronger as the decimation factor N increases and as the pitch of the speech becomes higher.
In view of this problem, a variant of the spectral folding method is applied in the excitation generator of the prior art speech coder, according to which the samples of the excitation signal are additionally subjected to a time-position perturbation after interpolation. More specifically, the time position of a nonzero-value sample (that is, an original sample of the baseband residual signal prior to interpolation) is randomly perturbed by simply interchanging this nonzero sample with an adjacent zero-value sample if the magnitude of this nonzero sample remains below a predetermined threshold, the probability of perturbation increasing as the magnitude of this nonzero sample decreases. On the one hand the nonperturbed excitation signal is applied to a lowpass filter for selecting the baseband, and on the other hand the perturbed excitation signal is applied to a highpass filter for selecting the high-frequency portion above the baseband, whereafter the two selected signals are added together to obtain the ultimate excitation signal. This variant of the spectral folding method essentially adds a signal-correlated noise to the spectrally folded baseband residual signal. From the perceptual point of view it was found that this additive noise does indeed have a masking effect on the "tonal noises", but that it also introduces some "hoarseness". So using this variant in the prior art speech coder entails a significant additional complication for the practical implementation, but does not result in a satisfactory solution of the "tonal noise" problem for spectral folding as a method of obtaining an excitation signal having the same bandwidth as the speech signal.
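The time-position perturbation described above can be illustrated by the following minimal sketch. Several details are assumptions made only for illustration and are not taken from the cited article: the function name `perturb`, the linear probability law 1 - magnitude/threshold, and the choice of always interchanging with the following (inserted) zero-value sample. The subsequent lowpass and highpass filtering and the addition of the two selected signals are not shown.

```python
import random


def perturb(excitation, n_factor, threshold, rng):
    """Randomly interchange small samples with the adjacent inserted zero.

    excitation: zero-inserted signal, length a multiple of n_factor (>= 2),
                with original baseband samples at positions 0, n_factor, ...
    threshold:  samples with magnitude >= threshold are never moved.
    rng:        a random.Random instance (for reproducibility).
    """
    out = list(excitation)
    for pos in range(0, len(out) - 1, n_factor):
        magnitude = abs(out[pos])
        if magnitude >= threshold:
            continue
        # Assumed law: the smaller the magnitude, the likelier the swap.
        p_swap = 1.0 - magnitude / threshold
        if rng.random() < p_swap:
            # Position pos + 1 always holds an inserted zero-value sample.
            out[pos], out[pos + 1] = out[pos + 1], out[pos]
    return out


# Hypothetical zero-inserted excitation for N = 2.
example = [0.9, 0.0, 0.1, 0.0, -0.05, 0.0, -0.8, 0.0]
perturbed = perturb(example, n_factor=2, threshold=0.5, rng=random.Random(0))
```

Samples are only moved, never altered, so the perturbed signal contains the same sample values as the nonperturbed one, and samples at or above the threshold keep their time position.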