1. Field of the Invention
The present invention generally relates to digital voice transmission systems and, more particularly, to a new technique for increasing the signal-to-noise ratio (SNR) in a linear predictive multi-pulse excited speech coder.
2. Description of the Prior Art
Code excited linear prediction (CELP) and multi-pulse linear predictive coding (MPLPC) are two of the most promising techniques for low rate speech coding. While CELP holds the most promise for high quality, its computational requirements can be too great for some systems. MPLPC can be implemented with much less complexity, but it is generally considered to provide lower quality than CELP.
Multi-pulse coding is believed to have been first described by B. S. Atal and J. R. Remde in "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates", Proc. of 1982 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, May 1982, pp. 614-617. The technique was proposed to improve on the rather synthetic quality of the speech produced by the standard U.S. Department of Defense LPC-10 vocoder. The basic method is to employ the linear predictive coding (LPC) speech synthesis filter of the standard vocoder, but to use multiple pulses per pitch period for exciting the filter, instead of the single pulse used in the Department of Defense standard system. The basic multi-pulse technique is illustrated in FIG. 1.
Absent from the Atal et al. paper is the all-important solution technique for the optimal locations and amplitudes of the pulses used to excite the synthesis filter. Since the publication of the Atal et al. paper, a large effort has been expended in devising a low-complexity solution for the amplitudes and positions. A truly optimal technique requires simultaneous solution for the pulse amplitudes and positions; however, this would result in a non-linear set of equations whose solution would be quite difficult. Most of the published techniques find the pulse positions sequentially, and then as each new position is found, they solve simultaneously for a new set of amplitudes for the new pulse and all previous pulses. The solution for the amplitudes is a simple set of linear equations that is easily solved simultaneously. This method is nearly optimal and gives excellent results. The technique is described in more detail by T. Araseki et al. in "Multi-pulse Excited Speech Coder Based on Maximum Crosscorrelation Search Algorithm", Proc. of IEEE GLOBECOM 83, Nov. 1983, pp. 794-798.
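The sequential search described above can be illustrated with a minimal sketch. It is not the algorithm of the cited papers; it only assumes that the (weighted) synthesis filter is represented by a truncated impulse response `h`, that each new pulse is placed where the normalized cross-correlation with the current residual peaks, and that all amplitudes are then re-solved jointly from the normal equations. The names `multipulse_search`, `shifted_response`, and `solve_linear` are hypothetical.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def shifted_response(h, pos, n):
    """Filter response to a unit pulse at `pos`, truncated to the n-sample frame."""
    return [h[i - pos] if 0 <= i - pos < len(h) else 0.0 for i in range(n)]

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def multipulse_search(target, h, num_pulses):
    """Find pulse positions one at a time; re-solve all amplitudes after each."""
    n = len(target)
    positions, amplitudes = [], []
    residual = list(target)
    for _ in range(num_pulses):
        best_pos, best_score = None, -1.0
        for pos in range(n):
            if pos in positions:
                continue
            col = shifted_response(h, pos, n)
            energy = dot(col, col)
            if energy == 0.0:
                continue
            corr = dot(residual, col)
            score = corr * corr / energy  # error reduction from one pulse here
            if score > best_score:
                best_pos, best_score = pos, score
        if best_pos is None:
            break
        positions.append(best_pos)
        # Linear normal equations for the amplitudes of all pulses found so far.
        cols = [shifted_response(h, p, n) for p in positions]
        gram = [[dot(ci, cj) for cj in cols] for ci in cols]
        rhs = [dot(ci, target) for ci in cols]
        amplitudes = solve_linear(gram, rhs)
        residual = [target[i] - sum(a * c[i] for a, c in zip(amplitudes, cols))
                    for i in range(n)]
    return positions, amplitudes

# Illustrative data (assumed): two pulses through a decaying impulse response.
h = [1.0, 0.5, 0.25, 0.125]
target = [0.0] * 16
for pos, amp in ((3, 2.0), (10, -1.0)):
    for k, hv in enumerate(h):
        target[pos + k] += amp * hv
positions, amplitudes = multipulse_search(target, h, 2)
```

Only the position search is non-linear; once the positions are fixed, the amplitude problem reduces to a small symmetric linear system, which is why the joint re-solve is inexpensive.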
To achieve low transmission rates, a multi-pulse coder must be used with longer frame lengths than those optimal for good voice quality. In addition, a pitch predictor is usually added, since it provides a large increase in quality for a small increase in rate. For proper operation, the pitch predictor gain and delay lag must be computed from the cross-correlation between the data in the pitch synthesis filter buffer (i.e., output data from the previous frame) and the present frame of input data to be coded. The term "frame" is used herein to refer to a contiguous time sequence of analog-to-digital samplings of a speech waveform. When a pitch predictor of this type is used in a coding system with frame lengths longer than the minimum expected pitch period, it is no longer possible to estimate the pitch lag and gain optimally because the data required for the estimation process is not yet available. In other words, the dilemma is that the output signal of the pitch synthesis filter is required to estimate the filter parameters, but no output signal can be generated before the parameters are known.
When a pitch predictor is integrated into a multi-pulse coder, there could be significant cross-correlation between the excitation provided by the predictor and the excitation provided by the pulses. In a conventional implementation, however, the predictor and pulse information are solved for sequentially and independently, precluding use of any knowledge of cross-correlation. Yet, if the cross-correlation is not taken into account, the estimation of the pulse amplitudes and predictor gain will be biased, resulting in decreased performance.
As stated above, a pitch predictor is frequently added to the multi-pulse coder to further improve the SNR and speech quality. The pitch predictor comprises a recursive infinite impulse response (IIR) digital filter with a single tap placed at a lag equal to the number of samples in the pitch period:
$$y(i) = \beta\, y(i-P) + e(i), \tag{1}$$
where e(i) is the pulse excitation sequence, y(i) is the pitch predictor output sequence, β is the pitch predictor tap gain, and P is the pitch lag. To solve for β and P, the lag P is first estimated by the location of the peak cross-correlation between the filtered samples in the pitch buffer and the input sequence. The gain β is then given by the normalized cross-correlation
$$\beta = \frac{\sum_{i=0}^{N-1} x'(i)\, y_p(i-P)}{\sum_{i=0}^{N-1} y_p^2(i-P)}, \tag{2}$$
where x'(i) is the weighted input sequence, y_p(i) contains the filtered pitch buffer samples (i.e., the previous output sequence from Equation (1)), and N is the frame length. By examining Equations (1) and (2), the cause of the previously mentioned dilemma becomes apparent; that is, if the pitch lag P is shorter than the frame length N, the sums in Equation (2) require filtered values y_p(i-P) generated from the pitch buffer that have not yet been synthesized (i.e., when i-P is equal to or greater than 0). A preferred method for finding β is to simply extend the pitch buffer by copying previous values at a distance of P samples:
$$\beta = \frac{\sum_{i=0}^{P-1} x'(i)\, y_p(i-P) + \sum_{i=P}^{N-1} x'(i)\, y_p(i-2P)}{\sum_{i=0}^{P-1} y_p^2(i-P) + \sum_{i=P}^{N-1} y_p^2(i-2P)}. \tag{3}$$
Equation (3) assumes that 2P is greater than N. It is a simple matter to extend the pitch buffer for shorter pitch lags or longer frame lengths.
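The lag and gain estimation with an extended pitch buffer can be sketched as follows. This is an illustration under stated assumptions, not the implementation of any particular coder: the pitch buffer is taken to be already filtered, the lag is chosen by the peak of a signed, energy-normalized cross-correlation (the normalization is an assumption), and samples that do not yet exist are replaced by stepping back further multiples of the lag. The names `extended_past` and `estimate_pitch` are hypothetical.

```python
def extended_past(pitch_buffer, lag, i):
    """Return y_p(i - lag) for frame index i >= 0.

    pitch_buffer[-1] holds y_p(-1). When i - lag >= 0 the sample has not been
    synthesized yet, so the buffer is extended periodically by stepping back
    additional multiples of `lag`. Requires len(pitch_buffer) >= lag.
    """
    k = i - lag
    while k >= 0:
        k -= lag
    return pitch_buffer[len(pitch_buffer) + k]

def estimate_pitch(x_w, pitch_buffer, min_lag, max_lag):
    """Pick the lag with the peak normalized cross-correlation, then the gain."""
    best_lag, best_gain, best_score = None, 0.0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        past = [extended_past(pitch_buffer, lag, i) for i in range(len(x_w))]
        num = sum(a * b for a, b in zip(x_w, past))
        den = sum(b * b for b in past)
        if den == 0.0:
            continue
        score = num * abs(num) / den  # signed, energy-normalized correlation
        if score > best_score:
            best_lag, best_gain, best_score = lag, num / den, score
    return best_lag, best_gain

# Illustrative data (assumed): a buffer with period 5 and an 8-sample frame
# that continues the same pattern at 0.8 times the amplitude (N > P, 2P > N).
pattern = [1.0, 0.5, -0.3, -0.8, 0.2]
pitch_buffer = pattern * 2
x_w = [0.8 * pattern[i % 5] for i in range(8)]
lag, gain = estimate_pitch(x_w, pitch_buffer, 3, 8)
```

The `while` loop in `extended_past` covers both the case in which 2P is greater than N (a single extra step back) and the case of shorter lags or longer frames (several steps back).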
The value for β given in Equation (3) is only an approximation if the standard pitch synthesis filter of Equation (1) is used. The estimated value for β will be correct only if the sequence being synthesized is perfectly periodic; i.e., β=1.0. While this method has been used with reasonable success in systems where the frame length is relatively short (i.e., when P is usually greater than N, but only occasionally less than N), it will perform very poorly when N is increased such that the value taken on by P is frequently less than N. Another problem with using Equation (3) to estimate values for Equation (1) is that the two equations are incompatible, so the system will not perform properly when used with a simultaneous solution.
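The mismatch between the copied-buffer estimate and the recursive filter can be checked numerically with a small sketch. It assumes zero pulse excitation within the frame and an unweighted signal, purely to isolate the effect; `pitch_synthesis`, `gain_from_extended_buffer`, and `estimate_for` are hypothetical names.

```python
def pitch_synthesis(excitation, beta, lag, history):
    """Single-tap recursive pitch filter: y(i) = beta * y(i - lag) + e(i).

    history[-1] holds y(-1); requires len(history) >= lag.
    """
    y = list(history)
    start = len(y)
    for e in excitation:
        y.append(beta * y[len(y) - lag] + e)
    return y[start:]

def gain_from_extended_buffer(x, history, lag):
    """Normalized cross-correlation against the periodically copied buffer."""
    past = []
    for i in range(len(x)):
        k = i - lag
        while k >= 0:  # sample not synthesized yet: copy from one period back
            k -= lag
        past.append(history[len(history) + k])
    num = sum(a * b for a, b in zip(x, past))
    den = sum(b * b for b in past)
    return num / den

def estimate_for(beta, lag=5, frame_len=8):
    """Synthesize a frame with a known gain, then re-estimate that gain."""
    history = [1.0, 0.5, -0.3, -0.8, 0.2]  # illustrative values (assumed)
    frame = pitch_synthesis([0.0] * frame_len, beta, lag, history)
    return gain_from_extended_buffer(frame, history, lag)

periodic_estimate = estimate_for(1.0)  # recovers the true gain
decaying_estimate = estimate_for(0.7)  # falls below the true gain of 0.7
```

With a gain below unity, the recursive filter scales the second pitch period in the frame by β a second time, whereas the copied buffer repeats the first period unscaled. The estimate is therefore pulled away from the true gain whenever the lag is shorter than the frame.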
In any given speech coding algorithm, it is desirable to attain the maximum possible SNR in order to achieve the best speech quality. In general, to increase the SNR for a given algorithm, additional information must be transmitted to the receiver, resulting in a higher transmission rate. Thus, a simple modification to an existing algorithm that increases the SNR without increasing the transmission rate is a highly desirable result.