Devices that require audio input, such as mobile radios, cellular telephones, and speech recognition devices, suffer from audio interference. In particular, speech recognition devices often require a clean audio signal (i.e., one that is substantially free from interference) to operate properly.
A straightforward way to cancel interference in a communication system is to estimate the interference in the time domain and subtract it from the signal received at the microphone. If a linear transfer function is used to model the acoustic channel from the interference source to the microphone, the canceller could be implemented as follows:
##EQU1##
Where:
s(k) is the estimated desired signal at time k;
z(k) is the sampled contaminated signal at time k;
y(k) is the interference estimated at time k;
x(k) is the sampled untransduced interference audio at the transducer;
$a_i$ is the $i$-th coefficient of the numerator of the acoustic channel transfer function estimate;
p is the number of coefficients in the numerator of the transfer function estimate, or the number of zeroes;
$b_i$ is the $i$-th coefficient of the denominator of the acoustic channel transfer function estimate; and
q is the number of coefficients, minus one, in the denominator of the transfer function estimate, or the number of poles.
The disadvantage of the above method is that it is computationally intensive. Thus, an interference reduction apparatus that is less computationally intensive would be desirable.
Speech recognition systems do not generally operate directly on the sampled speech. Instead, they reduce the sampled speech to a parametric representation. The parameters of this representation are called its features. There are two common classes of recognition features: linear predictive coding (LPC) derived and filter bank derived. The extraction of these features and their use in speech recognition are described in L. R. Rabiner and S. E. Levinson, "Isolated and Connected Word Recognition-Theory and Selected Applications," IEEE Trans. on Comm., Vol. COM-29, No. 5, May 1981, at pp. 621-659. Both LPC and filter bank features are estimates of the power spectral density of the input signal. Since the interference is assumed to be added to the user's speech, the spectrum of the sum of the speech and interference is just the sum of the spectra of the individual signals. Therefore, knowledge of the spectrum of the interference as received at the microphone would allow the interference to be cancelled from the recognition features.
Typically, estimates of the power spectral density (PSD) of the interference are made based on a statistical model. One such approach is "spectral subtraction," described in Jae S. Lim and A. V. Oppenheim, "Enhancement and Bandwidth Compression of Noisy Speech," Proc. of the IEEE, Vol. 67, No. 12, December 1979, pp. 1586-1604, and in M. S. Ahmed, "Comparisons of Noisy Speech Enhancement Algorithms in Terms of LPC Perturbation," IEEE Trans. on ASSP, Vol. 37, No. 1, January 1989, at pp. 121-125. This method calculates the PSD of the contaminated signal and subtracts the estimated interference PSD from it using the following equation:
$$S(f) = \left[\, |Z(f)|^{\alpha} - |Y(f)|^{\alpha} \,\right]^{1/\alpha} \qquad (2)$$
Where:
S(f) is the estimated PSD of the desired speech;
Z(f) is the estimated PSD of the contaminated audio; and
Y(f) is the estimated PSD of the interfering audio.
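Equation (2) can be sketched per frequency bin as shown below. This is a minimal illustration rather than the method of the cited papers: the function name `spectral_subtract` and its parameters are hypothetical, the text does not define the exponent α (it is treated here as a free positive parameter), and clamping negative differences to zero is an added assumption so that the 1/α root stays real.

```python
def spectral_subtract(z_psd, y_psd, alpha=1.0):
    """Apply Equation (2) to each frequency bin.

    z_psd: PSD estimate of the contaminated audio, Z(f), one value per bin.
    y_psd: PSD estimate of the interfering audio, Y(f), one value per bin.
    alpha: exponent of Equation (2); not defined in the text, so it is a
           free positive parameter in this sketch.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(z_psd) != len(y_psd):
        raise ValueError("z_psd and y_psd must have the same number of bins")

    s_psd = []
    for z, y in zip(z_psd, y_psd):
        diff = abs(z) ** alpha - abs(y) ** alpha
        # Assumption (not from the text): clamp negative differences to zero,
        # otherwise a fractional power of a negative number is not real.
        diff = max(diff, 0.0)
        s_psd.append(diff ** (1.0 / alpha))
    return s_psd


# Three bins; in the last one the interference estimate exceeds the
# contaminated estimate, so the clamp applies.
print(spectral_subtract([4.0, 9.0, 1.0], [1.0, 5.0, 2.0]))  # [3.0, 4.0, 0.0]
```

Only one pass over the bins is needed per frame, and no per-sample filtering is involved.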
The subtraction of the interferer's PSD from the contaminated PSD assumes that the interferer and the desired signal are uncorrelated. If this were not true, the PSD of their sum would not be the sum of the PSDs of the individual signals.
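The role of the uncorrelated assumption can be made explicit with a short derivation. The notation is an assumption of this sketch and does not appear in the text: $F_s(f)$, $F_y(f)$, and $F_z(f)$ denote the Fourier transforms of the desired signal, the interference, and their sum, s(k) stands for the desired signal itself rather than its estimate, and normalization constants are ignored.

```latex
\begin{aligned}
z(k) &= s(k) + y(k) \;\Longrightarrow\; F_z(f) = F_s(f) + F_y(f), \\
|F_z(f)|^2 &= |F_s(f)|^2 + |F_y(f)|^2
              + 2\,\mathrm{Re}\!\left\{ F_s(f)\,F_y^{*}(f) \right\}.
\end{aligned}
```

Taking the expectation of both sides, the cross term vanishes when the two signals are uncorrelated (and zero-mean), which leaves Z(f) = S(f) + Y(f) for the PSDs. If the signals are correlated, the cross term remains and the PSDs no longer add.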
The accuracy of this method suffers from two problems. First, the sample correlation between the interferer and the desired signal will generally not be zero over a finite number of samples, which violates the uncorrelated assumption. Second, only estimates of the power spectra are available, so the subtraction will sometimes yield negative PSD values.
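Both problems can be observed numerically with a short experiment. The sketch below is illustrative only: the helper names, the Gaussian stand-ins for speech and interference, the frame length of 64, and the seed are all assumptions made for the demonstration, and the plain periodogram is used as one simple PSD estimate rather than the LPC or filter bank features discussed above.

```python
import cmath
import math
import random


def periodogram(x):
    """Return |DFT(x)|^2 / n for every bin (direct O(n^2) DFT, fine for short frames)."""
    n = len(x)
    out = []
    for f in range(n):
        acc = sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        out.append(abs(acc) ** 2 / n)
    return out


def sample_correlation(a, b):
    """Normalized zero-lag cross-correlation of two equal-length sequences."""
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return num / den


def negative_bins(z_power, y_power):
    """Indices of bins where the subtraction Z(f) - Y(f) comes out negative."""
    return [f for f, (z, y) in enumerate(zip(z_power, y_power)) if z - y < 0.0]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n = 64
s = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for the desired speech
y = [rng.gauss(0.0, 2.0) for _ in range(n)]  # stand-in for the interference, independent of s
z = [u + v for u, v in zip(s, y)]            # contaminated signal

r = sample_correlation(s, y)  # generally nonzero over a finite frame
bad = negative_bins(periodogram(z), periodogram(y))

print(f"sample correlation over {n} samples: {r:+.3f}")
print(f"{len(bad)} of {n} bins give a negative PSD estimate")
```

Even though `s` and `y` are drawn independently, their correlation over the frame is not exactly zero, and the bins returned by `negative_bins` are those where the finite-sample cross term outweighs the power of the desired signal.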