The invention relates to the field of signal processing, and in particular to the adaptive reduction of noise signals in a speech-processing system.
In speech-processing systems (e.g., systems for speech recognition, speech detection, or speech compression), interference such as noise and background sounds not belonging to the speech degrades the quality of the speech processing, for example the recognition or compression of the speech components or speech signal components contained in an input signal. The goal is to eliminate these interfering background signals at the smallest possible computational cost.
EP 1080465 and U.S. Pat. No. 6,820,053 employ a complex filtering technique using spectral subtraction to reduce noise signals and background signals, wherein a spectrum of an audio signal is calculated by Fourier transformation and, for example, a slowly rising component is subtracted. An inverse transformation back to the time domain then yields a noise-reduced output signal. However, the computational cost of this technique is relatively high, as is its memory requirement. Furthermore, the parameters used during the spectral subtraction can be adapted to other sampling rates only with great difficulty.
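The spectral-subtraction approach described above can be sketched as follows. This is a minimal illustration, not the method of the cited patents: the frame length, the absence of overlapping windows, and the use of a fixed noise-magnitude estimate are simplifying assumptions.

```python
import numpy as np

def spectral_subtraction(signal, noise_mag, frame_len=256):
    """Sketch of spectral subtraction: transform each frame to the
    frequency domain, subtract an estimated noise magnitude spectrum,
    and transform back to the time domain.

    noise_mag: estimated noise magnitude per frequency bin
               (length frame_len // 2 + 1, matching np.fft.rfft).
    """
    out = np.zeros(len(signal), dtype=float)
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        spectrum = np.fft.rfft(frame)
        mag = np.abs(spectrum)
        phase = np.angle(spectrum)
        # Subtract the noise estimate from the magnitude, floored at
        # zero, while keeping the original phase.
        clean_mag = np.maximum(mag - noise_mag, 0.0)
        out[start:start + frame_len] = np.fft.irfft(
            clean_mag * np.exp(1j * phase), n=frame_len)
    return out
```

Even this simplified form shows why the computational and memory costs are relatively high: every frame requires a forward and an inverse Fourier transform plus per-bin bookkeeping, and the per-bin noise estimate is tied to the frame length and sampling rate.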
Other techniques exist for reducing noise signals and background signals, such as center clipping, in which an autocorrelation of the signal is generated and used as information about the noise content of the input signal. U.S. Pat. Nos. 5,583,968 and 6,820,053 disclose neural networks that must be laboriously trained. U.S. Pat. No. 5,500,903 utilizes multiple microphones to separate noise from speech signals. At a minimum, however, each of these techniques requires an estimate of the noise amplitudes.
A known approach is the use of a finite impulse response (FIR) filter that is trained, by linear predictive coding (LPC), to predict the input signal, composed of, for example, speech and noise, as well as possible from its previous N values. The output values of the filter are these predicted values. The coefficients c_i of this filter rise, on average, more slowly for noise signals than for speech signals. The coefficients are computed by the equation:

c_i(t+1) = c_i(t) + μ·e·s(t−i)  (1)

where μ<<1 (for example, μ = 0.01) is a learning rate, s(t) is the audio input signal at time t, sv(t) is the output signal resulting from the sum of the individual predictions c_i(t)·s(t−i) over all i from 1 through N, e = s(t) − sv(t) is the prediction error, that is, the difference between the audio input signal and the predicted signal, N is the number of coefficients, and c_i(t) is the individual coefficient with index i at time t.
There is a need for a system for reducing noise signals and background signals in a speech-processing system.