The invention concerns a method for reducing voice signal interference.
Such a method can be applied advantageously to eliminate interference in voice signals for voice communication, in particular in hands-free communication systems, e.g. in motor vehicles, in voice recognition systems and the like.
A frequently used method for reducing the noise component of noisy voice signals is so-called spectral subtraction. It has the advantages of being simple and inexpensive to implement and of achieving a clear reduction in noise.
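A minimal sketch of power spectral subtraction for one short-term frame may help fix the idea. It assumes the power spectrum of the noisy frame and a per-bin noise power estimate (e.g. averaged over speech pauses) are already available. The function names are hypothetical, and the zero clipping and the reuse of the noisy phase are common textbook choices, not details taken from this document.

```python
def spectral_subtraction(noisy_power, noise_power):
    """Power spectral subtraction for one short-term frame.

    noisy_power -- power spectrum |Y(i)|^2 of the noisy frame, one value per bin i
    noise_power -- noise power estimate per bin (e.g. averaged over speech pauses)
    """
    # Subtract the noise estimate bin by bin and clip negative results to zero
    # (half-wave rectification), since a power value cannot be negative.
    return [max(y - n, 0.0) for y, n in zip(noisy_power, noise_power)]


def apply_gain(noisy_spectrum, noisy_power, enhanced_power):
    """Scale the complex noisy spectrum so that its power equals enhanced_power.

    The phase of the noisy spectrum is kept unchanged.
    """
    out = []
    for x, py, ps in zip(noisy_spectrum, noisy_power, enhanced_power):
        # Amplitude gain sqrt(enhanced / noisy); bins with zero input stay zero.
        gain = (ps / py) ** 0.5 if py > 0 else 0.0
        out.append(gain * x)
    return out
```

Because the noise estimate is only an average, the actual noise in some bins of a given frame exceeds it. Those bins survive the subtraction as isolated, randomly placed spectral peaks, which is the origin of the tonal artifacts described next.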
One unpleasant side effect of noise reduction by spectral subtraction is the occurrence of short-lived, audible tonal noise components, which are called “musical tones” or “musical noise” because of the auditory impression they create.
Measures for suppressing the “musical tones” produced by spectral subtraction include overestimating the interference power, that is, overcompensating the interference, which has the disadvantage of increased voice distortion, or permitting a relatively high noise floor, which has the disadvantage of only a slight noise reduction (e.g. “Enhancement of Speech Corrupted by Acoustic Noise” by M. Berouti, R. Schwartz and J. Makhoul, in Proc. ICASSP, pp. 208-211, 1979). Methods for linear or non-linear smoothing, and thus suppression, of the “musical tones” are described, for example, in “Suppression of Acoustic Noise in Speech Using Spectral Subtraction” by S. F. Boll, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 2, pp. 113-120. An effective non-linear smoothing method using median filtering is disclosed in DE 44 05 723 A1.
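The two trade-offs named above correspond to two parameters of the over-subtraction scheme commonly associated with Berouti et al.: an over-subtraction factor and a spectral floor. The sketch below is a simplified power-domain version under that assumption. The function name and the default values of `alpha` and `beta` are illustrative only, not values from the cited paper or from this document.

```python
def oversubtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Spectral subtraction with over-subtraction and a spectral floor.

    alpha -- over-subtraction factor; values > 1 overestimate the noise power
    beta  -- spectral floor expressed as a fraction of the noise power
    """
    out = []
    for y, n in zip(noisy_power, noise_power):
        s = y - alpha * n              # overcompensate the interference
        out.append(max(s, beta * n))   # never drop below the floor beta * n
    return out
```

Raising `alpha` removes more residual peaks at the cost of voice distortion. Raising `beta` hides the peaks under a higher noise floor at the cost of less noise reduction.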
Also known are methods which, in addition to spectral subtraction, take psychoacoustic perception into account (e.g. T. Petersen and S. Boll, “Acoustic Noise Suppression in a Perceptual Model,” Proc. ICASSP, pp. 1086-1088, 1981). There, the signals are transformed into the psychoacoustic loudness domain so that the processing corresponds more closely to human hearing. D. Tsoukalis, P. Paraskevas and M. Mourjopoulos, in “Speech Enhancement Using Psychoacoustic Criteria,” Proc. ICASSP, pp. II359-II362, 1993, and G. Virag, in “Speech Enhancement Based on Masking Properties of the Auditory System,” Proc. ICASSP, pp. 796-799, 1995, use the calculated masking curve to determine which spectral lines are masked by the useful signal and therefore need not be damped. This improves the quality of the voice signal. However, the interfering “musical tones” are not reduced in this way.
It is an object of the present invention to provide an improved method for reducing interference in voice signals.
The invention provides a method for reducing interference in a voice signal. The method includes:
applying a noise reduction method to the voice signal;
taking into account spectral psychoacoustic masking;
determining a first spectral masking curve for an input signal of the noise reduction method;
determining a second spectral masking curve for an output signal of the noise reduction method; and
selectively damping newly audible portions of the output signal, i.e. portions of the output signal that have no spectrally corresponding portions in the input signal exceeding the first spectral masking curve.
The invention is based on the insight that signal portions which only become separately audible as a result of the noise reduction can be detected as interference and then reduced or removed by selective damping. Exceeding a masking curve (masking threshold) serves as the criterion for audibility, in a manner known per se.
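The selective damping step can be sketched per frequency bin, assuming the power spectra and masking curves of the noise-reduction input and output are already available as equal-length lists. The passage does not specify how strongly a newly audible portion is damped. Lowering it to the output masking threshold, so that it becomes masked, is a hypothetical choice made only for this illustration, as is the function name.

```python
def damp_newly_audible(p_in, v_in, p_out, v_out):
    """Selectively damp portions that became audible only through noise reduction.

    p_in, v_in   -- power spectrum and masking curve of the noise-reduction input
    p_out, v_out -- power spectrum and masking curve of the noise-reduction output
    """
    damped = []
    for pi, vi, po, vo in zip(p_in, v_in, p_out, v_out):
        audible_in = pi > vi    # portion exceeds the first masking curve
        audible_out = po > vo   # portion exceeds the second masking curve
        if audible_out and not audible_in:
            # Newly audible portion. Hypothetical damping rule: lower the bin
            # to the output masking threshold so that it is masked again.
            damped.append(vo)
        else:
            damped.append(po)
    return damped
```

Bins that were already audible in the input, such as voice components, pass unchanged, so only the artifacts created by the noise reduction itself are attenuated.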
The determination of masking curves is known, e.g. from parts of the prior art mentioned at the outset and, more specifically, from Tone Engineering, Chapter 2, Psychoacoustics and Noise Analysis (pp. 10-33), Expert Publishing, 1994. The masking curves can be determined from the actual voice signals as well as from a noise signal during speech pauses, and various psychoacoustic effects can also be taken into account. Masking curves, also referred to in the literature as masking thresholds, masked thresholds and the like, can be viewed as a frequency-dependent level threshold for the audibility of a narrow-band tone.
Besides their use for interference elimination, such masking curves are also used, for example, for data reduction in the coding of audio signals. Details of the steps for determining a masking curve can be found, for example, in “Transform Coding of Audio Signals Using Perceptual Noise Criteria” by J. Johnston, IEEE Journal on Selected Areas in Communications, Volume 6, pp. 314-323, February 1988, in addition to the publications mentioned above. The basic steps of a typical method for determining a masking curve from the short-term spectrum of a noisy voice signal are, in particular:
A critical band analysis, in which the signal spectrum is divided into so-called critical bands and a critical band spectrum B(n) (also called the bark spectrum, with n as the band index) is obtained from the power spectrum P(i) by summing within each critical band;
Convolution of the bark spectrum with a spreading function to account for masking effects across several critical bands, which yields a modified bark spectrum;
Optionally, additional consideration of the different masking properties of noise-like and tone-like portions by an offset factor determined from the composition of the signal;
Re-scaling in proportion to the energy in each critical band and, where necessary, raising values below the absolute threshold of hearing (threshold in quiet) to that threshold yields a bark-related masking curve T(n), from which a frequency-specific masking curve V(i) follows, with V(i) = T(n) for all frequencies i within the respective critical band n.
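The four steps above can be sketched as follows. Several details are assumptions, not taken from this document. Critical bands are given as hypothetical bin-index edges, with adjacent bands taken to be 1 bark apart. The spreading function is the Schroeder form used in Johnston's paper cited above. The tonality-dependent offset is replaced by a fixed `offset_db`. The re-scaling divides out the energy gain of the spreading and shares each band threshold evenly among the bins of the band, so that V(i) is directly comparable with P(i).

```python
import math


def spreading_db(dz):
    """Spreading function in dB; dz = masked band minus masking band, in bark."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)


def masking_curve(power, band_edges, offset_db=10.0, quiet=None):
    """Per-bin masking curve V(i) from a power spectrum P(i).

    band_edges -- hypothetical layout: band n covers bins
                  band_edges[n] .. band_edges[n + 1] - 1
    offset_db  -- fixed masking offset in dB (simplification)
    quiet      -- optional per-bin threshold in quiet, same units as power
    """
    n_bands = len(band_edges) - 1
    # Step 1, critical band analysis: B(n) = sum of P(i) over the bins of band n.
    bark = [sum(power[band_edges[n]:band_edges[n + 1]]) for n in range(n_bands)]
    # Step 2, spreading: C(n) = sum over k of B(k) * SF(n - k), SF in linear power.
    sf = [[10.0 ** (spreading_db(n - k) / 10.0) for k in range(n_bands)]
          for n in range(n_bands)]
    spread = [sum(sf[n][k] * bark[k] for k in range(n_bands))
              for n in range(n_bands)]
    # Step 3, offset: lower the spread spectrum by offset_db.
    offset = 10.0 ** (-offset_db / 10.0)
    v = [0.0] * len(power)
    for n in range(n_bands):
        width = band_edges[n + 1] - band_edges[n]
        # Step 4, re-scaling (assumed form): remove the spreading gain and
        # share the band threshold T(n) evenly among the bins of band n.
        t = spread[n] * offset / (sum(sf[n]) * width)
        for i in range(band_edges[n], band_edges[n + 1]):
            # V(i) = T(n), raised to the threshold in quiet if one is given.
            v[i] = max(t, quiet[i]) if quiet is not None else t
    return v
```

For a flat spectrum, the result lies `offset_db` below the per-bin power in every band. A single loud band raises the curve more strongly in the band above it than in the band below it, reflecting the asymmetric spread of masking toward higher frequencies.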
Using the determined masking curve V(i), the spectral portions of the signal can be divided into audible (P(i) > V(i)) and masked (P(i) ≤ V(i)) portions by comparing the power spectrum P(i) with the masking curve V(i).
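The comparison above reduces to a per-bin test. A minimal sketch, assuming P(i) and V(i) are given as equal-length lists (the function name is hypothetical):

```python
def classify_bins(power, mask):
    """Split bin indices into audible (P(i) > V(i)) and masked (P(i) <= V(i))."""
    audible = [i for i, (p, v) in enumerate(zip(power, mask)) if p > v]
    masked = [i for i, (p, v) in enumerate(zip(power, mask)) if p <= v]
    return audible, masked
```

Applied separately to the input and the output of the noise reduction, this test yields the two audibility decisions whose combination identifies newly audible portions.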