The present invention relates to a method and an apparatus for processing a sound signal.
A voice recognition system is disclosed in A. Hauenstein, "Optimierung von Algorithmen und Entwurf eines Prozessors für die automatische Spracherkennung" [Optimization of algorithms and design of a processor for automatic voice recognition], Chair of Integrated Circuits, Technical University of Munich, Dissertation, Chapter 2, Jul. 19, 1993, pp. 13-26, which also contains a basic introduction to components of the voice recognition system and important techniques which are customary in the context of voice recognition.
A wavelet transformation is disclosed in S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, July 1989, pp. 674-693. A wavelet transformation is preferably effected in a number of transformation stages, where a transformation stage subdivides a pattern into a high-pass filter component and a low-pass filter component. The respective high-pass and low-pass filter component preferably has a reduced resolution compared with the pattern (technical term: subsampling, i.e. reduced sampling rate, consequently reduced resolution). The pattern can be reconstructed from the high-pass and low-pass filter components. This is ensured in particular by the specific form of the transformation filters used during the transformation. The wavelet transformation can be effected one-dimensionally, two-dimensionally or multi-dimensionally.
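As an illustration of a single transformation stage, the following Python sketch uses the Haar filter pair, the simplest wavelet filters that permit exact reconstruction. The choice of Haar filters and the function names `haar_stage`, `haar_inverse` and `haar_multistage` are assumptions made for this example; the cited paper treats more general filters.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_stage(pattern):
    """One stage: split into low-pass and high-pass parts, each subsampled by 2."""
    if len(pattern) % 2:
        raise ValueError("pattern length must be even")
    low = [(pattern[i] + pattern[i + 1]) / SQRT2 for i in range(0, len(pattern), 2)]
    high = [(pattern[i] - pattern[i + 1]) / SQRT2 for i in range(0, len(pattern), 2)]
    return low, high


def haar_inverse(low, high):
    """Reconstruct the pattern from its low-pass and high-pass components."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / SQRT2)
        out.append((lo - hi) / SQRT2)
    return out


def haar_multistage(pattern, stages):
    """Apply several stages, each time splitting the previous low-pass part again."""
    details = []
    low = list(pattern)
    for _ in range(stages):
        low, high = haar_stage(low)
        details.append(high)
    return low, details
```

Each stage halves the length of the low-pass component, which corresponds to the subsampling mentioned above, and `haar_inverse` recovers the input to within floating-point rounding.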
A sound signal comprises a useful signal and an interference signal, the intensity of the interference signal depending on the surroundings. For further processing of the sound signal, it is an essential precondition that the useful signal be isolated from the interference signal.
Methods are known which suppress different regions of a frequency spectrum of the sound signal to a greater or lesser extent. In this case, it is disadvantageous that the dynamic development of the interference signal over time is not taken into account.
It is an object of the present invention to provide a method and an apparatus which ensure processing of a sound signal in such a way that the disadvantage described above is avoided.
This object is achieved in accordance with the present invention in a method for processing a sound signal, said method comprising the steps of: transforming said sound signal into the frequency domain; determining an envelope of the transformed sound signal over a time period for at least one prescribed frequency; subdividing the envelope into a first number of segments each determined by a prescribed duration; determining a maximum of the envelope for each segment of the first number of segments; determining a smallest maximum of said determined maximums for a second number of segments of said first number of segments; weighting the smallest maximum by a factor; and processing the sound signal by subtracting the weighted smallest maximum from the sound signal.
In an embodiment, the method further comprises the steps of: determining a minimum for a third number of segments of the first number of segments; and combining the smallest maximum with the minimum, and wherein the sound signal is processed by subtracting the combined smallest maximum and minimum from the sound signal.
In a transformation of a temporal signal into the frequency domain, e.g. by means of a fast Fourier transformation (FFT), a region of the temporal signal comprising a prescribed number of samples is transformed into the frequency domain. This operation is repeated at different instants, so that each frequency takes on different values over time, depending on the region of the temporal signal transformed in each case. In this way, the profile of a frequency over time can be represented.
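The following sketch shows how transforming successive regions of a signal yields a profile over time for each frequency. It uses a naive discrete Fourier transform in place of an FFT (same result, simpler code), applies no window function, and returns the magnitude sequence of each frequency bin, which later steps can use as the envelope. These simplifications and the names `dft_magnitudes` and `frequency_profiles` are assumptions for illustration.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def frequency_profiles(signal, frame_len, hop):
    """Transform successive regions of the signal; return one time profile per bin.

    profiles[k][m] is the magnitude of frequency bin k in frame m.
    """
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [dft_magnitudes(f) for f in frames]
    # Transpose: from one spectrum per frame to one profile per frequency.
    return [list(column) for column in zip(*spectra)]
```

For a 32-sample cosine that completes two cycles per 8-sample frame, `frequency_profiles(sig, 8, 4)` yields 5 bins by 7 frames, with bin 2 constant at 4 (half the frame length) and the other bins near zero.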
In addition to the FFT, it is also possible to use a wavelet transformation or any other transformation for mapping the time domain into the frequency domain.
A method for processing a sound signal is specified in which the sound signal is transformed into the frequency domain. For at least one prescribed frequency of the sound signal, the envelope over time of the transformed sound signal is determined. The envelope is subdivided into a quantity of segments, each of a prescribed duration. A maximum of the envelope is determined for each segment of the quantity of segments. The smallest of these maxima is determined for a prescribed number of the segments of the quantity of segments. The sound signal is processed by subtracting the smallest maximum, weighted by a factor, from the sound signal.
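A minimal sketch of these steps for one frequency is given below. It assumes that the envelope is already available as a list of magnitudes (one per frame), that an incomplete final segment is ignored, that the "prescribed number of segments" are the most recent ones, and that the subtraction is carried out on the magnitudes with negative results clamped to zero, as is common in spectral subtraction. None of these details, nor the function names, are taken from the description.

```python
def segment_maxima(envelope, seg_len):
    """Maximum of the envelope within each complete segment of seg_len frames."""
    n_full = len(envelope) // seg_len
    return [max(envelope[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]


def smallest_maximum(maxima, count):
    """Smallest maximum among the last `count` segments (assumed choice of segments)."""
    if not 0 < count <= len(maxima):
        raise ValueError("count must be between 1 and the number of segments")
    return min(maxima[-count:])


def subtract_weighted(envelope, estimate, factor):
    """Subtract factor * estimate from every value; clamping at zero is an assumption."""
    return [max(v - factor * estimate, 0.0) for v in envelope]


def process_frequency(envelope, seg_len, count, factor):
    """Full chain for one frequency: maxima, smallest maximum, weighted subtraction."""
    maxima = segment_maxima(envelope, seg_len)
    estimate = smallest_maximum(maxima, count)
    return subtract_weighted(envelope, estimate, factor)
```

With `env = [5.0, 6.0, 1.0, 1.2, 7.0, 4.0, 0.9, 1.1]` and `seg_len=2`, the segment maxima are `[6.0, 1.2, 7.0, 1.1]`. The two low segments stand for pauses, so the smallest maximum over all four segments, 1.1, reflects the interference level rather than speech.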
The smallest maximum is thus advantageously determined, for the respective frequency, over a predetermined duration of its envelope; in a sound signal comprising a useful signal and an interference signal, it preferably captures the interference signal. This is especially pronounced when the sound signal is naturally spoken speech. Such speech consists of a sequence of words that, even when articulated fluently, contains points exhibiting spectral minima (in particular the gaps between individual words). At such points the useful signal is virtually absent, whereas the interference signal dominates.
A further advantage is that the smallest maximum is determined over the number of segments, which spans a window within which the interference signal may develop dynamically over time. For example, the interference signal may be engine noise in a motor vehicle that accelerates continuously over a period of time, so that the interference signal increases during the acceleration. Since the smallest maximum is determined anew for each successive number of segments, the estimate follows this development, and the dynamic behavior of the interference signal is taken into account.
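The effect of redetermining the smallest maximum can be sketched with an envelope whose interference level rises from group to group, as with an accelerating vehicle. The sketch assumes non-overlapping groups of `count` segments and recomputes the estimate once per group; the function name and the data are hypothetical.

```python
def tracked_estimates(envelope, seg_len, count):
    """Smallest segment maximum, recomputed for each group of `count` segments."""
    n_full = len(envelope) // seg_len
    maxima = [max(envelope[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]
    return [min(maxima[g:g + count])
            for g in range(0, n_full - count + 1, count)]


# Interference floor rises 1 -> 2 -> 3 while speech bursts (6, 8, 9) come and go.
rising = [1.0, 6.0, 1.0, 1.0,
          2.0, 2.0, 8.0, 2.0,
          3.0, 9.0, 3.0, 3.0]
print(tracked_estimates(rising, seg_len=2, count=2))  # [1.0, 2.0, 3.0]
```

A single smallest maximum over all six segments would remain at 1.0 and underestimate the interference in the later part of the signal, whereas the per-group estimate follows its rise.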
In an embodiment, a minimum is determined for a further number of the segments of the quantity of segments, and the sound signal is processed by subtracting the smallest maximum, combined with the minimum, from the sound signal.
Taking account of the minimum determined for a further number of the segments proves highly advantageous for adapting the interference signal that is subtracted from the sound signal in order to obtain the useful signal. If no useful signal at all is present, the minimum identifies the interference signal and is therefore subtracted from the sound signal.
In an embodiment the minimum and the smallest maximum are combined in accordance with the following relationship:
$a + b \cdot \dfrac{\mathrm{max}}{\mathrm{min}}$,
where
a designates a first prescribed coefficient,
b designates a second prescribed coefficient,
max designates the smallest maximum, and
min designates the minimum.
In this case, the coefficients should be prescribed in such a way that the interference signal is reduced in a favorable manner for the application.
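The following sketch determines the minimum over a further number of segments and combines it with the smallest maximum, reading the relationship literally as a + b·max/min. It assumes that the minimum is taken over the envelope values of the most recent `further` segments and the smallest maximum over the most recent `count` segments. The coefficient values in the example are hypothetical, since the description leaves them to the application, and the function names are illustrative.

```python
def segment_maxima(envelope, seg_len):
    """Maximum of the envelope within each complete segment of seg_len frames."""
    n_full = len(envelope) // seg_len
    return [max(envelope[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]


def segment_minimum(envelope, seg_len, further):
    """Minimum envelope value over the last `further` complete segments (assumption)."""
    n_full = len(envelope) // seg_len
    if not 0 < further <= n_full:
        raise ValueError("further must be between 1 and the number of segments")
    return min(envelope[(n_full - further) * seg_len:n_full * seg_len])


def combine(smallest_max, minimum, a, b):
    """Relationship a + b * max / min, as given in the description."""
    if minimum <= 0:
        raise ValueError("the minimum must be positive to form max / min")
    return a + b * smallest_max / minimum


env = [5.0, 6.0, 1.0, 1.2, 7.0, 4.0, 0.9, 1.1]
smallest_max = min(segment_maxima(env, 2)[-4:])          # 1.1
minimum = segment_minimum(env, 2, further=2)              # 0.9
combined = combine(smallest_max, minimum, a=0.5, b=1.0)  # hypothetical a, b; about 1.722
```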
In an embodiment, the estimate is updated each time the number or the further number of segments has elapsed, so that an updated interference signal is subtracted from the sound signal.
In an embodiment, the sound signal is a voice signal, preferably naturally spoken speech.
In an embodiment, the processed sound signal is used for voice recognition. A clear useful signal, as free as possible of interference signal components, is an advantageous precondition for a voice recognition system in particular: the clearer the useful signal, the better the system recognizes the spoken speech. Furthermore, the useful signal can also be output.
The object of the invention is also achieved in an apparatus for processing a sound signal comprising: a processor unit for: transforming said sound signal into the frequency domain; determining an envelope of the transformed sound signal over a time period for at least one prescribed frequency; subdividing the envelope into a first number of segments each determined by a prescribed duration; determining a maximum of the envelope for each segment of the first number of segments; determining a smallest maximum of said determined maximums for a second number of segments of said first number of segments; weighting the smallest maximum by a factor; and processing the sound signal by subtracting the weighted smallest maximum from the sound signal.
In an embodiment, the processor unit is further for: determining a minimum for a third number of segments of the first number of segments; and combining the smallest maximum with the minimum, and wherein the sound signal is processed by subtracting the combined smallest maximum and minimum from the sound signal.
In an embodiment, an apparatus for processing a sound signal is specified, which has a processor unit that can be set up in such a way that the sound signal is transformed into the frequency domain. For at least one prescribed frequency, the envelope over time of the transformed sound signal can be determined. The envelope can be subdivided into a quantity of segments, each of a prescribed duration. A maximum of the envelope is determined for each segment of the quantity of segments. The smallest of these maxima is determined for a number of the segments of the quantity of segments. The sound signal is processed by subtracting the smallest maximum, weighted by a factor, from the sound signal.
In an embodiment, the processor unit is set up in such a way that a minimum is determined for a further number of the segments of the quantity of segments, and the sound signal is processed by subtracting the smallest maximum, combined with the minimum, from the sound signal.
The apparatus is particularly suitable for carrying out the method according to the invention or one of its embodiments explained above.
These and other features of the invention will become clearer with reference to the following detailed description of the presently preferred embodiments and the accompanying drawings.