1. Field of the Invention
The present invention relates to a method for estimating phase, or delay, between signals of at least two noise-affected voice channels. More particularly, the present invention relates to a method for estimating phase, or delay, between signals of at least two noise-affected voice channels based on maxima of a cross power density signal of the two voice channels.
2. Description of the Related Art
Such a method is used in automatic speech (voice) detection or recognition systems and in voice-actuated systems, for example, systems in offices or motor vehicles that respond to a voice command.
Noise-affected speech can be better detected if the speech is recorded in two or more channels. For example, the human hearing system employs two channels, that is, two ears. The direction of a speaker is determined by psychoacoustic post-processing, and background noise is suppressed. In technical devices, two or more channels can likewise be employed for recording a voice. The related recorded signals are then processed in a digital signal processing system.
A significant aspect of multi-channel processing is estimation of delay differences between the individual channels. If the difference in delay is known, the direction of the sound event (speaker) can be determined. The signals from the individual channels can then be corrected for their delay and processed further. If, for example, uncorrected signals are combined into a sum signal, individual spectral components of the signal may be amplified, attenuated or cancelled by interference.
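The effect of an uncorrected delay on a sum signal can be illustrated with a minimal sketch. It is not the method of the invention: the delay is found here by a plain time-domain cross-correlation peak search rather than from maxima of a cross power density signal, and the function names, signal lengths, the 5-sample delay and the test tone are all assumptions chosen for illustration.

```python
import math
import random


def delay_signal(x, d):
    """Delay x by d >= 0 samples, padding the start with zeros."""
    return [0.0] * d + x[:len(x) - d]


def advance_signal(y, lag):
    """Shift y earlier by lag samples (zero-padded) to undo a delay."""
    return [y[n + lag] if 0 <= n + lag < len(y) else 0.0
            for n in range(len(y))]


def cross_correlation_delay(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] that maximizes sum x[n]*y[n+lag]."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                total += x[n] * y[m]
        if total > best_value:
            best_lag, best_value = lag, total
    return best_lag


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Assumed test case: channel 2 is channel 1 delayed by 5 samples.
TRUE_DELAY = 5
rng = random.Random(0)
channel_1 = [rng.gauss(0.0, 1.0) for _ in range(400)]
channel_2 = delay_signal(channel_1, TRUE_DELAY)
estimated_lag = cross_correlation_delay(channel_1, channel_2, max_lag=20)

# A tone whose period is twice the delay arrives in antiphase on channel 2,
# so the uncorrected sum almost cancels this spectral component.
tone_1 = [math.sin(2.0 * math.pi * n / (2 * TRUE_DELAY)) for n in range(200)]
tone_2 = delay_signal(tone_1, TRUE_DELAY)
uncorrected_sum = [a + b for a, b in zip(tone_1, tone_2)]
corrected_sum = [a + b
                 for a, b in zip(tone_1, advance_signal(tone_2, estimated_lag))]

print("estimated lag:", estimated_lag)
print("energy of uncorrected sum:", energy(uncorrected_sum))
print("energy of corrected sum:", energy(corrected_sum))
```

In this noise-free setting the correlation peak falls at the true delay, and correcting channel 2 by that lag restores the tone that the uncorrected sum nearly cancels. The sketch makes no claim about behaviour under heavy noise.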
One method for automatically determining differences in delay between two microphones is disclosed in a publication by M. Schlang in ITG-Fachtagung 1988, Bad Nauheim, pages 69-73. The disclosed method operates in the time domain. However, the Schlang method cannot be employed when heavy noise is present.