The invention relates to a speech detection device having two switch-off criteria.
Such a speech detection device, such a speech detection method and such a computer program product are known as part of a speech recognition device that has been marketed by the applicants since 1998 as a computer program referred to as “FreeSpeech 98©”. When a computer runs the computer program “FreeSpeech 98” and a user dictates a text into a microphone connected to the computer, the text recognized by the speech recognition means of the known speech recognition device is displayed on a monitor connected to the computer. During the dictation the user speaks the text into the microphone sometimes fluently and sometimes with short pauses. Sometimes the user holds the microphone too far away from his mouth, so that the signal-to-noise ratio of the electric microphone signal produced by the microphone is poor. During so-called speech time slots the microphone signal therefore contains a speech signal that corresponds to the user's spoken text, and during so-called pause time slots it contains no speech signal or a speech signal with a poor signal-to-noise ratio.
The speech detection device of the known speech recognition device can be supplied with the microphone signal delivered by the microphone as a received signal or as received data representing the received signal, respectively. The speech detection device detects the beginning and the end of the speech signal in the received signal and determines corresponding speech time slots. The speech detection device applies speech detection information to the speech recognition means during speech time slots, which speech recognition means process the microphone signal delivered by the microphone only during speech time slots.
For detecting the speech signal in the received signal, the known speech detection device includes a switch-on threshold detector and a switch-off threshold detector, which compare the energy content of the received signal to a first and a second energy threshold, respectively, the first energy threshold being higher than the second energy threshold. When the energy content of the received signal exceeds the first energy threshold, the switch-on threshold detector produces first detection information, and when the energy content of the received signal falls short of the second energy threshold, the switch-off threshold detector produces second detection information.
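The two threshold detectors described above can be illustrated with a minimal sketch. It is not the implementation of the known device: the frame-wise mean-square energy measure, the class name `ThresholdDetectors` and the numeric threshold values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


def frame_energy(samples):
    """Mean-square energy of one frame (an assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


@dataclass
class ThresholdDetectors:
    """Hypothetical pairing of the switch-on and switch-off detectors."""
    on_threshold: float   # first (higher) energy threshold
    off_threshold: float  # second (lower) energy threshold

    def __post_init__(self):
        # The text requires the first threshold to exceed the second.
        if self.on_threshold <= self.off_threshold:
            raise ValueError("on_threshold must be higher than off_threshold")

    def detect(self, energy):
        """Return (first_detection, second_detection) for one frame."""
        first = energy > self.on_threshold    # switch-on detector fires
        second = energy < self.off_threshold  # switch-off detector fires
        return first, second


detectors = ThresholdDetectors(on_threshold=10.0, off_threshold=4.0)
loud = detectors.detect(12.0)     # above the first threshold
between = detectors.detect(7.0)   # between the two thresholds
quiet = detectors.detect(2.0)     # below the second threshold
```

An energy value lying between the two thresholds yields neither first nor second detection information, which is the situation the later paragraphs are concerned with.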
To determine the speech time slot, the speech detection device includes information processing means for receiving and processing the detection information. The occurrence of the first detection information is taken as the switch-on criterion of a speech time slot; once this criterion is satisfied, the information processing means set the beginning of the speech time slot 240 ms before the instant at which the criterion was satisfied. The uninterrupted occurrence of the second detection information during a first switch-off period is taken as the switch-off criterion of the speech time slot; the information processing means set the end of the speech time slot at the instant at which this switch-off criterion is satisfied.
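The slot determination of the known device can be sketched as a small frame-based state machine. Only the 240 ms look-back is taken from the text; the 10 ms frame length, the 300 ms first switch-off period, the function name `known_speech_slots` and the sample energies are assumptions made for this illustration.

```python
def known_speech_slots(energies, on_thr, off_thr,
                       frame_ms=10, lookback_ms=240, off_period_ms=300):
    """Return (start_frame, end_frame) pairs; end_frame is None if still open.

    frame_ms and off_period_ms are assumed values; only the 240 ms
    look-back comes from the description of the known device.
    """
    lookback = lookback_ms // frame_ms
    off_frames = off_period_ms // frame_ms
    slots = []
    start = None    # start frame of the currently open slot, if any
    below_run = 0   # consecutive frames with second detection information
    for i, energy in enumerate(energies):
        if start is None:
            if energy > on_thr:  # switch-on criterion satisfied
                floor = slots[-1][1] + 1 if slots else 0
                start = max(floor, i - lookback)  # begin 240 ms earlier
                below_run = 0
            continue
        # Count only an uninterrupted run below the second threshold.
        below_run = below_run + 1 if energy < off_thr else 0
        if below_run >= off_frames:  # switch-off criterion satisfied
            slots.append((start, i))
            start = None
    if start is not None:
        slots.append((start, None))  # slot was never terminated
    return slots


# Speech followed by clean silence: the slot is closed.
clean = [0] * 30 + [20] * 10 + [0] * 40
clean_slots = known_speech_slots(clean, on_thr=10, off_thr=4)

# Speech followed by a signal hovering around the second threshold:
# the run below the threshold is interrupted again and again.
hovering = [0] * 30 + [20] * 10 + [5, 3] * 50
hovering_slots = known_speech_slots(hovering, on_thr=10, off_thr=4)
```

With these assumed values the first input yields one closed slot, whereas the second leaves the slot open for the entire remainder of the input.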
The known speech detection device, the known speech detection method and the known computer program product have the disadvantage that the switch-off criterion is not satisfied when the energy content of the received signal varies around the second energy threshold. Such a received signal is applied to the speech recognition device, for example, when a user interrupts the dictation for a telephone conversation and puts the microphone on the table. The words spoken during the telephone conversation by the user or by another person in the room, at a large distance from the microphone, are picked up by the microphone and appear in the microphone signal as a speech signal that occasionally occurs and has a poor signal-to-noise ratio. This received signal with the speech signal having the poor signal-to-noise ratio is erroneously detected by the speech detection device as a speech signal suitable for speech recognition, because the speech time slot is not terminated by the speech detection device. In this manner, a speech signal that was never intended to be recognized is processed by the speech recognition means; because of the poor signal-to-noise ratio the recognition rate of the speech recognition device is poor, and most probably a wrong text is recognized.
It is an object of the invention to eliminate the problems defined above and provide a speech detection device, a speech detection method and a computer program product of the type defined in the opening paragraph, in which a second switch-off criterion is provided for reliably terminating the speech time slots.
This object is achieved in that the information processing means determine, as a second switch-off criterion for terminating the speech time slots, the uninterrupted absence of the first detection information during a second switch-off period, so that the end of a speech time slot is also determined by the information processing means when the second switch-off criterion is satisfied. In addition to or in lieu of this second switch-off criterion, the information processing means can also verify a third switch-off criterion, according to which it is tested whether no first detection information has been received during a third switch-off period, which period begins when the second detection information is received for the first time after the first detection information has ceased to be received.
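The interplay of the three switch-off criteria can be sketched as follows. This is an illustrative reading of the paragraph above, not the claimed implementation: the function name `speech_slots`, the frame-based counting and all period lengths (given in frames) are hypothetical, since the text does not state numeric values for the switch-off periods.

```python
def speech_slots(energies, on_thr, off_thr,
                 off1=30, off2=100, off3=60, lookback=24):
    """Return (start, end, criterion) triples for terminated speech slots.

    All periods are counted in frames and are assumed values.
    Pass off2=None or off3=None to disable that criterion.
    """
    slots = []
    start = None
    below_run = 0        # criterion 1: run of second detection information
    no_first_run = 0     # criterion 2: run without first detection information
    since_second = None  # criterion 3: frames since second detection first seen
    for i, energy in enumerate(energies):
        first = energy > on_thr
        second = energy < off_thr
        if start is None:
            if first:  # switch-on criterion
                start = max(0, i - lookback)
                below_run = no_first_run = 0
                since_second = None
            continue
        below_run = below_run + 1 if second else 0
        if first:
            # Renewed first detection information resets criteria 2 and 3.
            no_first_run = 0
            since_second = None
        else:
            no_first_run += 1
            if since_second is not None:
                since_second += 1
            elif second:
                since_second = 1  # third switch-off period starts here
        reason = None
        if below_run >= off1:
            reason = "first"
        elif off2 is not None and no_first_run >= off2:
            reason = "second"
        elif off3 is not None and since_second is not None and since_second >= off3:
            reason = "third"
        if reason is not None:
            slots.append((start, i, reason))
            start = None
    return slots


# A signal that hovers around the second threshold after real speech.
hovering = [0] * 30 + [20] * 10 + [5, 3] * 50
with_all = speech_slots(hovering, on_thr=10, off_thr=4)
second_only = speech_slots(hovering, on_thr=10, off_thr=4, off3=None)

# Clean silence after speech is still ended by the first criterion.
clean = [0] * 30 + [20] * 10 + [0] * 40
clean_slots = speech_slots(clean, on_thr=10, off_thr=4)
```

In this sketch a hovering signal that never produces an uninterrupted run of second detection information is nevertheless terminated, because neither the second nor the third criterion requires the energy to stay below the second threshold.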
Terminating the speech time slots in dependence on the second and/or third switch-off criterion offers the advantage that only a speech signal having a good signal-to-noise ratio is reliably used for speech recognition by a speech recognition device, even when, for example, a working condition as discussed above occurs and the energy content of the received signal varies around the second energy threshold.
The measures as claimed in claim 2 provide a highly reliable second switch-off criterion, and the measures as claimed in claim 3 provide a highly reliable switch-on criterion for speech time slots. The measures as claimed in claim 4 adapt the energy thresholds of the switch-on threshold detector and the switch-off threshold detector to the energy content of the noise signal in the received signal, so that the detection of a speech signal having a good signal-to-noise ratio is improved.
The invention will be described in the following with reference to two examples of embodiment shown in the Figures, to which, however, the invention is not restricted.