1. Field of the Invention
One or more embodiments of the present invention relate to a speech recognition technique, and more particularly, to a speech detection method, medium, and system.
2. Description of the Related Art
Speech recognition techniques use computers to analyze, identify, and recognize human speech. In such techniques, spoken speech is converted into electrical signals, and pronunciation is recognized by extracting frequency characteristics of the speech signals, based on the fact that human speech has specific frequencies determined by changes in mouth shape and tongue location during phonation.
Recently, such speech recognition techniques have been applied to various fields such as telephone dialing, toy control, language learning, and household appliance control.
FIG. 1 illustrates a conventional speech recognition device based on phoneme recognition.
Referring to FIG. 1, the speech recognition device includes an A/D converter 100, a spectrum analyzer 110, a phoneme detector 120, and a lexical analyzer 130.
The A/D converter 100 converts an analog speech signal transmitted through a microphone into a digital signal input to the spectrum analyzer 110. Frequency spectrum characteristics of the digital signal are then analyzed. Only acoustic features are extracted and supplied to the phoneme detector 120, and the phoneme detector 120 outputs a predetermined sequence of the phonemes obtained from the input speech signal. Thereafter, the lexical analyzer 130 receives the phoneme sequence and finally recognizes words or sentences.
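The signal flow of FIG. 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than part of the conventional device itself: the 16-sample frame, the naive DFT, the toy `ACOUSTIC_MODEL` that maps a dominant frequency bin to a phoneme label, and the one-entry `LEXICON` are all hypothetical.

```python
import cmath
import math

# Hypothetical toy tables, invented for this sketch only.
ACOUSTIC_MODEL = {2: "a", 5: "i"}   # dominant DFT bin -> phoneme label
LEXICON = {("a", "i"): "eye"}       # phoneme sequence -> word


def ad_convert(analog, levels=256):
    """A/D converter 100: quantize samples in [-1, 1] to signed integers."""
    half = levels // 2
    return [max(-half, min(half - 1, round(x * half))) for x in analog]


def spectrum(frame):
    """Spectrum analyzer 110: magnitudes of a naive DFT (lower half of the bins)."""
    n = len(frame)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(frame)))
        for k in range(n // 2)
    ]


def detect_phoneme(frame):
    """Phoneme detector 120: pick the model entry nearest to the dominant bin."""
    mags = spectrum(frame)
    peak = max(range(1, len(mags)), key=lambda k: mags[k])  # skip the DC bin
    nearest = min(ACOUSTIC_MODEL, key=lambda b: abs(b - peak))
    return ACOUSTIC_MODEL[nearest]


def lexical_analyze(phonemes):
    """Lexical analyzer 130: merge repeated phonemes, then look up a word."""
    collapsed = [p for i, p in enumerate(phonemes) if i == 0 or p != phonemes[i - 1]]
    return LEXICON.get(tuple(collapsed))


def recognize(analog, frame_len=16):
    """Run the four stages in the order shown in FIG. 1."""
    digital = ad_convert(analog)
    frames = [digital[i:i + frame_len]
              for i in range(0, len(digital) - frame_len + 1, frame_len)]
    return lexical_analyze([detect_phoneme(f) for f in frames])


def tone(bin_index, n_frames, frame_len=16):
    """Synthetic stand-in for a voiced sound: a sine centered on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * t / frame_len)
            for t in range(n_frames * frame_len)]


word = recognize(tone(2, 2) + tone(5, 2))
print(word)
```

In this sketch the detector assigns a phoneme label to every frame it receives, whatever the frame actually contains, because it only compares frequency characteristics against the stored model.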
However, since the speech recognition device merely analyzes the frequency characteristics of the input speech signal and compares them with an acoustic model stored in the phoneme detector 120 in order to detect phonemes, the effects of noise accompanying the speech signal are not taken into consideration. Accordingly, the performance of the speech recognition device deteriorates due to such noise, since the noise can be improperly recognized as phonemes or can improperly influence the phoneme recognition.
In this regard, known techniques for improving the performance of speech recognition devices have included noise models that account for noise in the input speech. One such technique is discussed in US Patent Publication No. 2004/0158465, titled “SPEECH PROCESSING APPARATUS AND METHOD,” which discusses a noise masking technique for removing noise from frames of input speech signals by using a filter.
However, such existing techniques, including noise masking techniques, are optimized for stationary noises, whose characteristics do not change substantially over time, e.g., noise generated by cars or by the turbines of an airplane. In contrast, burst noises generated in short bursts, e.g., a small breathing sound, a mechanical frictional sound, or a mouth sound generated at the front or back end of the input speech signal, are very difficult to distinguish from speech.
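The difference between the two noise types can be illustrated with a simplified sketch. It is not the filter of the cited publication: it is a hypothetical energy-domain stand-in that estimates a fixed noise floor from the first few frames and subtracts it from every frame. The frame length, the amplitudes, and the function names are assumptions chosen for illustration.

```python
def frame_energies(signal, frame_len):
    """Mean energy of each non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def mask_stationary_noise(energies, noise_frames=3):
    """Subtract a fixed noise floor estimated from the leading frames.

    This assumes the noise is stationary, i.e. that the leading frames
    are representative of the noise in every later frame.
    """
    floor = sum(energies[:noise_frames]) / noise_frames
    return [max(0.0, e - floor) for e in energies]


FRAME_LEN = 8
BURST_FRAME = 5  # hypothetical position of a short burst noise

# Stationary hum: constant-amplitude alternating samples across 8 frames.
signal = [0.1 if t % 2 == 0 else -0.1 for t in range(8 * FRAME_LEN)]

# Burst noise: a short, much louder disturbance confined to a single frame.
for t in range(BURST_FRAME * FRAME_LEN, (BURST_FRAME + 1) * FRAME_LEN):
    signal[t] += 0.8 if t % 2 == 0 else -0.8

residual = mask_stationary_noise(frame_energies(signal, FRAME_LEN))
for index, energy in enumerate(residual):
    print(index, round(energy, 3))
```

Under these assumptions the hum is removed from every frame, while the frame containing the burst keeps almost all of its energy, so a later stage that relies on the masked signal still receives the burst.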
In addition, speech recognition techniques based on such conventional phoneme recognition devices frequently recognize a non-speech signal, including such a burst noise generated at the front or back end of the input speech, as an actual phoneme, which results in deterioration of the performance of the speech recognition device.